Wiktionary:Beer parlour



Welcome, all, to the Beer Parlour! This is the place where many a historic decision has been made and where important discussions are being held daily. If you have a question about fundamental Wiktionary aspects—that is, about policies, proposals and other community-wide features—please place it at the bottom of the list (click on Start a new discussion), and it will be considered. Please keep in mind the rules of discussion: remain civil, don't make personal attacks, don't change other people's posts, and sign your comments with four tildes (~~~~), which produces your name with timestamp. Also keep in mind the purpose of this page. There are various other discussion rooms which may serve the idea behind your questions better. Please take a look to see which is most appropriate.

Sometimes discussion identifies an issue as an idea for policy development or rewriting. Such discussions may be taken out of the Beer parlour to a relevant page, or a brand new page may be created. Usually, the active policy pages will be listed in one of the sections below. See also the policy development page and the votes page.

Questions and answers will not remain on this page indefinitely, as it would very soon become too long to be editable. After a period of time with no further activity (usually a couple of weeks), information will be moved to the archives. We make a point to preserve all discussions that were started here in the archives. However, talk that is clearly not intended for this page may be moved and will not end up in the archives. Enjoy the Beer parlour!



August 2016

OTRS call for help

Dear colleagues. The volunteer response team (aka OTRS) is currently lacking volunteers to take care of questions regarding the sister projects wikibooks, wikinews, wikiquote and wiktionary. I'd like to invite you to volunteer at meta:OTRS/Volunteering. If you have any questions, please feel free to contact me. Thank you in advance for considering. --Krd (talk) 08:00, 1 August 2016 (UTC)

Why do we have both Category:Mongolian terms derived from Mandarin and Category:Mongolian terms borrowed from Mandarin?

I just noticed that one of the templates in мөөг creates a (currently redlink) Category:Mongolian terms borrowed from Mandarin but we already have a similar category populated by other templates with a very similar name and semantic: Category:Mongolian terms derived from Mandarin, containing terms like бууз, мантуу, etc. - What to do? — hippietrail (talk) 09:12, 1 August 2016 (UTC)

Borrowing is a subset of deriving, "derived from" is the generic category that holds terms not categorised more specifically. —CodeCat 11:47, 1 August 2016 (UTC)
If Mongolian borrows a term from another language that borrowed it from Chinese, that term can't be described as a Mongolian borrowing from Chinese, thus the need for a separate category. Also, the "derived from" node needs to be there to keep the data structure parallel with other categories using sister nodes such as "inherited from". Chuck Entz (talk) 12:41, 1 August 2016 (UTC)
I hadn't noticed this level of subdivision before so I'll leave it to you guys, thanks! — hippietrail (talk) 13:53, 1 August 2016 (UTC)

First LexiSession : cat

Dear all,

Apologies for writing in non-native English; please fix any mistakes you may encounter in these lines!

Wiktionary Tremendous Group, a nice and open gathering of Wiktionarians, is happy to introduce a new collective experiment: LexiSession.

So, what is a LexiSession? The idea is to coordinate a massive number of contributors from different languages to focus on a shared topic, to enhance all projects at the same time! It may remind you of the Commons monthly contests, but here everyone is a winner! For this first LexiSession, we decided on a month - until the end of August - to make friends with a cat! Not only the cat entry, but also Wikisaurus:cat and other pages dealing with the vocabulary one may need to talk about cats: adjectives, verbs and expressions.

You're welcome to contribute alone, or to create a local project and organize an edit-a-thon in your region. We will probably do at least one edit-a-thon in Lyon soon, and another in Paris during the French WikiConference. Please share your contributions here! You can also have a look at what other Wiktionarians are doing, on the LexiSession Meta page. We will discuss the processes and results in Meta, so feel free to have a look and suggest topics for the following LexiSession.

Thank you for your attention, and I hope you will be interested in this new way of contributing. I'll get back to you later for an update! Noé (talk) 21:57, 1 August 2016 (UTC)

Update: in French Wiktionary, we started a thesaurus about cat in French, and that's cool, we have plenty of red links to create! Noé (talk) 09:51, 4 August 2016 (UTC)

Ok guys, last day of this first LexiSession. It's quite a success for French Wiktionary, as we made three thesauri: thesaurus about cat in French, thesaurus about cat in Breton and thesaurus about cat in English. Sadly, it was quite hard to animate our fellow on other wiktionaries. So, we will try to make a better effort for the next ones. Yes, there will be more cross-wiki contribution events! In September, LexiSession 2 is about cartography and street types. Please, let me know if anyone have done anything about cat I haven't been noticed. See you soon. Noé (talk) 10:46, 31 August 2016 (UTC)

I never noticed this. I'll try to add some cat related stuff today while I still can. --WikiTiki89 12:43, 31 August 2016 (UTC)
I see that Wiktionnaire's thesaurus pages are far more comprehensive than the ones here! I have created a Wikisaurus entry for chat (which, to my knowledge, is the first Wikisaurus entry for French). :) Andrew Sheedy (talk) 18:10, 31 August 2016 (UTC)
@Noé (Note also that fellow cannot be used as an uncountable noun. Moreover, the word is only rarely used to mean a friend or comrade, which is the primary sense according to Wiktionnaire. I would rather say "those on other wiktionaries" instead of "our fellow[s] on other wiktionaries". Alternatively, it could be used as an adjective (which is much more common than the noun), giving "our fellow editors on other wiktionaries".) Andrew Sheedy (talk) 19:21, 31 August 2016 (UTC)
Instead of "Please, let me know if anyone have done anything about cat I haven't been noticed." → "Please, let me know if anyone has done anything about “cat” that I haven't noticed." ("been noticed" is in the passive voice). Andrew Sheedy (talk) 19:28, 31 August 2016 (UTC)
"That" is optional. --WikiTiki89 19:32, 31 August 2016 (UTC)
Hmmm... I'm not sure I agree that it works without it in this case. Strictly speaking, it could be omitted, but it seems rather awkward to me in this particular sentence. Andrew Sheedy (talk) 20:01, 31 August 2016 (UTC)
Interesting. Thank you very much for your comments about the language. Your creation is awesome, and I am very happy that you made something. I like what the Wikisaurus pages are in English Wiktionary, but I think there is much more vocabulary to gather in a thesaurus. So it is a different kind of appendix in French Wiktionary, with many more terms, sometimes too many, but homemade, crafted by hand. Plus, we added links to those thesauri in the equivalent Wikipedia pages, for example at the bottom of Chat. We aim to bring people to Wiktionary from Wikipedia, but so far we cannot evaluate whether it has worked. You can still contribute on cats; it was just a short impulse, but we can do it again later, as a cat has nine lives. Noé (talk) 22:36, 31 August 2016 (UTC)


It seems we have categories for English words with consonant pseudo-digraphs and English words with vowel pseudo-digraphs despite the fact that there is no such thing as a "pseudo-digraph". A combination of letters is either a digraph or it isn't. These are cases of letter combinations that look like digraphs, but aren't. Personally, I don't really see the need for these categories, and I don't think we should be using novel orthographic terms or concepts on Wiktionary. What do other folks think? Kaldari (talk) 06:01, 2 August 2016 (UTC)

A digraph is a sound written with two letters, and a pseudo-digraph is a combination that looks like it should be a single sound, but isn't. For instance, zoology doesn't rhyme with eulogy, and a ramshorn is a ram's horn, not a ram shorn. Chuck Entz (talk) 08:05, 2 August 2016 (UTC)
It is sometimes hard to decide what should go in them and they may need to be renamed at least slightly (to use "terms" like everything else). I'm also sceptical that they're maintainable. - -sche (discuss) 15:24, 2 August 2016 (UTC)
I don't think pseudo-digraph is accurate terminology. Is there any better name we can come up with? --WikiTiki89 15:27, 2 August 2016 (UTC)
I've got into the habit of adding this consonant category to certain entries, but I also dislike the non-standard term pseudo-digraph. I don't see maintainability as a big argument against it, however: I hope that one day we can do it with a bot, based on the spelling and the pronunciation, but that is a seriously hard problem and the solution might be decades away. But we already maintain lots of awkward things manually. Equinox 01:53, 3 August 2016 (UTC)
Re: Kaldari these aren't digraphs but they may appear to be. Hence Pseudo-digraph. Of course there's such a thing. Renard Migrant (talk) 11:22, 31 August 2016 (UTC)

{{lb}} linking to Categories

Why does {{lb}} work with some categories and not others? For example it won't work with Category:Languages. Is it because no one has thought about it? DonnanZ (talk) 15:34, 2 August 2016 (UTC)

I don't understand what you mean. Could you give some examples of what you are trying to do? --WikiTiki89 17:03, 2 August 2016 (UTC)
For example {{lb|nb|rail}} works fine for Category:nb:Rail transportation, but {{lb|nb|language}} or {{lb|nb|languages}} doesn't work if I add it to, say armensk, which is also an adjective. The category Category:nb:Languages has to be added instead. A little more typing. There are a number of categories like this, but I can't remember which ones now. DonnanZ (talk) 17:55, 2 August 2016 (UTC)
Oh that's what you meant. All the categories, auto-linking, and display text stuff are configurable in Module:labels/data. --WikiTiki89 18:01, 2 August 2016 (UTC)
Hmm, OK, that's all Greek to me. Strangely enough I can't find "language" or "languages" listed as a recognised label, assuming it should be listed alphabetically. Is that the list of labels that have been set up? DonnanZ (talk) 18:38, 2 August 2016 (UTC)
If you can't find it, it's probably not there and that is why it doesn't automatically categorize. --WikiTiki89 18:42, 2 August 2016 (UTC)
Oh great, a serious omission. Can that be rectified please? DonnanZ (talk) 18:58, 2 August 2016 (UTC)
I could do that, but there would be very few cases where it would be needed. In the case of armensk, "language" is not a context label. It's a topic category. A context label would mean that this word is only used in the context of languages, but that is not true, it simply refers to a language. --WikiTiki89 19:04, 2 August 2016 (UTC)
Right. {{lb|nb|language}} is a misuse of {{lb}}; labels indicate that a word is restricted in usage to a certain context, but I doubt that people only say armensk when talking about language/linguistics but not when talking about e.g. botany and mentioning in passing that a certain cited book was translated from armensk. The usual and proper thing to do to add a list category is just add the category manually or via {{C}} (see how it's used on French letter). - -sche (discuss) 19:05, 2 August 2016 (UTC)
Additionally, {{C}} also allows you to add multiple categories more easily, for example: {{C|en|Fruits|Trees|Pome fruits|Mythological plants}} at apple. --WikiTiki89 19:15, 2 August 2016 (UTC)
Normally, {{lb}} combined with language code and category will automatically link to a category, whether it's meant to or not, so I'm not sure that that can be classed as misuse. Admittedly I wasn't aware of {{C}} and I'm sure I will be making use of it now. I was also wanting to combine the functions of a qualifier with that of a label and category, but I obviously can't do that here. There shouldn't be any confusion with the armensk entry, adjective and noun are clearly separated; "en armensk forfatter", "en armensk bok", "armenske bøker", a book translated from armensk is obviously referring to Armenian, the language, not the adjective or the book itself. DonnanZ (talk) 20:01, 2 August 2016 (UTC)
I agree with -sche that {{lb|xx|language}} is a misuse. If the term is a language, then the definition should say so. Context labels should not be used to say what the definition should, and certainly not just to categorise. Categorising should always be secondary to the label; something that comes as part of using the label, rather than a reason to use the label in the first place. Perhaps we should start placing restrictions on labels to combat misuse. —CodeCat 20:23, 2 August 2016 (UTC)
I wouldn't class that as one of your most momentous ideas. Use of {{lb|xx|category}} is a good shortcut if used properly, and doesn't cause any harm at all. DonnanZ (talk) 21:08, 2 August 2016 (UTC)
It does cause harm. Let me give an example with one of our most commonly misused labels, "anatomy". The "anatomy" label for the term glomerulus is justified because no one who doesn't know anatomy would know what that is, so anatomy is a context in which this term is understood. However, it would be a misuse to put an "anatomy" label on the term kneecap, because everyone knows what a kneecap is and the word can be expected to be understood in practically any context (if you say "I fell and hurt my kneecap", you are not having a discussion about anatomy). Thus, putting the "anatomy" label at kneecap would mislead people into thinking that it is as much a technical term as glomerulus, and that would be harmful. --WikiTiki89 14:50, 3 August 2016 (UTC)
You're calling it misuse, but obviously not everyone agrees with you, if the translations are anything to go by. Some use Category:Anatomy for kneecap, others Category:Skeleton. But that's so-called "misuse" of a category, not of {{lb}}. DonnanZ (talk) 17:31, 3 August 2016 (UTC)
As I just said, "anatomy" is one of our most commonly misused labels. Also, you are confusing categories with labels. The label gives the context, the category just adds the term to a category so that it can be found by browsing the category. The fact that some labels also categorize is just a matter of convenience to not have to put both the label and the category, but that doesn't mean all categories should be given as a label. --WikiTiki89 20:08, 3 August 2016 (UTC)
  • Nope, no confusion. Whether all categories can be accessed via {{lb}} is seemingly another matter that I have no control over. It should be up to the editor's discretion whether they use {{Category|xx|category}}, {{C|xx|category}} or {{lb|xx|category}}, depending on circumstances, and shouldn't be deliberately restricted in this way. DonnanZ (talk) 23:16, 3 August 2016 (UTC)
    It's not deliberately restricted. {{lb}} is meant to add labels not categories. Some of these labels also categorize for convenience, but not all of them do and not all of the categories are textually equivalent to the label, thus there is no way to automatically categorize these labels. Every label that wants to categorize needs to be added to the module so that the module would know the name of the category to use for that label. --WikiTiki89 00:21, 4 August 2016 (UTC)
  • I realise that, but when a request for inclusion is declined, that becomes a deliberate restriction. DonnanZ (talk) 08:27, 4 August 2016 (UTC)
    If you give me an example of where you would use it, I would add it. But so far, I disagree with your use cases. It's important to have a real example in order to actually identify the correct category, and whether we should redirect the label to "linguistics", and things like that. --WikiTiki89 14:56, 4 August 2016 (UTC)
  • No, I'm not confusing languages with linguistics; languages are always categorised as such, linguistics covers related matters, and I wouldn't use the languages label for anything other than actual languages. Most if not all languages in Norwegian have the same spelling as the adjective, which happens in English too. Therefore it would be useful to use a label {{lb|xx|language}} for the language entry. I have already mentioned armensk, other examples are fransk, tysk, japansk, spansk, portugisisk and so on. It's no big deal, but it would be a great convenience, a clear marking and pretty harmless. DonnanZ (talk) 23:11, 4 August 2016 (UTC)
    • But "languages" is not a context. These words are not only used in the context of talking about languages. They're used generally, without context. If I ask "Do you speak tysk?", people will understand regardless of what was being discussed before, and regardless of setting. Therefore the label "languages" is a misuse on these entries. Labels should not, ever, be used to clarify or disambiguate definitions. If the definition by itself is unclear, that's what you'd use a gloss for: {{gloss|language}} after the definition. —CodeCat 23:42, 4 August 2016 (UTC)
  • I give up, some people like to make mountains out of molehills. I don't particularly like {{gloss}} anyway, the note is not in italics. DonnanZ (talk) 10:15, 5 August 2016 (UTC)
    • You mean you wanted to define armensk as "(language) Armenian"? I don't understand what you mean, but the others are right that {{lb}} should not be used to generate a label "language" and categorize an entry. If you just want to categorize the entry without generating a label, use {{C}} or a simple category link. The use of context labels was voted and approved at Wiktionary:Votes/pl-2009-03/Context labels in ELE v2. The voted text says: "A context label identifies a definition which only applies in a restricted context." --Daniel Carrero (talk) 17:03, 8 August 2016 (UTC)
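The distinction drawn in this thread, between a context label that may also categorize and a plain topic category, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the real configuration lives in Module:labels/data (a Lua module), and the table layout, field names and function names below (`LABELS`, `render_label`, `categorize`) are hypothetical, not the module's actual API. Only the category names `Category:nb:Rail transportation` and `Category:nb:Languages` come from the discussion.

```python
# Hypothetical label table. The field names are invented for this sketch
# and do not reproduce the structure of Module:labels/data.
LABELS = {
    "rail": {"display": "rail transport", "topical_category": "Rail transportation"},
    "anatomy": {"display": "anatomy", "topical_category": "Anatomy"},
}


def render_label(lang, label):
    """Mimic {{lb|lang|label}}: return (displayed text, category or None).

    A label only categorizes if it has been registered in the table;
    an unknown label is still displayed, but adds no category.
    """
    entry = LABELS.get(label)
    if entry is None:
        return f"({label})", None
    return f"({entry['display']})", f"Category:{lang}:{entry['topical_category']}"


def categorize(lang, *topics):
    """Mimic {{C|lang|topic|...}}: add topic categories, display nothing."""
    return [f"Category:{lang}:{topic}" for topic in topics]


print(render_label("nb", "rail"))      # registered: label text plus category
print(render_label("nb", "language"))  # unregistered: label text, no category
print(categorize("nb", "Languages"))   # category only, no label shown
```

Under this reading, the behaviour reported for {{lb|nb|language}} is simply the unregistered case, and the {{C}} route adds the category without implying any restricted context.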

Too many pictures?

I wonder if we need a policy on where/when to use pictures in entries. For example, having a picture of a Bible at Bible makes sense, but we also have one at swear on a stack of Bibles. Ditto for on it like a car bonnet. I think that having pictures for purely figurative phrases is actually misleading (it might suggest that a real Bible, or a real car bonnet, is involved), and worse than not having them. I definitely don't feel that every entry, when finally completed, ought to have a picture. Only some entries (usually those for literal things, like moon or dog) benefit from them. Thoughts? Equinox 01:51, 3 August 2016 (UTC)

  • Yes, inappropriate images (as in the two you mention) could be removed without discussion (I have so removed). SemperBlotto (talk) 06:20, 3 August 2016 (UTC)
    • As well as that, there is no need to add an image to an entry if it is linked to a Wikipedia article the image is taken from. That's pointless. DonnanZ (talk) 08:04, 3 August 2016 (UTC)
      • I agree that some image removals, such as those mentioned, are clear-cut. But what about marginal cases? Where should any necessary discussions of candidates for removal (and appeals of removals) be? Tea Room? Rfd? I don't think a new page is necessary now nor will it be ever. DCDuring TALK 11:01, 3 August 2016 (UTC)
      • It makes no sense to me that the mere presence of a Wikipedia link on an entry page would mean that that entry should not have any images. Granted, the Wikipedia article may have tons of images -- but how is that relevant to the content of the Wiktionary entry? A relevant and appropriate (set of) image(s) in the Wiktionary entry increases the utility of the entry. Requiring the user to click through to some other page entirely is not good usability. ‑‑ Eiríkr Útlendi │Tala við mig 20:03, 3 August 2016 (UTC)

Sorry to interrupt. This discussion is interesting. In French Wiktionary we have 26,406 pictures and we think we need more! Do you know how many pictures are used in English Wiktionary? Noé (talk) 14:12, 3 August 2016 (UTC)

  • Hopefully someone can answer your question, I must add that I love images and have added a few myself, but I think it has to be done in an intelligent manner. DonnanZ (talk) 14:25, 3 August 2016 (UTC)
  • I also think that we could do with more pictures. And a linked Wikipedia article with a picture is not good enough, for a couple of reasons: the Wikipedia article might change (and remove or change the picture), the image contained in the article could be at the bottom of the page, and there are cases where it is not obvious which sense the picture is actually meant to illustrate. Having the picture right in the Wiktionary entry is more convenient and fixes these problems. Traditionally most dictionaries (and we as editors, too!) are very focussed on words (somewhat understandably), so more entries with well chosen images would help to make us stand out. Jberkel (talk) 17:33, 3 August 2016 (UTC)
  • I grant that point ("the Wikipedia article might change (and remove or change the picture"). DonnanZ (talk) 15:01, 13 August 2016 (UTC)
  • Pictures are great, but they should make some kind of useful point, such as "It looks like this", "What's different about it is this", "It got its name because of this", "It's important because of this", or "It can be found in these locations" (especially useful if we are looking for translations). One kind of image that isn't too helpful is a picture of a particular kind (e.g. a Norway maple) of a thing (e.g. a tree) that doesn't show the features that distinguish it from other kinds (e.g. of maples or of trees). DCDuring TALK 18:47, 3 August 2016 (UTC)
  • Note that we have a project in Wikipedia called Wikigrenier, whose purpose is to photograph various common objects. See a list of pictures. Photographs such as those are good for dictionary articles, and it is possible to make requests. — Dakdada 08:34, 4 August 2016 (UTC)
  • I generally agree that some pictures are inappropriate. I also agree that pictures in figurative or abstract entries are generally suspect. Of course, there is that vast category of picture-deserving entries which is not under discussion and in which better picture coverage is welcome. --Dan Polansky (talk) 17:43, 12 August 2016 (UTC)
  • As an example, I tried to find applicable images to clarify the many different senses at ‎(sakura, cherry; cherry tree; cherry blossom; etc.). Many of these senses are easier to understand with a visual. ‑‑ Eiríkr Útlendi │Tala við mig 18:40, 12 August 2016 (UTC)
    • That's great, but looks a bit crowded and the markup is confusing, with the inline table. Are there any image related templates in use? Maybe this would help to standardize the use of images, and we could keep track of which entries have illustrations. Jberkel (talk) 19:32, 12 August 2016 (UTC)
  • I looked in the past (possibly when working on that very entry), and I didn't find anything that did what I needed. If anyone is aware of such a template, I'm certainly game to use it. ‑‑ Eiríkr Útlendi │Tala við mig 20:31, 15 August 2016 (UTC)
  • A list and count of images used, as mentioned above by User:Noé may be a useful tool, if this doesn't exist already. DonnanZ (talk) 15:01, 13 August 2016 (UTC)
  • Just wanted to say that I have been adding images to entries, primarily as part of a general effort to improve entries that appear as Words of the Day. Personally, I don't think there's anything wrong with images that are not strictly descriptive. — SMUconlaw (talk) 17:00, 13 August 2016 (UTC)
  • It's quite pleasing when one finds suitable images on Wikimedia Commons that haven't been used anywhere else before, as for hopper wagon. DonnanZ (talk) 18:15, 13 August 2016 (UTC)
  • Did an analysis of the 20160801 dump with the help of Lyokoï (who provided the numbers for the French Wiktionary): we have 35927 image links. – Jberkel (talk) 20:38, 19 August 2016 (UTC)
  • Very interesting, thanks for the figure. More than in French, apparently. DonnanZ (talk) 20:54, 19 August 2016 (UTC)
  • Yeah, just for short time... We will change this fact quickly ! Haha ! --Lyokoï (talk) 11:46, 20 August 2016 (UTC)
  • That's the spirit! DonnanZ (talk) 18:24, 20 August 2016 (UTC)

To make it clear, if I remember properly, 35927 is not how many pictures are in en.Wiktionary but the number of pages that include at least one picture. Maybe I am wrong, but it is how I remember the math was. So, it is already pretty cool! Good job fellow! Noé (talk) 10:33, 31 August 2016 (UTC)

No, it's the absolute number of images (counted by searching for [[Image:]] and [[File:]]). – Jberkel (talk) 23:44, 31 August 2016 (UTC)
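The two readings of the figure discussed above (total image links versus pages with at least one image) can be told apart with a small sketch. This is not the script that was run on the 20160801 dump; it only assumes, as stated in the reply, that images are found by searching the wikitext for `[[Image:` and `[[File:`. The function names and the sample pages are made up for illustration.

```python
import re

# Matches the start of an image link such as [[File:x.jpg]] or [[Image:x.png]].
# Case-insensitivity and optional spaces are assumptions of this sketch.
IMAGE_LINK = re.compile(r"\[\[\s*(?:Image|File)\s*:", re.IGNORECASE)


def count_image_links(wikitext):
    """Absolute number of image links in one page's wikitext."""
    return len(IMAGE_LINK.findall(wikitext))


def count_pages_with_images(pages):
    """Number of pages that contain at least one image link."""
    return sum(1 for text in pages if IMAGE_LINK.search(text))


# Made-up sample pages, standing in for entries read from a dump.
pages = [
    "==English==\n[[File:a.jpg|thumb]]\n[[Image:b.png|thumb]]",
    "==English==\nNo picture here.",
    "==French==\n[[file:c.svg]]",
]
total_links = sum(count_image_links(text) for text in pages)
print(total_links, count_pages_with_images(pages))
```

On the sample above the two counts differ (three links on two pages), which is exactly the ambiguity raised in the question.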

Proposed creation of Module:it-IPA

I was wondering whether it were possible to create a similar module to this one in the Catalan Wiktionary, but more complex; that is:

  1. the apostrophe read as completely absent;
  2. monosyllables only stressed when spelled with an accent;
  3. words treated separately if a space is put between them;
  4. two distinct IPAs, a phonetic and a phonemic one;
  5. the possibility of endless alternative pronunciations.

If anyone is willing to help me, please let me know. Thanks! ;) [ˌiˑvã̠n̪ˑˈs̪kr̺ud͡ʒʔˌn̺ovã̠n̪ˑˈt̪ɔt̪ːo] (parla con me) 15:33, 3 August 2016 (UTC)

There's a Module:ca-IPA already, but it was never deployed. Maybe you can do so, and adapt it to Italian as well? —CodeCat 15:41, 3 August 2016 (UTC)
@CodeCat: the fact is I’m not able to create modules, that’s why I was asking for help. [ˌiˑvã̠n̪ˑˈs̪kr̺ud͡ʒʔˌn̺ovã̠n̪ˑˈt̪ɔt̪ːo] (parla con me) 16:37, 3 August 2016 (UTC)
@IvanScrooge98 I'm willing to help out. Is the idea to automatically create IPA transcriptions? Jberkel (talk) 17:38, 3 August 2016 (UTC)

@Jberkel: thank you so much; yes, basically, that’s the idea. Can you arrange that? [ˌiˑvã̠n̪ˑˈs̪kr̺ud͡ʒʔˌn̺ovã̠n̪ˑˈt̪ɔt̪ːo] (parla con me) 17:43, 3 August 2016 (UTC)

OK, what I would need are some examples of input where ca:Mòdul:it-general produces the undesired output, with the expected output. It probably also makes sense to clean up / rewrite some of the code there – it's one big chain of regular expressions, a maintenance nightmare. Module:ca-IPA is a lot easier to understand and documented as well. Jberkel (talk) 21:05, 3 August 2016 (UTC)
OK, I’m working on it. I’ll let you have a list of examples. Thank you again, @Jberkel! [ˌiˑvã̠n̪ˑˈs̪kr̺ud͡ʒʔˌn̺ovã̠n̪ˑˈt̪ɔt̪ːo] (parla con me) 21:28, 3 August 2016 (UTC)

@Jberkel, here's a short list of comparisons. You will notice certain consonants geminated at the beginning of the word, that's a feature of Italian occurring, between two different words, after vowels only in these circumstances:

  • initial /ʃ/, /ʎ/, /ɲ/, /dz/, /ts/ are always doubled after vowels, even between two separate words;
  • all beginning consonants (with exceptions for certain clusters, namely cn, pn, ps, tm, tn, and all clusters starting with S which don't give /ʃ/, as st, spr, sc+a, etc.) undergo this gemination if they come after:
    • words ending with a graphically stressed vowel (as città, perché, però, giù, , etc.) or the list of stressed monosyllables which are spelled without accent that I provided you there (these monosyllables should display as though O were Ò /ɔ/, E were É /e/, I were Ì, etc...);
    • the unstressed monosyllables (with o = /o/, e = /e/) and the four words I provided.

All other unstressed monosyllables don't make the following consonant geminated; all monosyllables (including the geminating ones) should not display with a stress mark /ˈ/, not even by themselves, unless they are apocopic forms of nouns, etc. as ciel or cuor.
When it comes to secondary stress, I would just leave it on all stressed monosyllables unless they come immediately before the primarily stressed syllable, as in è vero; I would also put it in polysyllabic words if the distance from the primary stress is less than four syllables, otherwise I'd mark them with a normal stress mark; but you can choose to just leave one primary stress and all the others as secondary.
I think I didn't miss anything; I hope I've been clear enough with these few words and that the task won't be hard for you; in any case, if you have any doubts or didn't understand something in my explanation, don't hesitate to ask me for clarification. Enjoy your work!! [ˌiˑvã̠n̪ˑˈs̪kr̺ud͡ʒʔˌn̺ovã̠n̪ˑˈt̪ɔt̪ːo] (parla con me) 15:31, 5 August 2016 (UTC)

OK, That should be enough to get going. I started to learn a bit of Italian recently, and the pronunciation seemed to be quite straightforward compared to other languages. – Jberkel (talk) 13:33, 6 August 2016 (UTC)
How would the difference between high-mid and low-mid vowels be handled? Presumably you need to write é or è, ó or ò? What happens if someone leaves out the stress mark (is this an error)? Benwing2 (talk) 21:26, 7 August 2016 (UTC)
@Benwing2: in the module on Catalan Wiktionary there are already some base rules to guess close-mid E and O and proparoxytones, the others are considered paroxytones with open-mid vowels. However, there’s no general rule in Italian and that’s just tentative, pronunciations have to be checked and overwritten if it is the case. [ˌiˑvã̠n̪ˑˈs̪kr̺ud͡ʒʔˌn̺ovã̠n̪ˑˈt̪ɔt̪ːo] (parla con me) 09:03, 8 August 2016 (UTC)
For {{ca-IPA}}, you have to specify the vowel, it shows an error if you don't. See gos. —CodeCat 11:00, 8 August 2016 (UTC)
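The gemination rules listed earlier in this thread can be roughed out as follows, purely as a starting point for whoever writes the module. This is a Python sketch, not the Lua module itself, and it works on spelling rather than on phonemes. Several things are assumptions: the function name `geminates`, the spelling patterns standing in for /ʃ/, /ʎ/, /ɲ/, /dz/, /ts/ (so `gli` is treated as /ʎ/ even where it is not), the treatment of initial `h` as silent, and above all `TRIGGER_MONOSYLLABLES`, which is a placeholder and not the list of monosyllables supplied in the discussion.

```python
VOWELS = "aeiouàèéìòóù"
ACCENTED_FINALS = "àèéìòóù"

# Spellings assumed to stand for /ɲ/, /dz/ or /ts/, /ʃ/ and /ʎ/ (approximation).
ALWAYS_LONG_SPELLINGS = ("gn", "z", "sce", "sci", "gli")

# Initial clusters named in the thread as exempt from gemination.
EXEMPT_CLUSTERS = ("cn", "pn", "ps", "tm", "tn")

# Placeholder only: NOT the list provided in the discussion.
TRIGGER_MONOSYLLABLES = frozenset({"a", "e", "o", "che", "tre"})


def geminates(previous, following):
    """Return True if the initial consonant of `following` is doubled
    after `previous`, according to the simplified rules above."""
    prev, nxt = previous.lower(), following.lower()
    if not prev or not nxt:
        return False
    if nxt[0] in VOWELS or nxt[0] == "h":
        return False  # nothing to double (initial h assumed silent)
    if prev[-1] not in VOWELS:
        return False  # doubling only happens after a vowel
    if nxt.startswith(ALWAYS_LONG_SPELLINGS):
        return True   # always long after a vowel
    if nxt.startswith(EXEMPT_CLUSTERS):
        return False
    if nxt[0] == "s" and len(nxt) > 1 and nxt[1] not in VOWELS:
        return False  # s + consonant clusters such as st, spr, sc+a
    return prev[-1] in ACCENTED_FINALS or prev in TRIGGER_MONOSYLLABLES


for pair in [("città", "bella"), ("la", "casa"), ("la", "zia"), ("perché", "stare")]:
    print(pair, geminates(*pair))
```

Vowel quality and stress placement are left out entirely; as noted in the replies, those would have to be supplied or checked by hand.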

Abbreviations L4 header

I'm doing some automated fixup on Dutch entries (per WT:NORM and WT:ELE) and came across the entry aanwijzend voornaamwoord. This entry uses an "Abbreviations" section under the part of speech. This section is not listed in WT:ELE as allowed, thus it's also not clear where the section should appear relative to the others. What is the proper fix here?

Note: This isn't about the use of "Abbreviation" as the part of speech, but rather the abbreviations section appearing under a POS header, and therefore describing abbreviations of the current term. —CodeCat 20:01, 3 August 2016 (UTC)

Change it to "Synonyms" and use {{qualifier|abbreviation}}. DTLHS (talk) 20:14, 3 August 2016 (UTC)
Is an abbreviation really a synonym, or an alternative form? I think it's more the latter. —CodeCat 20:27, 3 August 2016 (UTC)
This isn't a black-and-white question. --WikiTiki89 20:30, 3 August 2016 (UTC)
But it needs a black-and-white answer. —CodeCat 20:36, 3 August 2016 (UTC)
What I mean is it's a case-by-case question. Some abbreviations are best handled as alternative forms and some are best handled as synonyms. --WikiTiki89 20:37, 3 August 2016 (UTC)
And how would you decide which to use? DTLHS (talk) 20:41, 3 August 2016 (UTC)
How about using the Derived terms section instead of Synonyms or Alternative forms? --Panda10 (talk) 20:44, 3 August 2016 (UTC)
I wouldn't object to either the alternative forms or the synonyms headers as a home for these for Dutch terms. If one alternative is necessary to make life technically simpler, then who really cares about how users might interpret one rather than the other. DCDuring TALK 21:27, 3 August 2016 (UTC)

Wiktionarian skills list

Dear colleagues,

In February, I started a list of skills on French Wiktionary: what we do on our project and what we have learned to do. It is neither a guideline nor a Help page, but roughly what could fill a CV with our empirical learning. After a few months of improvement by other people, I am glad to inform you that I have tried to translate it into English as the Wiktionarian skills list! Yay! My English is quite awful, but I imagine it can be improved collaboratively. I hope it will be interesting for some of you, and I'll be happy to discuss and improve it with you! Noé (talk) 23:43, 3 August 2016 (UTC)

@Noé: I made an attempt to fix most of your English grammar mistakes: diff. Could you please check to make sure I didn't change the meaning of anything? --WikiTiki89 17:51, 4 August 2016 (UTC)
Thank you a lot, it is much clearer now! I hope you enjoyed the reading! I changed back only one sentence about trademarks, because I was thinking about the problem we have describing objects when their names are brands. We discussed this a lot in French Wiktionary because of 3M Company. They sent a dozen e-mails to one contributor who wrote scotch (fr:scotch) and post-it (fr:post-it) to indicate they are trademarks. Companies need to protect their brands against dilution and Wiktionaries have to decide on a policy on this, to explain clearly that Wiktionary is descriptive and not prescriptive, so we do not claim to decide if these brands are now commonly attested as substantives in a language. Well, long discussion, and I plan to translate our conclusions to English at some point. -- Noé (talk) 12:36, 5 August 2016 (UTC)

External links and references in WT:EL[edit]

These headers are given as level 4 headers in the "example entry", but these are actually level 3 headers that go below the parts of speech. In particular, References goes below Anagrams. Can this be fixed? —CodeCat 17:35, 4 August 2016 (UTC)

I fixed this without a vote. --Daniel Carrero (talk) 10:58, 7 August 2016 (UTC)

WT:EL says nothing about multiple etymologies[edit]

It is the longstanding practice that when there are multiple POS sections, each with their own etymology, we have separate numbered etymology sections, and the POS sections are nested in each etymology section, at level 4 rather than level 3. But WT:EL seems to say nothing about this. In fact, the "example entry" shows a rather atypical entry layout, with multiple POS sections under a single etymology; in the vast majority of cases, separate POSes ought to have separate etymologies too. I would like to see this remedied. Does anyone have proposals for a change? —CodeCat 17:42, 4 August 2016 (UTC)

I created this vote, and it failed in January 2016: Wiktionary:Votes/pl-2015-12/Headings. It had some issues as discussed by the voters. Like in the failed vote, I'd like to propose having a decent "Headings" section with a list of all headings and levels (Etymology = level 3, Noun/Adjective/etc. = level 3, Translations = level 4), explaining how the presence of "Etymology 1"/"Etymology 2"/etc. affects the level of other sections. --Daniel Carrero (talk) 06:53, 7 August 2016 (UTC)
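The level rule described in this thread can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the small section table are hypothetical, the base levels are the ones quoted above (Etymology = 3, POS = 3, Translations = 4), and the idea that everything other than the etymology heading drops one level under numbered etymologies is an extrapolation from "POS sections are nested in each etymology section, at level 4 rather than level 3".

```python
# Hypothetical table: base heading levels as quoted in the discussion.
BASE_LEVELS = {
    "Noun": 3,
    "Adjective": 3,
    "Translations": 4,
}


def heading_level(name: str, numbered_etymologies: bool) -> int:
    """Return the heading level of a section.

    Assumption: "Etymology", "Etymology 1", "Etymology 2", ... always stay at
    level 3, and every other listed section is nested one level deeper when
    the entry uses numbered etymology sections.
    """
    if name.startswith("Etymology"):
        return 3
    base = BASE_LEVELS[name]
    return base + 1 if numbered_etymologies else base


def render(name: str, numbered_etymologies: bool = False) -> str:
    """Render a heading line in wikitext, e.g. ===Noun===."""
    marks = "=" * heading_level(name, numbered_etymologies)
    return f"{marks}{name}{marks}"


if __name__ == "__main__":
    for section in ("Etymology 1", "Noun", "Translations"):
        print(render(section, numbered_etymologies=True))
```

A table like this is the kind of "Headings" list the proposal asks for: one row per heading, plus a single rule for how numbered etymologies shift the rest.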

WT:EL homophones and rhymes sections[edit]

These basically duplicate the content of Wiktionary:Pronunciation, which is already linked to on the page. Rather than try to elaborate on every detail of the Pronunciation section, WT:EL should stay short and to the point. I therefore propose to remove these two sections, perhaps replacing them with a sentence or two that mentions other things that go in Pronunciation sections. —CodeCat 17:58, 4 August 2016 (UTC)

If Wiktionary:Votes/pl-2016-07/Pronunciation 2 passes, as explained in the "changes and rationale", all the text in the subsections "Homophones" and "Rhymes" is going to be kept, albeit edited to occupy less space, and the subsection titles will disappear. The titles themselves are unnecessary, in my opinion. If the titles were to be kept, we might as well have titles: "Audio pronunciation", "Transcription", "Hyphenation", etc. --Daniel Carrero (talk) 06:45, 7 August 2016 (UTC)

WT:EL: "Language" under "Entry core"[edit]

The way WT:EL is currently laid out, it first mentions things that go before the definitions, then the "core" which includes the POS section and definitions themselves, and finally things that go after the definitions. But the "Language" section, which describes the use of L2 language sections, is not part of the entry core as implied here. Note also that it is nested under the "Additional headings" L2 section, which says "There are additional headings which you should include if possible, but if you don’t have the necessary expertise, resources or time, you have no obligation to add them, with the possible exception of “References”." I certainly don't think that the L2 language section is optional or dependent on expertise in any sense. Therefore it should be mentioned earlier on the page, and more prominently. —CodeCat 18:03, 4 August 2016 (UTC)

Some thoughts:
  • I think "Entry name" should be somewhere above "Language" and the explanation of any section.
  • Then "Language", above any explanations of basically everything else (etymologies, POS headers, definitions, etc.), because it is the highest-level section we use (not counting the H1 page title).
  • We could delete the titles "Headings before the definitions" and "Headings after the definitions" and just have "Headings".
  • "Additional headings" is a misnomer. It does not contain what the name promises, and the name or contents should change.
  • WT:EL seems to imply that "References" is mandatory. Is "References" mandatory? Why? If anything, any entry must have the language, POS header, headword line and at least 1 definition. (for the record, it was voted and approved that romanization entries must have a definition, too)
--Daniel Carrero (talk) 07:05, 7 August 2016 (UTC)

Variables extension[edit]

What exactly happened to Wiktionary:Votes/2015-12/Install Extension:Variables? On the phabricator (phab:T122934), plans were made to create a similar function, but after 6 months no progress has been made. -Xbony2 (talk) 21:17, 4 August 2016 (UTC)

My understanding is that the work required to make section-aware templates possible is dependent on some work to actually make the MediaWiki parser know what sections are (phab:T114072). As you can see, the Parsing team of the WMF is quite busy, so it looks like it may be some time before this work gets underway. I posted a rather hackish proof-of-concept patch for MediaWiki at phab:T122934, which would solve the problem, but there is practically zero chance of that being accepted - I think the Parsing team would prefer to do it the proper way, rather than introduce yet more technical debt into MediaWiki. This, that and the other (talk) 07:19, 7 August 2016 (UTC)

moisturising cream v. moisturizer[edit]

Moved to Wiktionary:Tea Room#moisturising cream v. moisturizer DCDuring TALK 10:57, 5 August 2016 (UTC)


How do I get the diacritic in phah-sǹg to display correctly in the title?--Prisencolin (talk) 00:06, 6 August 2016 (UTC)

It might be a font problem on your own computer. It shows up fine for me (Windows 10, Monobook skin). —suzukaze (tc) 00:14, 6 August 2016 (UTC)

Vote: Using template l to link to English entries from English entries[edit]

FYI, I created Wiktionary:Votes/2016-08/Using template l to link to English entries from English entries.

Let us postpone the vote as much as discussion makes necessary, if at all. --Dan Polansky (talk) 10:06, 6 August 2016 (UTC)

Alternative forms after definitions — weaker proposal[edit]

Previous discussions:

The vote Wiktionary:Votes/2016-02/Placement of "Alternative forms" had the proposal below. It ended as no consensus (10-9-1 = 52.6%-47.4%) in March 2016.

Voting on:

  • Fix the placement of the "Alternative forms" section directly above the "Synonyms" section, as a subsection of the POS section.


  • Arguably, synonyms and alternative forms are related concepts.
  • Removing "Alternative forms" from above the definitions is a way to promote the definitions.

Simplified entry example: hardworking



# Definition.

(possibly other headers between the definitions and the alternative forms)

====Alternative forms====
* {{l|en|hard-working}}

====Synonyms====
* {{l|en|industrious}}

Unfortunately, as mentioned by some voters, if this vote passed, it would have resulted in duplication of alternative forms sections in entries with multiple POS sections.

New, weaker proposal:

  • Rather than editing all entries (as in, by bot or whatever), just allow entries to be edited on a case-by-case basis: If someone wants to edit an entry manually and place the "Alternative forms" as an L4 section above "Synonyms", that would be OK. If someone wants to edit an entry manually and place the "Alternative forms" as an L3 section above Etymology/Pronunciation, that would be OK too, and individual entries can be discussed in case of disagreement. This would need a new vote.

Pinging all participants of the previous vote (I hope I didn't miss anyone):

@Metaknowledge, Mr. Granger, Equinox, This, that and the other, -sche, Wikitiki89, Makaokalani, Embryomystic, Andrew Sheedy
@Droigheann, Nibiko, I'm so meta even this acronym, Vahagn Petrosyan, Dan Polansky, Xoristzatziki, Erutuon, Korn, Xbony2

Thoughts? --Daniel Carrero (talk) 08:06, 7 August 2016 (UTC)

I think this makes much more sense than placing them above, as though they apply to all terms. But they're not like pronunciation, where it actually makes sense to show it as a "global" thing that applies to the entire entry. Alternative forms are often term/etymology specific. —CodeCat 12:18, 7 August 2016 (UTC)
steden is an entry where the different etymologies have different alternative forms. —CodeCat 16:38, 7 August 2016 (UTC)
I doubt all alternate forms apply to all POS's of every word. Since they need to be attested for each POS independently, I would (now) have no problem repeating them for each POS. They should definitely be split up for separate etymologies. Andrew Sheedy (talk) 20:32, 7 August 2016 (UTC)
I agree with CodeCat here. Alternative forms are very much like synonyms and it makes no sense to stick them at the top where they'll often be missed. Benwing2 (talk) 21:08, 7 August 2016 (UTC)
As for duplication of L4 alternative forms, one alternative (so to speak) is to place them as an L3 header after both or all POS sections. I do this often with Related Terms. (Although I'll grant that it makes more sense to do this for related terms than for alternative forms as often this means no more than converting an L4 to an L3, whereas with alternative forms it will involve moving them below synonyms, antonyms, derived terms and related terms, and they may be missed there just like at the top). Benwing2 (talk) 21:11, 7 August 2016 (UTC)
I don't like this idea. I think moving away from having headers apply to multiple POS sections is the way to go. If we have to duplicate a few, then so be it. It's not that frequent. —CodeCat 21:40, 7 August 2016 (UTC)
I agree with CodeCat on this. --Daniel Carrero (talk) 15:09, 8 August 2016 (UTC)

Suggestion for sense tags on antonyms[edit]

A while ago CodeCat tried changing the text of the {{sense}} tag to say something like (of sense "foo") instead of just (foo). This was roundly disliked, and reverted. The logic given by CodeCat was that it's confusing to have a simple (foo) sense tag next to antonyms, which suggests that the antonym has the meaning of the sense tag rather than the opposite. How about we do something like what CodeCat tried, but only for antonyms? It could say (of sense "foo") or (antonym of "foo") or similar. The way to implement it is to create a new template {{antsense}}, and use a bot to change all occurrences of {{sense}} in Antonyms sections to {{antsense}}. Thoughts? Benwing2 (talk) 21:17, 7 August 2016 (UTC)
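The bot pass suggested here could look roughly like the Python sketch below. It is a minimal illustration, not an actual bot: {{antsense}} is the template proposed in this thread and does not necessarily exist, the function name is hypothetical, and the sketch assumes that an Antonyms section runs from its heading line to the next heading line of any level.

```python
import re

# Matches a wikitext heading line such as ====Antonyms==== and captures its title.
HEADING = re.compile(r"^=+\s*(.*?)\s*=+\s*$")
# Matches the opening of {{sense|...}}; the trailing pipe keeps it from
# matching other templates whose names merely begin with "sense".
SENSE = re.compile(r"\{\{sense\|")


def retag_antonym_senses(wikitext: str) -> str:
    """Replace {{sense|...}} with {{antsense|...}} inside Antonyms sections only."""
    out = []
    in_antonyms = False
    for line in wikitext.split("\n"):
        match = HEADING.match(line)
        if match:
            # Any heading starts a new section; remember whether it is Antonyms.
            in_antonyms = match.group(1) == "Antonyms"
        elif in_antonyms:
            line = SENSE.sub("{{antsense|", line)
        out.append(line)
    return "\n".join(out)


if __name__ == "__main__":
    sample = (
        "====Synonyms====\n"
        "* {{sense|dim}} {{l|en|dusky}}\n"
        "\n"
        "====Antonyms====\n"
        "* {{sense|dim}} {{l|en|bright}}\n"
    )
    print(retag_antonym_senses(sample))
```

Working line by line keeps the change confined to the section in question, so {{sense}} tags under Synonyms or other headers are left untouched.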

I do think we should do something about this. Unfamiliar users fairly regularly invert the sense, thinking they are fixing an error. Equinox 21:44, 7 August 2016 (UTC)
How many dictionaries actually have antonyms (or any other semantic relations)? I think not many.
How do references of any kind that have antonyms handle this? Among OneLook references, WordNet and Collins Thesaurus have antonyms, which are published online by The Free Dictionary. They offer two presentations (using dark as an example):
  1. the freedictionary, which uses color-coded icons in red and green.
  2. the freethesaurus, which is new and uses color coded boxes, pale green for synonyms, pale red for antonyms, peach(?) for "related words".
Color-coding is imperfect (blindness, red-green color-blindness, monitors or screens that do not show colors).
The icons alone don't seem adequate for the full range of users who need the current approach supplemented or replaced.
Longmans DCE 1985 includes "—opposite light" on the appropriate sense line.
Webster's 2nd Intl. has "syn." heading a block of text explaining synonyms and (mostly) near-synonyms and "ant." before a very short list of antonyms.
What does OED do? DCDuring TALK 23:50, 7 August 2016 (UTC)
OneLook itself has a Thesaurus, which uses color coding in the manner of freethesaurus. DCDuring TALK 23:53, 7 August 2016 (UTC)
Chambers Thesaurus screenshot: [1]. They divide each entry into sections headed by examples (dark hair, dark secrets, etc.) and collect all antonyms at the end, numbered by section and marked by the inequality sign. Equinox 23:56, 7 August 2016 (UTC)
That's sort of what we do, except we label senses directly rather than by number, because numbers tend to change as senses are added and rearranged. —CodeCat 00:12, 8 August 2016 (UTC)
On my screen the Chambers entry shows the antonyms in red type.
This use of the inequality symbol to mark antonyms doesn't seem obvious though it is quickly learned. DCDuring TALK 04:02, 8 August 2016 (UTC)
We certainly should not use colour exclusively to convey information, only colour in addition to something else. —CodeCat 18:13, 8 August 2016 (UTC)
My screenshot came from the CD-ROM version. I assume they use that symbol to save visual space. Equinox 20:18, 8 August 2016 (UTC)

Is Ushakov's dictionary copyrighted still?[edit]

Calling @bd2412. See Copyright law of the Russian Federation. This case is tricky because Ushakov's dictionary was published in 1935-1940 and he died April 17, 1942 (see Dmitry Ushakov). The copyright law of 1993 retroactively established a copyright term of 50 years after the publication date or the author's death (whichever is later), and later legislation extended this to a 70-year term. The Wikipedia article says this means anyone who died in 1943 or later was within the copyright period in 1993, but various additional details might possibly make Ushakov's work within this period as well, in which case the copyright would extend (presumably) to 2013, meaning it's (presumably) out of copyright now. But this stuff is sufficiently complicated that I don't know for sure. Basically I want to use some example sentences from this dictionary to illustrate some Wiktionary entries. Benwing2 (talk) 02:52, 8 August 2016 (UTC)

Possibly relevant: this discussion of fair use and de minimis copying, and of the applicability of US vs non-US laws. - -sche (discuss) 05:35, 8 August 2016 (UTC)
For loading on Commons, you have to follow the source country's copyright and US copyright, but Wiktionary doesn't have to follow source country's copyright by WMF rules, and I don't think en.Wiktionary has policy on it. If it was out of copyright in Russia in 1996, it's almost certainly out of copyright in the US, but if it was in copyright in Russia in 1996, it will be in copyright in the US for 95 years from publication, or until 2031-2036.
I'd note that fair use on stuff taken from a dictionary is going to be much more problematic than on stuff taken from a novel, since quotations from a novel don't influence the normal commercial use of the novel, but we are directly competing against a dictionary.--Prosfilaes (talk) 07:33, 8 August 2016 (UTC)
What kind of damages can they claim, though? How much profit do they still make? Also, I can't imagine it is the case that works are not free of copyright until they are in every country in the world. And retroactively copyrighting seems even more dubious, what if someone had published it under a permissive licence in the meantime? Do non-infringing works suddenly become infringing? —CodeCat 18:16, 8 August 2016 (UTC)
@Prosfilaes This stuff is such a mess. It seems quite possible that it went out of copyright in Russia, went back into copyright in 1993 (conceivably due to a rule stating that dates are moved forward to Jan 1 of the next year), went out again later that year (50 years from author's death, moved forward to Jan 1 1943???), hence was out of copyright in 1996, then went back into copyright in 2004 due to the new 70-year-from-death policy, then went out again in 2013. Presumably that means it's out of copyright in the US. But who knows. What exactly happens if you copy from an out-of-copyright work and then it later goes back into copyright? This stuff sucks. Copyright terms IMO are way way too long. Benwing2 (talk) 00:11, 9 August 2016 (UTC)
@CodeCat: They can claim up to $30,000 as statutory damages in the US. Works are not free of copyright everywhere in the world until they are free of copyright everywhere in the world. The WMF is chartered in the US, and therefore has to follow US rules. There are some countries where the rule of the shorter term is in play, and thus lack of copyright in Russia matters there (which is part of the reason Commons cares about it), but the US doesn't have the rule of the shorter term. Putting a work in the public domain back in copyright is a mess, but countries do it sometimes, usually with exceptions for preexisting users.
@Benwing2: It was never in copyright in the US, and the URAA in 1996 would have returned it to copyright in the US only if it was still in copyright in Russia. It looks like it's out of copyright close to world-round, so it should be safe to use. As far as I know, copyright terms virtually always extend through the end of the year they expire in.--Prosfilaes (talk) 08:03, 9 August 2016 (UTC)
Thanks! Benwing2 (talk) 08:33, 9 August 2016 (UTC)
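The dates mentioned in this thread can be checked with a little arithmetic. The Python sketch below is only a worked calculation on the figures quoted above (death in 1942, publication in 1935-1940, terms of 50, 70 and 95 years); the function name is hypothetical, and it assumes, as stated in the thread, that a term runs through the end of the calendar year in which it expires.

```python
def first_free_year(start_year: int, term_years: int) -> int:
    """Return the year from whose 1 January the work is free.

    Assumption: the term is counted from start_year and protection lasts
    through 31 December of the year start_year + term_years.
    """
    return start_year + term_years + 1


DEATH_YEAR = 1942
PUBLICATION_YEARS = (1935, 1940)

# 50-year term from the author's death (1993 law, as described in the thread).
russia_50 = first_free_year(DEATH_YEAR, 50)
# 70-year term from the author's death (later extension).
russia_70 = first_free_year(DEATH_YEAR, 70)
# 95 years from publication, the US term that would apply if restored.
us_range = [first_free_year(year, 95) for year in PUBLICATION_YEARS]

if __name__ == "__main__":
    print("50-year term: free from", russia_50)
    print("70-year term: free from", russia_70)
    print("US 95-year term: free from", us_range[0], "to", us_range[1])
```

Under these assumptions the results agree with the figures given in the discussion: 1993 for the 50-year term, 2013 for the 70-year term, and 2031-2036 for a 95-year US term.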


Whatamidoing (WMF) (talk) 18:02, 9 August 2016 (UTC)

Old Ruthenian[edit]

I'm wondering how we should handle this language. Should we give it its own code and make it a descendent of Old East Slavic and the ancestor of Ukrainian, Belarusian, and Rusyn, or should we make it a dialect of Old East Slavic, or even a dialect of Russian? What should we call it, Ruthenian, Old Ruthenian, Old Western Russian, Lithuanian Russian, etc.? What code should we give it? --WikiTiki89 19:33, 9 August 2016 (UTC)

@CodeCat, Atitarev, Useigor, -sche: Pinging people who might be interested. --WikiTiki89 15:58, 10 August 2016 (UTC)
It's mainly about how different they are. Is Old Ruthenian clearly identifiable as a language contrasting with Old East Slavic? —CodeCat 16:06, 10 August 2016 (UTC)
What language do we consider having been spoken in the Grand Duchy of Moscow and the Tsardom of Russia? Old East Slavic, or Modern Russian? If the answer is Old East Slavic, then we can consider the language of the Grand Duchy of Lithuania to also have been a dialect of Old East Slavic; if the answer is Modern Russian, then we would need to make it a separate language. --WikiTiki89 17:35, 10 August 2016 (UTC)
Russian and Ruthenian probably diverged by the 15th century. Old East Slavic (Old Russian) is the predecessor of both. Church Slavonic was used as the official language of Muscovy then. --Anatoli T. (обсудить/вклад) 12:03, 12 August 2016 (UTC)
If we add it, I would call it "Old Ruthenian", because "Ruthenian" seems too ambiguous. "Lithuanian Russian" also seems ambiguous and has been less common since the 1980s (per ngrams). Looking around cursorily, I do find scholars who consider Old Ruthenian distinct from Old East Slavic — some consider Old Ruthenian a jumble of Old East Slavic elements and Polish ones. It makes it sound like it would be possible to tell whether a given text was Old Ruthenian or Old East Slavic, which is an obvious prerequisite to splitting it. - -sche (discuss) 09:30, 13 August 2016 (UTC)

Involved administrator actions[edit]

Greetings. I would like to ask a question about Wiktionary policies regarding using administrative tools in situations where the admin is "involved" in a dispute. I am not sure if I am in the right place, and if not, could you please direct me to the appropriate venue?

If I am in the right place, here's the situation. The etymology section at sheng nu has been contested for quite some time, both at the deletion discussion and subsequently on the talk page. The first revert was by User:Wyang with the edit summary, "Western fantasisation". The content was sourced by reliable sources including the BBC and the New York Times. This was way back in 2013.

In 2015 I re-added the content because it had reliable sources and it was discussed at the talk page. The conversation ended with me asking them for reliable sources that place the etymology elsewhere; otherwise it's being removed purely on personal opinion and original research. The topic was seemingly dropped and the content remained.

Then in July 2016, Wyang reverted it again. I came across it today and re-added it and then made some major changes to the etymology section. Namely, I added that the etymology is disputed as described in a book I cited, and then proceeded to list the varying origins for the term as cited by the various reliable sources. Wyang reverted my edits with new changes without an explanation. I left a talk page message and restored my new changes. I was reverted almost immediately, and then to my surprise, Wyang protected the article so that only administrators could edit it.

Maybe Wiktionary allows for the removal of cited etymology content. Fine, but Wyang never provided anything other than original research. Even if they did have sources that indicated a different etymology, they could have added it to the section as one of the alternate origins. All of this as far as I'm concerned is just a simple editing dispute between two editors, but I was very surprised to see administrative tools used to essentially levy the argument into one direction. I'm not very familiar with Wiktionary policies, but as an admin over at the English Wikipedia, we are expressly prohibited from using our administrative tools in arguments and disputes we are involved in.

Any advice is appreciated. Will totally drop this if this is the custom here. Mkdw (talk) 22:30, 9 August 2016 (UTC)

Wyang did the same to me, with a widely-used module, and I'm also an admin. So you're not the only one. —CodeCat 22:33, 9 August 2016 (UTC)
@CodeCat Do you think it was OK to make a widely used module for Thai transliterations and transcriptions unusable, affecting thousands of entries, upsetting all Thai editors and not really giving a working alternative just because you didn't like the methods used? Please don't mention this in unrelated discussions. Sorry, I don't support you in that. --Anatoli T. (обсудить/вклад) 23:51, 9 August 2016 (UTC)
Please don't misrepresent the problem. The module was not made unusable once Wikitiki had provided an alternative. Those edits were reverted by Wyang. —CodeCat 23:53, 9 August 2016 (UTC)
Wikitiki tried to help but he wasn't sure himself it was working correctly and did what was expected. Wyang gave reasons why. --Anatoli T. (обсудить/вклад) 23:57, 9 August 2016 (UTC)
Wyang was wrong. The fixes did work. I repeatedly asked him to give examples of entries that were broken by Wikitiki's edits. He never gave any. There was no reason to revert the fixes, especially not when they re-created the problem he accused me of creating. Since the edit war, things have been left in a semi-broken state, I'm afraid to try fixing them again for fear of another edit war. I would like a guarantee that it will not happen. —CodeCat 00:00, 10 August 2016 (UTC)
Doesn't seem very collaborative. We didn't even try dispute resolution. Does Wiktionary have a formal process for reporting administrator abuse of the tools? Mkdw (talk) 22:47, 9 August 2016 (UTC)
Bringing up old grievances in unrelated discussions- when you look in the mirror you should be seeing Dan Polansky right now... Chuck Entz (talk) 14:19, 10 August 2016 (UTC)
I failed to see any evidence of substance for your claim (reputable Chinese sources, announcements by the All-China Women's Federation or the Ministry of Education). Unreliable Western media claims should be removed if no original sources can be found. Wyang (talk) 23:03, 9 August 2016 (UTC)
The New York Times, BBC, and the Huffington Post among other sources were provided. In addition, I also included a source from the China Daily, South China Morning Post, and a book by Sandy To. If you believe these sources are "unreliable", that is your personal opinion, but they are directly in line with Wikimedia Foundation policies on reliable sources. Furthermore, you have failed to provide any sources of your own to support your theory, and even if you had sources, you should have expanded the etymology section to include these other origin explanations. I already added a source that says the etymology is disputed. Lastly, indefinitely protecting the article is prohibited as an abuse of your administrative privileges. Mkdw (talk) 23:11, 9 August 2016 (UTC)
The Wikimedia Foundation has no policies on reliable sources. It is entirely up to the individual sites. DTLHS (talk) 23:14, 9 August 2016 (UTC)
None of these sources makes sense.
"The China Daily reported in 2011 that Xu Wei, the editor-in-chief of the Cosmopolitan Magazine China, coined the term."
This is obviously false (Citations:剩女).
"Chiu, Joanna (04 March 2013). Unlucky in love … or just left out of the market?. South China Morning Post. Retrieved 9 August 2016." is the reference cited for "The term was added to the national lexicon in 2007 and widely popularized by the All-china Women's Federation."
No such claim was found in the actual article.
Wyang (talk) 23:16, 9 August 2016 (UTC)
To repeat my stance from the argument with CodeCat: I consider the application of admin power to either prevent or allow editing in an edit war in which the admin is involved to be misuse of such power, even when the admin in question is correct. I would ask all administrators involved in edit wars to turn to one of their colleagues as a manager of such situations. Korn [kʰũːɘ̃n] (talk) 23:34, 9 August 2016 (UTC)
I would like to petition the community here to unlock the entry sheng nu. Wyang protected the article indefinitely not to prevent vandalism or harm, but to simply enforce their editorial position. As for the editorial dispute, Wiktionary has processes in place such as dispute resolution to which I am a willing participant. Mkdw (talk) 23:38, 9 August 2016 (UTC)
There is no point in petitioning if there is effectively no basis for your claims - the content you added misattributes content from references, or is apparently factually incorrect. Wyang (talk) 23:47, 9 August 2016 (UTC)
"The very first origins of the term sheng nu have been much contested, and it is virtually impossible to find out exactly when and who first coined the term, be it television dramas, talk show hosts, magazine articles, or academic circles. But the most significant aspect of the 2007 official definition that has been endorsed by the Chinese government, and continuously propagated by the government-run All-China Women's Federation"
To, S. (2015). China’s Leftover Women: Late Marriage among Professional Women and its Consequences. Oxford; New York: Routledge.
"The term refers to any unmarried Chinese woman over the tender age of 27, and was coined by the All-China Women's Federation"
Tunstall, Lee (15 November 2012). Are All the Single Ladies Really Like the Oil Sands?. The Huffington Post. Retrieved 2 April 2013.
"State-run media started using the term "sheng nu" in 2007. "
Magistad, Mary Kay (20 February 2013). "BBC News - China's 'leftover women', unmarried at 27". BBC News (Beijing). Retrieved 9 August 2016.
"According to The New York Times, the term was made popular by the All-China Women’s Federation in 2007"
MacLeod, Duncan (11 April 2016). "Marriage Market Takeover for Leftover Women". Inspiration Room. Retrieved 9 August 2016.
"The term “leftover women” surfaced in 2007 in a report by the All-China Women’s Federation, a state agency whose professed purpose is to “protect women’s rights and interests.”"
Reynolds, Christopher (18 April 2016). "Viral video inspires China's 'leftover women'". Toronto Star. Retrieved 9 August 2016.
"The term "sheng nu" was first used by the All-China Women's Federation (founded by the Communist party in 1949) in 2007, to explain that a leftover is an unmarried woman over the age of 27."
Iaccino, Ludovica (31 January 2014). "Single and Educated: the Problem of China's 'Leftover' Women". International Business Times. Retrieved 9 August 2016.
The list of available references goes on and on. The other points you brought out were originally used to cite the sentence, "pressure unwed women into marriage", but you reverted my changes before I was finished. Regardless of whether you think my arguments about sources have merits, that does not excuse abusing your administrative tools, nor does it warrant engaging in an edit war. Mkdw (talk) 23:54, 9 August 2016 (UTC)

──────────────────────────────────────────────────────────────────────────────────────────────────── I agree with Korn that a page should not be protected to enforce an editorial position. Unless someone (other than Wyang) objects, I'll unprotect it. Benwing2 (talk) 00:45, 10 August 2016 (UTC)

I support unprotecting the page. I do think more discussion is needed before further edits are made, but they should involve more parties than just Mkdw and Wyang. —CodeCat 00:51, 10 August 2016 (UTC)
Unbelievable. Repeatedly adding unsubstantiated material amounts to vandalism, warranting a block already. This is especially true considering it has been more than three years since I had asked for direct evidence for the claim, and there is none. In this case the editing patterns of the user Mkdw show that he/she clearly has an agenda with the edits: using fantasy-driven Western media articles misconstruing the actual linguistic scenario in China to push for a point of view, i.e. the viewpoint that the Chinese culture is distorted in the Western eyes - there are words coined by the "All-China Women's Federation" which pejoratively refer to unmarried Chinese women over 25 as "leftover women". This is nonsense. If person 1 writes that "A claims B did something", then it is person 1's task to be able to provide direct evidence that B did something, especially when someone considers A's claim unreliable. If person 1 cannot do so, the claim should be promptly removed. Wyang (talk) 01:18, 10 August 2016 (UTC)
(After edit conflict...)
  • Looking on from outside the argument and sussing out the details, I feel compelled to chime in.
Re: the origins of the term, just looking at Citations:剩女, I see that this term was prima facie not coined by the All-China Women's Federation in 2007 -- all five citations currently on that page are older than 2007: 1964, 1992, 1995, 2002, 2006. Past that, relying on English-language sources to divine the etymology of a Chinese term does not strike me as a wholly viable approach. My field is Japanese, and I've run across numerous instances of English-language sources claiming this or that about a Japanese term, when reliable and respected native-language sources say something else entirely. Relying on mass-media sources is even less viable -- their business is to sell copies, and they do that by printing interesting content, often without much regard to strict veracity.
Re: sources, finding a citation of a term in use is enough to meet our criteria for inclusion, vaguely analogous to Wikipedia's "notability" requirement. But when it comes to the content of an entry, it is not enough that a given source says X or Y: we also pay attention to the identity, reputation, and expertise of sources. As a thought experiment, I wouldn't care one whit if you found that the New York Times itself claimed that the Japanese word gaijin (“outsider, foreigner”) originally came from Hebrew גויים ‎(goyim) -- unless that also agreed with known Japanese sources that make the same claim.
Re: edit warring, it bears noting that Wiktionary's editor base is much smaller than Wikipedia's. We neither need, nor can we use, the kind of bureaucracy that has evolved on Wikipedia. Given also that the number of editors for any given language is much smaller than the total number of Wiktionary editors, we must often rely upon the judgment and expertise of the very small number of people who handle the day-to-day process of maintaining our content. Your edit history (23K+ on Wikipedia, 129 or so here on Wiktionary) and some of the background threads (as at Talk:sheng_nu) suggest that you're well-versed in Wikipedia's culture and way of doing things, but not so much in Wiktionary's.
Ultimately, considering that Wyang is a native speaker of Chinese, can read Chinese source materials, and has a long history of high-quality work on Chinese entries here, I'm much more inclined to trust his judgment over yours, when it comes to the origins of Chinese terms. You discount him entirely by merely posting English-language sources, many of zero etymologic value, and claiming that the burden is on him when he's asking you for reputable Chinese sources backing your claims.
I haven't agreed with everything that Wyang has done, but in this case, it does appear that he is more in the right on the etymology of English sheng nu / Chinese 剩女. ‑‑ Eiríkr Útlendi │Tala við mig 01:21, 10 August 2016 (UTC)
I admit I discounted Wyang when they removed content under the rationale "Western fantasisation". That indicates to me an unreasonable bias. It doesn't matter how Wiktionary treats original research. Any opinion needs to be supported by something otherwise it's simply an opinion. Here is what was removed:
The exact etymology of the term is disputed.[2] The China Daily reported in 2011 that Xu Wei, the editor-in-chief of the Cosmopolitan Magazine China, coined the term.[3] Other sources have indicated the All-China Women's Federation and the Ministry of Education of the People's Republic of China.[1][4] The term was added to the national lexicon in 2007 and widely popularized by the All-china Women's Federation.[1][5][6]
The first two citations at Citations:剩女 seemingly use the term to refer to people who remained after the war. This does not explain the etymology of the term sheng nu as defined by the Chinese lexicon, unmarried women in their late twenties. It suggests that leftover and woman (and man in this case) were put together not as a term but as a turn of phrase. The same goes for the 2002 citation that talks about food. Suggesting these are the origins for the term about unmarried women seems unlikely because there is nothing supporting the finding of these words together to the term or an evolutionary process. Where I think Wyang's ability to read and write Chinese could be useful is not conducting their own original research, but finding Chinese sources that tie any one of their citations as being the origin of the term. In the meantime, the sources we do have are all we have. Wrong? Possibly. Sourced and not original research? Yes. I would have even settled for "The exact etymology of the term is disputed.[2] The term was added to the national lexicon in 2007 and widely popularized by the All-china Women's Federation.[1][5][6]" but that never seemed to be on the table either because Wyang always resorted to wholesale reverts.
I won't even get into all the problems Wiktionary invites by allowing original research to prevail over published material (including books). Or by allowing admins to use their tools to resolve editorial disputes. You're right, I know Wikipedia and Wiktionary are different, but I cannot see how Wiktionary hopes to grow their community and welcome newcomers when special privilege and rules are applied to a select few.
Lastly, so quickly labelling me as a disruptive editor is only evidence of that. I'm not here pushing some fringe idea. I was adding what I've found in the sources. I have adjusted the content as new sources have come up as evident in my last series of edits. It allowed for multiple explanations and acknowledged the origin is disputed. I was seeking compromise. I was using the talk page until Wyang stopped replying. I have even mentioned an openness to dispute resolution. Mkdw (talk) 02:45, 10 August 2016 (UTC)
One point that hasn't been made yet: your sources are adequate for an encyclopedia article, but this is an etymology, which requires a different skill set than most journalists have. There was an article in a non-linguistic journal that made assumptions about prehistoric human culture based on long-range linguistic reconstructions that are considered by most linguists to be way out on the fringe, and there were all kinds of articles in mainstream journalistic sources that treated this without any skepticism at all. Chuck Entz (talk) 14:19, 10 August 2016 (UTC)
  • Thank you, Chuck -- that was the point I tried to make earlier, that mass-media sources are of exceedingly low value when it comes to etymologies. Your restatement is clearer. ‑‑ Eiríkr Útlendi │Tala við mig 23:18, 10 August 2016 (UTC)
  • @Benwing2, I'm with Wyang here -- Mkdw really does not seem to get it, but he keeps pushing his position. In Wyang's place, I might have done the same thing: with few Chinese editors to collaborate with, he's been one of the most active Chinese editors for a while now (at least, from what I've seen). If we unprotect the page, what do we do if Mkdw keeps adding low-value English-language "sources"? What other approach would you all advocate? Should we block disruptive users, rather than locking disrupted pages? Serious questions, BTW, I'm not being rhetorical. ‑‑ Eiríkr Útlendi │Tala við mig 01:25, 10 August 2016 (UTC)
OK, I'll leave it alone now. I still believe that it's an abuse of admin powers to lock a page over editorial disagreements, even if the admin is almost certainly correct, as long as the other user is apparently acting in good faith and is willing to respect the dispute process (which for us would probably be a Tea Room discussion); but Eirik you make good points. Benwing2 (talk) 02:31, 10 August 2016 (UTC)
This sets a terrible precedent on principle alone. This creates a de facto community endorsement whereby administrators can revert an editor and then protect the entry indefinitely to enforce their editorial preference in a situation, even if the editor is willing to go to dispute resolution, using the talk page, and changing their edits to find a compromise -- provided the admin feels like they're "right". I'm admittedly very disappointed but I strongly believe in community consensus. If the community consensus here and now is to endorse this action, then I withdraw my request and accept it under protest. The editors at Wiktionary have the right to their own self determination regarding their practices. It's unfortunate that this type of conduct is being endorsed rather than, say, a community consensus possibly endorsing Wyang's editorial position, if they deemed so, but also finding Wyang's administrative actions inappropriate. I would have accepted it and the community (including administrators) would have had recourse against any editor (including myself) as going against the consensus. I would think that would then be deemed a disruptive editor, but seemingly the threshold is way less than that. Maybe there's no appetite for bureaucracy but there must be checks and balances for administrators and this is one step in the opposite direction. Mkdw (talk) 02:51, 10 August 2016 (UTC)

Why are you all walking into a smokescreen now? This is not a debate about the origins of the phrase sheng nü, this is a debate about whether admins should get to abuse their goddamn power. The edits made were both good faith and sourced with media which is generally accepted to be decent, and as such as proper as any Wiki-edit can get. They can be wrong a hundred times over and Wyang can be right a hundred times over, using his superior power in an edit war, not even to stall the warring until a consensus was reached, mind you, but to allow him to forego the argument is crass abuse and should not, for any reason, be tolerated. This should be a place where arguments should be won by superior evidence, not by sucker-punching your opponent with your superior admin-muscle. In the same vein, and you damn well know whom I am looking at, this is a place where arguments should be carried out in civilised debate and not by playing volleyball with a page because nobody can stop you. If you're part of the argument, you don't get to be judge or police, that should be a very simple rule we can all agree on. While I have no doubt that Wyang is an absolute treasure as an editor on Asian languages, the fact that he's involved in two such situations within a short period of time doesn't give me the best impression of him as an admin and the fact that an outsider now has reached the conclusion that we as a whole condone such abuse should shame us all. Korn [kʰũːɘ̃n] (talk) 10:42, 10 August 2016 (UTC)

Well, Wiktionary is not like Wikipedia. A word can be demonstrated to exist at a point in time simply by showing attestations of the word at that time point, and there is no point citing an external article claiming a word was coined in 2011 while there are ample attestations for its use long prior to that. This is exactly what User:Mkdw did not get and what he/she had been trying to do repetitively over the years, including today again. It has been made clear that there are reasonable doubts regarding his edits more than three years ago, but he ignored the comments and the attestations I have gathered at Citations:剩女 to continue pushing for his POV edits. It is apparent they are trying to match the Wiktionary content to the stuff ("Good Article") they have written over at Wikipedia, with complete disregard for criticism and the linguistic facts. This is vandalism and should be dealt with as such. Wyang (talk) 11:57, 10 August 2016 (UTC)
In my book, it's not vandalism as long as it's done in good faith. But no matter whether it is: Even if there is an edit war involving one steadfast defender of the right thing and one plain vandal, then yes, the vandal has to be dealt with, but by a neutral third who has heard both sides, not by one side of the edit war. That is all I'm saying. Korn [kʰũːɘ̃n] (talk) 13:20, 10 August 2016 (UTC)
  • I think the first round of back-and-forth edits in 2013 counts as good faith. Mkdw came back just a couple days ago and added essentially the same content, completely failing to accept or respond to the past argument that the content he was adding was from sources of clearly contestable value. After Wyang made it clear that Mkdw's sources were still inadequate, Mkdw continued to insist -- he again refused to acknowledge the possibility that just having a source isn't enough here on Wiktionary. This is where I start to view Mkdw's edits as not in good faith any more. ‑‑ Eiríkr Útlendi │Tala við mig 23:27, 10 August 2016 (UTC)
PS: See [[Talk:sheng nu]] for the relevant discussion and timeline. ‑‑ Eiríkr Útlendi │Tala við mig 23:29, 10 August 2016 (UTC)
I agree with Korn in principle, but there are some practical issues that need to be dealt with. When someone makes an edit and an admin finds it, at this point the admin is still an uninvolved party, when the admin reverts the edit and the original editor reverts it back, does the admin now suddenly become an involved party and need to seek out another uninvolved admin? When the second admin reverts the edit and the original editor reverts it back, does the second admin now also become an involved party and need to seek out a third admin? It's difficult to know what the "right" thing to do is if you are an admin in that situation and we don't have any clear guidelines on this. I think we really need to draft up a policy on this, so that admins will have a procedure to follow and also so that we can clearly determine when an admin is not following it. --WikiTiki89 15:04, 10 August 2016 (UTC)
My simple proposal for every sort of edit war is that instead of a second undoing, a third person has to be contacted. That is: 1. An edit is made by John. 2. It is undone by Jim. (It is not relevant whether by the undo-function or by rewriting of the contents.) 3. The original edit is restored by John. 4. Jim is not allowed to undo it a second time. Instead, Jim is now obliged to bring the discussion to the attention of other users of the language or the community in general (such as Beer Parlour). Korn [kʰũːɘ̃n] (talk) 15:45, 10 August 2016 (UTC)
ps.: Obviously, if John and Jim agree that they will debate this amongst themselves or that they are fine with continued editing as a form of successive proposals rather than merely trying to set the page back to a former status quo over and over, a third party need not be bothered with it. Korn [kʰũːɘ̃n] (talk) 15:48, 10 August 2016 (UTC)
My name is John and my brother's, Jim. I sincerely doubt whether we would have such a disagreement. :P JohnC5 15:54, 10 August 2016 (UTC)
  • I'd like to point out a concern here -- we have the Wiktionary editor community (not very big to start with), and we have the individual language editor communities (much smaller still). If our hypothetical Admin Jim is the only active editor for Language Foo during the time that non-admin Editor John is busy adding controversial content to entry Bar, do we now demand that Admin Jim just sit on his hands for possibly several days, or longer, until some other editor for Language Foo comes along? Again, serious question, not rhetorical. I'm interested in people's views here. ‑‑ Eiríkr Útlendi │Tala við mig 23:27, 10 August 2016 (UTC)
In response to Wikitiki89 (talk • contribs), no it seems to me that an uninvolved admin does not become involved by reverting the editor causing the controversial edit, including multiple times. In response to Eirikr (talk • contribs), no the admin shouldn't have to wait until someone else knowing that language comes along. Instead there should be a discussion in Tea Room or Beer Parlour or wherever. That way, others can weigh in based on the evidence. (In practice, in such a case the admin, esp. a long-time contributor to a language, will probably get the benefit of the doubt unless the other user can show a sufficiently good reason why the admin is wrong.) IMO in a controversy a long-time status quo should prevail until the controversy is resolved, and it's probably OK to lock a page on the status quo to prevent an edit war, *if* the user doing the controversial edit insists on edit-warring rather than participating in a discussion; I've seen that happen in Wikipedia. The locking should happen by an uninvolved admin, though, and only while the discussion is happening. Benwing2 (talk) 23:41, 10 August 2016 (UTC)

Proposed addition to WT:NORM: headers cannot be nested inside things

I propose adding an additional rule: Headers must not be nested inside other elements, such as templates and (HTML) tags.

This rule would make parsing a lot easier, because a parser would not need to parse the nesting of templates before they can determine whether a header is "real", appearing at page-level, or is actually nested within some template. A parser would be able to assume that every header is "real". With this change, the code


would be disallowed. —CodeCat 20:06, 10 August 2016 (UTC)
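To illustrate the parsing cost described above, here is a minimal Python sketch (not CodeCat's parser, and not any existing Wiktionary tool). The function names `find_headers` and `violates_proposed_rule` are made up for this example, and the brace counting is deliberately naive: it ignores `<nowiki>`, comments and HTML tags. It records the template nesting depth at which each header line appears; under the proposed rule a parser could skip the depth tracking and treat every header line as a page-level header.

```python
import re

# A header line: equal runs of "=" on both sides of the title.
HEADER_RE = re.compile(r"^(=+)(.+?)\1\s*$")
# Template delimiters; triple braces are not treated specially here.
BRACE_RE = re.compile(r"\{\{|\}\}")


def find_headers(wikitext):
    """Return (title, level, template_depth) for every header-like line."""
    depth = 0
    found = []
    for line in wikitext.split("\n"):
        match = HEADER_RE.match(line)
        if match:
            found.append((match.group(2).strip(), len(match.group(1)), depth))
        # Update nesting depth after inspecting the line itself.
        for token in BRACE_RE.findall(line):
            depth += 1 if token == "{{" else -1
        depth = max(depth, 0)  # tolerate stray closing braces
    return found


def violates_proposed_rule(wikitext):
    """True if any header-like line sits inside a template."""
    return any(depth > 0 for _, _, depth in find_headers(wikitext))


sample = "==English==\n\n{{example|\n===Noun===\n}}\n"
print(find_headers(sample))
print(violates_proposed_rule(sample))
```

With the rule in force, the depth bookkeeping in `find_headers` becomes unnecessary for locating sections, which is the simplification the proposal is after.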

Do you have an example of a page that does this? DTLHS (talk) 20:12, 10 August 2016 (UTC)
Not really, there may well be none (there's some talk pages, but those don't count for WT:NORM). The point is to have an official rule in place that disallows it, so that a parser's design can be simplified. —CodeCat 20:14, 10 August 2016 (UTC)
Is there really a need to make a formal rule about this? I mean to me it just seems obvious not to screw around with the layout and such like that via a template. Although I do see some pretty crazy things done on wikis sometimes... Philmonte101 (talk) 21:38, 10 August 2016 (UTC)
@CodeCat: I think this is a problem with WT:Norm, Templates, 3: `For templates with many or long parameter values, line breaks are allowed at the end of a template's name or a parameter's value, for the purpose of making the wikitext easier to read.' If one changes `are allowed at' to 'are only allowed at,' I believe this would make your example impossible, because every line inside a template would have to begin with a pipe or a double closing curly brace. I assumed that only, although not stated, was probably intended by this line. Edit: For HTML, I agree and see something like this as important. Isomorphyc (talk) 02:57, 11 August 2016 (UTC)
I don't understand what you're saying. This is about nesting headers in templates and other things. —CodeCat 15:24, 11 August 2016 (UTC)
@CodeCat: As I understand, headers are required to begin immediately after a line break (I assume Headings 1. means `one blank line [immediately] before all headings,' except the first, as anything else is extremely rare and normally treated as an error). If templates which continue onto a second line are required to do so only after the value, then the line break will be immediately followed by a pipe, a closing double curly brace, or whitespace, never an equals sign. Hence, this change to the template rule will prevent embedding pseudo-headers into templates, which it probably was intended to do in the first place. HTML and other things are a different matter. WT:NORM is a little bit subtle in places; am I misunderstanding? Isomorphyc (talk) 16:44, 11 August 2016 (UTC)
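A rough Python sketch of the stricter line-break reading discussed above, assuming (as in the earlier comment) that every continuation line inside a template must begin with a pipe or a closing double curly brace. The function name `bad_continuation_lines` is hypothetical, the brace counting is naive, and blank or whitespace-led lines inside a template are also flagged, which may be stricter than intended.

```python
import re

BRACE_RE = re.compile(r"\{\{|\}\}")


def bad_continuation_lines(wikitext):
    """Return 1-based numbers of lines that start inside a template
    but do not begin with "|" or "}}"."""
    depth = 0
    bad = []
    for number, line in enumerate(wikitext.split("\n"), start=1):
        if depth > 0 and not line.startswith(("|", "}}")):
            bad.append(number)
        # Track template nesting after checking the start of the line.
        for token in BRACE_RE.findall(line):
            depth += 1 if token == "{{" else -1
        depth = max(depth, 0)
    return bad


print(bad_continuation_lines("{{example\n|a=1\n|b=2\n}}"))
print(bad_continuation_lines("{{example|\n==English==\n}}"))
```

Under this reading a header-like line can never start inside a template, because it would begin with an equals sign.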

@Isomorphyc: It's a minor loophole and unlikely to be successfully exploited, but under the current rules, we could wrap the whole entry inside a template. There is no blank line before the first heading. The example below assumes that a module can convert the "\n" into newlines.

{{example|==English==\n\n===Noun===\nblah blah\n\n# defs}}

Or we could have this:


Anyway, even if nobody would do one of those things, in my opinion it's a good idea to implement the rule that CodeCat proposed, either on its own or as a complement to the other rules. According to the introduction of WT:NORM (the entirety of the policy was voted and approved, including the introduction), people aren't even required to follow NORM and technically they would be able to go wild with whitespaces and line breaks (I hope common sense is required to some extent, even if not explicitly said so in the rules) so it's good to be extra clear about what we want.

I created the vote: Wiktionary:Votes/pl-2016-09/No headings nested inside templates or tags. --Daniel Carrero (talk) 03:53, 23 September 2016 (UTC)

FWIW, I have written lots of bot scripts that split pages into sections and subsections, and I realize now that I've implicitly assumed that headers never occur in templates. CodeCat is completely right that having to worry about this is a real pain in the ass. AFAIK I've never been bitten by any header inside of a template so it probably doesn't occur. Benwing2 (talk) 05:17, 23 September 2016 (UTC)
No objections, although, in about 9 years of editing here, I don't think I've ever seen it. Renard Migrant (talk) 17:28, 24 September 2016 (UTC)
I seem to remember some kind of Usage notes template that included the header in it, but it might have been subst-only. Chuck Entz (talk) 18:05, 24 September 2016 (UTC)
I am fine with this proposal, but you should still make your parser smart enough to know when it is inside a template and when it is outside a template. More problematic is the possible inclusion of things like headers via templates, not as template parameters. - TheDaveRoss 19:44, 24 September 2016 (UTC)

Proposed addition to WT:NORM: no template parameter expansions

This means that things like {{{1}}}, with three curly braces, can't appear in the wikitext. This is probably something that goes without saying, since regular pages aren't ever passed parameters. But to have it codified would again be a useful assumption for parsers: rather than having to decide whether a bunch of curly braces should be grouped two or three, it can assume it's always two. —CodeCat 20:11, 10 August 2016 (UTC)
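As an illustration only, a short Python sketch of how a checker might flag parameter expansions in page wikitext. The name `find_parameter_expansions` is made up for this example, and the pattern only catches simple cases such as `{{{1}}}` or `{{{2|default}}}`; it does not try to resolve ambiguous runs of four or more braces.

```python
import re

# Three opening braces, content without braces, three closing braces.
PARAM_RE = re.compile(r"\{\{\{[^{}]*\}\}\}")


def find_parameter_expansions(wikitext):
    """Return every simple {{{...}}} parameter expansion found."""
    return [match.group(0) for match in PARAM_RE.finditer(wikitext)]


page = "{{m|en|word}} and {{{1}}} and {{{2|default}}}"
print(find_parameter_expansions(page))
```

If the rule were adopted, a non-empty result on a regular page would simply be a violation, and a parser could group braces in pairs without looking ahead.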

I support this. Edit: Some rationale: I recognise it is not desirable to turn WT:NORM into a grammar, but I think just a few lines should make the extensions completely orthogonal to the Wikitext abstraction. Given the potential for adding logic into the triple brace notation, and the fact that a pushdown parser is required to fully treat this syntax, this exception is worth codifying. Isomorphyc (talk) 03:57, 11 August 2016 (UTC)
That sounds reasonable. - -sche (discuss) 09:32, 13 August 2016 (UTC)
I am fine with this. - TheDaveRoss 19:55, 24 September 2016 (UTC)

Wiktionary:About Scots

Who here contributes a lot with our Scots lexicon? I notice that words that mean the same thing in both English and Scots are never added by users other than me for some reason. I've added terms like electromagnetic, Denmark, and others, that have the same meaning as in English. I guess it's because people a lot of times will consider Scots a dialect rather than a language. I feel like we should have all terms in Scots, or else we should make a formal decision about which Scots terms should be allowed here and which shouldn't. Philmonte101 (talk) 21:36, 10 August 2016 (UTC)

@User:Angr, User:Nbarth, User:Leasnam, you three might be interested in this discussion. Philmonte101 (talk) 23:48, 10 August 2016 (UTC)
I agree that these words should be added, but like we saw with Middle English entries, it tends to fall by the wayside. It may seem like duplicate effort to try and get Scots words in when they're the same as the English word. Time to roll up our sleeves I guess Leasnam (talk) 04:48, 11 August 2016 (UTC)

I just had a few things pop into my head. You're right, for one, it seems really tedious to add all of those entries. But what if we could possibly get a list of these terms from somewhere, like, say, another dictionary dedicated to the Scots language? And then we could either add them manually or automatically create the entries somehow, like, say, a bot of some kind that takes information from the English entries and converts them into Scots (except no definition, just a simple one-word translation)?

@ User:Leasnam But then again, we'd have to worry about verification of these terms, as per WT:ATTEST. I wonder, is there an easier way to verify Scots terms? When I try to look for Scots terms in, say, Google Books, for "electromagnetic", all I find are English sources. Is there an online Scots library of some sort, or some way to search in only Scots language books/archived documents? Philmonte101 (talk) 05:09, 11 August 2016 (UTC)

Hrmm, there are several, but I am unsure if they are copyrighted or not. Most likely would be. I tried searching for "electromagnetic" and "maist"/"mair"/"ilka" and didn't turn up anything in Scots. Scots is mainly devoted to poetry and older language. Anything having to do with electromagnetism I think would be written in English (?) Leasnam (talk) 05:24, 11 August 2016 (UTC)
I'll be honest with you, I found a lot of those terms on Wikipedia, and searched them up to make sure they were used consistently on the site, and didn't seem to have any variations. So I assumed that because of this those terms may be attested. But I guess I was wrong about that. But wait, couldn't Scots be one of those languages that has a template that says it's scarcely documented, and so therefore the rules of CFI for it are different than for more common languages? I know I've seen Malagasy terms that had this template. Philmonte101 (talk) 16:47, 11 August 2016 (UTC)
Semi-relevant: one thing that annoys me is that when you create a Scots verb (sco-verb), it inflects in a certain way (can't remember exactly, but something like e.g. walkit instead of walked) that often doesn't reflect the actual literature. It is, of course, hard to draw a clear line between English and Scots words, given the history. Equinox 20:09, 11 August 2016 (UTC)

Telugu wikisaurus

I would like to create Wikisaurus for Telugu language. Where and How to do it. What is the platform; Is it English wiktionary or Telugu wiktionary. Thank you if someone answers.--Rajasekhar1961 (talk) 04:01, 11 August 2016 (UTC)

I would try Telugu Wiktionary. DCDuring TALK 10:45, 11 August 2016 (UTC)
Can someone help me in creating the Wikisaurus in Telugu Wiktionary. I need some technical assistance and create the necessary Wiktionary:Wikisaurus/Format and templates there. Thanking you.--Rajasekhar1961 (talk) 11:28, 11 August 2016 (UTC)
@Dan Polansky did the work on that here. DCDuring TALK 13:13, 11 August 2016 (UTC)

In French Wiktionary there is thesaurus in different languages, not only in French, with translation into French. Is it different in English Wiktionary? In which convention page is it specified? Noé (talk) 10:00, 18 August 2016 (UTC)

Usage note at schrift

I removed this usage note, considering it pointless, but User:Morgengave restored it and expanded it further. I don't think this is much of a usage note at all, since it doesn't say anything about the usage of the term (that the definitions don't already say), nor is it customary for us to mention other terms different by capitalisation in usage notes. We have this on Earth vs earth, but not Moon vs moon or most other cases where this happens. What do others think? —CodeCat 21:56, 11 August 2016 (UTC)

Hi CodeCat - My personal view is that if it can help the person consulting the lemma, then let us include it, on condition that it remains sharp and concise obviously. Overall, it would be helpful to have a Wiktionary:Usage notes policy to use as guidance. Morgengave (talk) 22:05, 11 August 2016 (UTC)
We already have {{also}} for this, which works fine for the vast majority of pages. So why do these few cases in particular require a usage note? Also, does your usage note even give any notes on usage? —CodeCat 22:08, 11 August 2016 (UTC)
I'm fairly relaxed about this - I am open to multiple solutions as long as the user is helped. I believe it helps more than the "also" on the top of the page as in my view the also-template is for users to find the right lemma back easily and the usage notes are to help the user on usage (including potentially confusing situations). Two different purposes in short. There's unfortunately no usage notes policy and this one sentence seems fairly harmless even if one would deem it redundant. Can you help me understand you better: why are you keen on removing this one sentence? Morgengave (talk) 22:27, 11 August 2016 (UTC)

Reference specifications

As per a debate I have had recently and in the past with @Dan Polansky, an example of which may be found at Template talk:R:DSMG, I think that references in templates or in entries should be explicit and full format. Pace Dan, whatever beauty may be derived from a “simple” format of templates such as {{R:DSMG}} and {{R:Webster 1913}} accompanies a loss of relevant citation information. We have templates created (and updated recently by @Smuconlaw), {{cite-book}}, {{cite-journal}}, etc., that provide standardized citation functionality. Indeed, I would be prepared to start a vote to make rules for citation generally. What do people think? Is this a minor quibble, or do people agree that we should have a standardized, full format? —JohnC5 17:26, 12 August 2016 (UTC)

I am fine with a full detail being available on a mouseover. Thereby, the detail would be there for those who require it, while it would not block the radar screen and disturb the skimming focus of those who love succinct identification. --Dan Polansky (talk) 17:28, 12 August 2016 (UTC)
By way of example: into {{R:is:IEO_1989}}, I placed the following two formats, the short one and the long one, the long one being visible upon mouseover:
  • word in Hólmarsson et al.: Íslensk-ensk orðabók. 1989.
  • word in Sverrir Hólmarsson; Sanders, Christopher; Tucker, John • Íslensk-ensk orðabók / Concise Icelandic-English Dictionary • Reykjavík: Iðunn, 1989
--Dan Polansky (talk) 17:31, 12 August 2016 (UTC)
I agree that excessive details can be hidden using a mouseover or expandable box. (I would like this for our quotation templates as well). DTLHS (talk) 17:33, 12 August 2016 (UTC)
If this is to be done it should be done consistently across all templates with regards to what information is hidden. "Box" wasn't the right word, just an expandable section. I prefer that to a mouseover because with a mouseover you can't have any links and you can't select and copy the data. DTLHS (talk) 17:43, 12 August 2016 (UTC)
What should be done for platforms that have no mouse pointer? —CodeCat 17:46, 12 August 2016 (UTC)
An alternative I have in mind is that each reference template would contain a link to a section in an appendix page for reference templates. That section would contain a full identification and more. As for technology, it is a simple wikilink. --Dan Polansky (talk) 17:52, 12 August 2016 (UTC)
The proposal here would be to have giant appendices containing full versions of every citation we use or choose to abbreviate? —JohnC5 18:12, 12 August 2016 (UTC)
You mean every reference, right? I hope you do not intend to push your ornamental cast iron to our poor attesting quotations; they are already too noisy, putting metadata before the quotation itself. As to the substance of your question, the appendices obviously do not need to be "giant"; they can be as granular as we see fit, and therefore as small as we see fit. --Dan Polansky (talk) 18:21, 12 August 2016 (UTC)
Don't worry, none of this will ever happen since there are a million different reference formats none controlled by the same back end, making any kind of unification impossible. DTLHS (talk) 18:42, 12 August 2016 (UTC)
Oh ye of little faith! We certainly can make a standard then fix all the templates. —JohnC5 19:36, 12 August 2016 (UTC)

Proposed extension to criteria for inclusion on proper names of fictional works.

I'd like to propose a change (and if these become votes, they'd be separate votes from one another) to our criteria for inclusion.


  • Proper names of titles of fictional works, such as books, television series, video games and video game series, should be included in our lexicon, as long as they have 3 sources that are independent from the book/series itself. I.e. the book citations, or Usenet citations, must not specifically be about the television series or video game.
  • Please note that this proposal is not about appending fictional characters or names of fictional entities into Wiktionary; just about the titles of the works themselves (usually represented by italics). I feel that characters and entities should follow the guidelines that are here already.


  • Just like countries, cities, county names, etc., titles of these works are proper nouns.
  • Many will argue that including such things "is not traditional." Though it is not traditional, I'm surprised that we didn't include these already. For example, with Wikipedia, traditional paper encyclopedias don't generally include articles about television series or cartoon characters. Well, they might have a few of the really important ones, but not many. So, Wikipedia is thus extremely different from the traditional encyclopedias in many ways, and is in fact better if you ask me. I'd say the same thing about Wiktionary. Wiktionary includes far more information than most paper dictionaries do. Many dictionaries don't include nearly the amount of etymological information, synonym information, derived terms, anagrams, pronunciation, etc., that we do here. Also, they generally don't include rare slang terms. We do. Most paper dictionaries wouldn't include "all words in all languages", because, well, it'd be silly; millions + pages. Most paper dictionaries don't include individual entries for inflected forms. And now add this; paper dictionaries generally don't include names of popular TV series, or classic works of literature, etc. But, it would be informative to readers, so why don't we?
  • Many TV series have a few translations in other languages. Such as, The Simpsons sometimes translates to Los Simpson in Spanish. The TV series Cops apparently translates to Zsaruk in Hungarian. I could find more examples, but you get my point. Many people might want to know the translations of these proper nouns. Of course, the translations as well as the English entries would have to be verified as per the changed CFI.


TV series


  • [4] "The network felt that Duckman, the Emmy award-winning, but low-rated series about an acerbic, chauvinistic detective and his bumbling family, did not reflect the general-entertainment brand model that USA was trying to build in prime time [...]" 2013
  • [5] "This may be the show that proves TV animation can stay up past most kids' bedtimes and still find a strong, profitable audience. "Duckman," a new latenight series on USA Network, is crude, violent, cynical, antisocial and a little sexist, pretty [...]" 1996
  • [6] Usenet. Just scroll all the way down until you find ones that don't have "Duckman" in the title.
Classic book

Winesburg, Ohio

  • [7] Groups, from 2010.
  • [8] "As I begin to reevaluate the place of Sherwood Anderson's Winesburg, Ohio in the development of American fiction, I first want to look at Anderson's symbiotic relationship with Gertrude Stein, a relationship most Stein devotees will know about [...]" 1999
  • [9] "In triggering conversation in Winesburg, Ohio, however, a single word can dramatically alienate and isolate dialogic partners in the frightening immediacy of their encounter; such contact is always a "traumatism of astonishment." 2009

Separate proposal

If this doesn't work, we may be able to have these proper names of fictional or nonfictional works somewhere in the appendix namespace.


I just threw this together in about 30-45 minutes. But you get my point, I'm sure. You can find sources that aren't directly about these proper nouns, that are from varying years, and that were not written by its creator. If these were the inclusion standards for entries for book names or TV show names, we should also have a header that italicizes the proper noun, as this is the standard in English.

Comments below

So, what do you think? I feel the urge to start a vote, and it says to start discussion in the beer parlour. I know quite a few of you will most definitely and immediately disagree, and I have a semi-good idea of which users will and won't (whom I know). Although, I'm sure there will at least be some who agree or at least partially agree with this proposal, and I'm open to suggestions to things I should change before starting the vote. (No personal attacks please) Philmonte101 (talk) 05:16, 13 August 2016 (UTC)

  • Oppose. I hope other editors will articulate the reasons; I think it's kind of obvious. --Dan Polansky (talk) 07:49, 13 August 2016 (UTC)
  • Oppose We have enough work to do just filling out, cleaning up, and otherwise maintaining what we've got. DCDuring TALK 12:47, 13 August 2016 (UTC)
  • Oppose. This type of information is better suited to an encyclopedia. If only there were an encyclopedia version of Wiktionary... --WikiTiki89 13:27, 13 August 2016 (UTC)
    Precisely. Oppose. Equinox 13:28, 13 August 2016 (UTC)
  • Oppose for the reason given by Wikitiki89. If a fictional title or character has gained some idiomatic meaning, then it merits inclusion here (e.g., Wonder Woman to mean a woman of extraordinary ability). If it only retains its fictional meaning, then it belongs at Wikipedia. — SMUconlaw (talk) 11:33, 15 August 2016 (UTC)
  • Oppose. We might as well include the names of specific people, like Justin Trudeau, so that people who want to know why his parents chose that name for him can look it up under the etymology. Andrew Sheedy (talk) 17:22, 15 August 2016 (UTC)

oversized Cyrillic for Old Church Slavonic and Old East Slavic

For some reason, the Cyrillic font we use for Old Church Slavonic and Old East Slavic renders bigger than the Cyrillic font for Russian, at least on my Mac OS X laptop under Chrome. See тать for an example; compare the Old Church Slavonic entry to the Russian entry, and see the Russian etymology for an example of Old East Slavic, which looks (on my machine) the same as Old Church Slavonic. Do we want to fix this? Benwing2 (talk) 21:15, 13 August 2016 (UTC)

The reason for this is that the specific Old Cyrillic fonts come out smaller and therefore need to be rendered bigger. Your Mac is probably using Helvetica or whatever the default Mac font is, because it supports the characters and because you don't have Old Cyrillic fonts installed. Ideally, we should be able to provide font-specific sizes, but I don't think CSS supports that. --WikiTiki89 21:32, 13 August 2016 (UTC)
If you lack the fonts, these fonts may help you. [10] (In this case, try Noto Sans or Noto Serif.) --Octahedron80 (talk) 12:57, 18 August 2016 (UTC)
The issue is not character support in the fonts, but rather the choice of font. The Old Cyrillic script is meant to be displayed like this. --WikiTiki89 13:18, 18 August 2016 (UTC)
I'll pass if it depends on font variations, since they are located on the same codepoint. --Octahedron80 (talk) 00:38, 19 August 2016 (UTC)

"book cites aren't usexes"

In diff user:Equinox removed the {{ux}} template. It's good and well if we decide that this template is strictly for usexes (which is far from decided as far as I know, but never mind), but the template should not be removed altogether. Instead an alternative template should be provided that is more appropriate. —CodeCat 20:26, 14 August 2016 (UTC)

At the moment, ux puts the text in italics, which doesn't look good for book citations. Equinox 20:27, 14 August 2016 (UTC)
{{ux}} is wrong if we wish to maintain the customary italicization of book/journal/newspaper titles. I can't understand how the failure to explicitly exclude {{ux}} from use for citations constitutes sanction in favor of it. One could as easily claim that wikitext can overwrite all templates not explicitly endorsed by a voted policy. This kind of thinking is dangerous in an admin. DCDuring TALK 21:46, 14 August 2016 (UTC)
Maybe we should have a template that's identical to {{ux}} except it doesn't italicize, for use with quotations. People seem to like to use it in this way. DTLHS (talk) 22:19, 14 August 2016 (UTC)
Maybe we should keep {{ux}} for both usexes and quotes and maybe change it not to use italics for Latin chars. Note that it doesn't currently use italics in Cyrillic. Benwing2 (talk) 22:20, 14 August 2016 (UTC)
It's more or less like {{l}} versus {{m}}. —CodeCat 22:26, 14 August 2016 (UTC)
What is the advantage in using {{ux}} in terms of improved user experience, improved ease of adding content, speed of downloading, server load, etc.? Why aren't we hearing about such advantages? This also seems to go against the idea of intuitive names to speed the learning by new contributors. If that isn't important, why not rename {{ux}} to {{u}} for "qUotation" and "Usage example"? DCDuring TALK 22:38, 14 August 2016 (UTC)
For English, I don't know. For foreign languages it provides uniform formatting of translations and (for non-Latin scripts) transliterations. Benwing2 (talk) 22:57, 14 August 2016 (UTC)
Because the more consistent we make our entries the easier they are to edit. And the last 10 years have proven that we are utterly incapable of any consistency that isn't rigidly enforced by templates. DTLHS (talk) 22:59, 14 August 2016 (UTC)
For quotations, we have a whole series of templates (the most important being {{quote-book}}, {{quote-journal}} and {{quote-web}}) that can be used for a consistent appearance. I don't think any other templates are required. — SMUconlaw (talk) 11:29, 15 August 2016 (UTC)
  1. How does {{ux}} do a better job of making our entries easier to edit? It looks like just a labeling requirement imposed on others to make life easier for amateur programmers.
  2. The "quote-" family of templates does make for a great deal of uniformity in line and character formatting and in order of the components of citations. What does {{ux}} add? If the idea is that the advantage will emerge in the fullness of time, we would need to have a great deal more faith in the capability of our "technical" contributors than I believe they have earned. DCDuring TALK 13:31, 15 August 2016 (UTC)
{{ux}} provides automatic transliteration, whereas {{quote-book}} et al. do not. A deal breaker for me. --Vahag (talk) 14:03, 15 August 2016 (UTC)
I didn't know that {{ux}} provides automatic transliteration. But in that case, the solution is for someone knowledgeable about Lua to add automatic transliteration to {{quote-meta}}. (The "quote-" family of templates already has a |transliteration= parameter.) Using {{ux}} in this context is not very appropriate because it formats quotations differently from the "quote-" templates, leading to a lack of consistent appearance. — SMUconlaw (talk) 15:34, 15 August 2016 (UTC)
But why mix two different functions (citations and text rendering) in one template, when it would be easier to just use two templates? --WikiTiki89 17:06, 15 August 2016 (UTC)
Since the "quote-" templates are already intended for formatting quotations, why not just build the automatic transliteration function into {{quote-meta}} instead of having to use yet another new template? — SMUconlaw (talk) 17:20, 15 August 2016 (UTC)
Transliteration is not the only thing missing. {{ux}} (and I guess now {{quote}}) support many features that are useful for rendering text, such as allowing language-links, and may support more features in the future, such as automatic linking and who knows what else. It wouldn't make sense to add each of these features in more than one place, when they can just be added to one place. Let the "quote-" templates focus on formatting the citation line itself and not worry about rendering quotation text. --WikiTiki89 17:26, 15 August 2016 (UTC)
I see. But the "quote-" templates also render the quotation text through the |passage= parameter. The current situation means that two separate templates have to be used for formatting quotations depending on what features are required in the quotation text. I hope this is explained somewhere (perhaps at "Wiktionary:Quotations"). — SMUconlaw (talk) 17:45, 15 August 2016 (UTC)
I don't see any inherent problem with using two different templates. In fact I even find that it makes the wikitext more readable. --WikiTiki89 17:52, 15 August 2016 (UTC)
I think {{quote-meta}} should be using the usex module to render the quotation text. DTLHS (talk) 17:47, 15 August 2016 (UTC)
You could do that, but you would have to ensure that there are no argument naming conflicts and such. I don't think there are any at the moment, but it would be an extra thing to worry about whenever adding a new argument to either {{ux}} or any of the individual "quote-" templates. I don't see the point. --WikiTiki89 17:52, 15 August 2016 (UTC)
I have created {{quote}}, which works the same as {{ux}} except for these differences in formatting. And it does do automatic transliteration. —CodeCat 15:44, 15 August 2016 (UTC)
What exactly are the differences in formatting? I tried it at קטון, but the formatting is exactly the same. Does this only apply to Latin script? --WikiTiki89 17:06, 15 August 2016 (UTC)
CodeCat answered this question at User talk:KIeio‎#Template:quote, saying: “It doesn't apply any formatting to the quoted text, so that it preserves its original formatting as much as possible.” @CodeCat: I presume you meant basically that italics are not applied? Is there any other difference? And can we get the following to work as expected: {{quote|ru|Слова ''да'' и ''нет''.}} (currently the italics don't do anything in scripts other than Latin)? --WikiTiki89 19:52, 15 August 2016 (UTC)
This can't be fixed unless we allow for Cyrillic italics in general. Previous discussions have mostly led to the conclusion not to allow them. —CodeCat 21:08, 15 August 2016 (UTC)
This can be fixed without allowing non-Latin scripts to be italicized in mentions and usexes. Previous discussions have led only to the conclusion not to allow italicizing non-Latin mentions and usexes, but that does not apply to quotations. --WikiTiki89 21:55, 15 August 2016 (UTC)
True, this would work thanks to {{mention}} having distinct style tags (finally, a good use for it). I just wanted to make sure that it was ok to remove the blocking of all Cyrillic italics, which I believe we have currently. —CodeCat 22:19, 15 August 2016 (UTC)
Is there a reason why {{ux}} no longer seems to italicize example sentences? It's introduced a whole bunch of inconsistencies (compare, for example, the verb and noun sections at shift, one of which was italicized manually; the other with the template). Also, now that the example sentences are no longer visually distinguished from the definitions, it is much harder to read. I'm guessing this discussion has something to do with the change, but could it be reverted? It's just undone possibly hundreds of my edits which were aimed at increasing consistency between entries. Andrew Sheedy (talk) 20:02, 15 August 2016 (UTC)
Module:usex if quote and (sc:getCode() == "Latn" or lang:getCode() == "und") then. @CodeCat Shouldn't it just be if (sc:getCode() == "Latn" or lang:getCode() == "und") then? DTLHS (talk) 20:09, 15 August 2016 (UTC)
That seems to have been a mistake. I fixed it. --WikiTiki89 20:11, 15 August 2016 (UTC)
Perfect, thanks. I'm glad to know it was an accident rather than another template change I have to adjust to.... Andrew Sheedy (talk) 20:17, 15 August 2016 (UTC)
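For readers following the Module:usex exchange above, here is a minimal sketch of the two conditions being compared. It is written in Python rather than Lua, and the function names are hypothetical; only the boolean expressions are taken from DTLHS's comment. It does not show how the module was actually fixed, only why the quoted condition would stop {{ux}} from italicizing Latin-script text.

```python
def italicize_as_quoted(script_code: str, lang_code: str, quote: bool) -> bool:
    """Condition as quoted from Module:usex: italics only when `quote` is set."""
    return quote and (script_code == "Latn" or lang_code == "und")


def italicize_as_suggested(script_code: str, lang_code: str) -> bool:
    """DTLHS's suggested condition: the `quote and` guard is dropped."""
    return script_code == "Latn" or lang_code == "und"


# A plain usage example (quote=False) in Latin script is never italicized
# under the quoted condition, which matches the symptom Andrew Sheedy reported.
print(italicize_as_quoted("Latn", "en", quote=False))   # False
print(italicize_as_suggested("Latn", "en"))             # True
print(italicize_as_suggested("Cyrl", "ru"))             # False
```

The script and language codes ("Latn", "Cyrl", "und") are the ones mentioned in the thread; everything else is an assumption made for illustration.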
It's pretty standard actually not to use a template for a citation. Not sure why. Perhaps no template has ever exceeded simply doing it by hand. Renard Migrant (talk) 20:27, 15 August 2016 (UTC)
It should never have been standard. DTLHS (talk) 20:31, 15 August 2016 (UTC)
The quotation templates have been greatly improved over the past few months. --WikiTiki89 20:37, 15 August 2016 (UTC)
:-) SMUconlaw (talk) 09:10, 16 August 2016 (UTC)
@Smuconlaw: I don't know if you've gotten one, but you deserve a very big thanks for that! It was something I always thought about doing but the templates were such a mess I was too afraid to go near them. --WikiTiki89 14:58, 16 August 2016 (UTC)
Awww, shucks! No problem, it was quite interesting working on those templates. — SMUconlaw (talk) 15:42, 16 August 2016 (UTC)

Templates in Category:Quotation reference templates should use {{quote-book}} et al when possible

I'm hoping there's agreement on this. Some of the templates have extra parameters that may not fit elegantly. It will be some work but there aren't too many templates to convert. DTLHS (talk) 21:27, 15 August 2016 (UTC)

Is this applicable to reference websites? DCDuring TALK 21:56, 15 August 2016 (UTC)
What do you mean? DTLHS (talk) 21:57, 15 August 2016 (UTC)
  • Support: I'm in favour as this would standardize the formatting of quotations. — SMUconlaw (talk) 09:10, 16 August 2016 (UTC)

Extended flexibility vote

FYI, I extended Wiktionary:Votes/pl-2016-07/Editing "Flexibility" by 1 month per request. --Daniel Carrero (talk) 00:44, 16 August 2016 (UTC)

I request that it be closed as per its original creation page. DCDuring TALK 02:30, 16 August 2016 (UTC)
Three people, including myself, supported the extension in the #Decision section in the vote. --Daniel Carrero (talk) 02:46, 17 August 2016 (UTC)

Russian combining forms like -бавить or -ключить

I created a number of entries visible in CAT:Russian verbal combining forms. These are verbs where the base verb is missing but various prefixed derived verbs exist, and I want to create an entry for the base verb for use in etymologies and such. CodeCat (talk • contribs) didn't like the term "combining forms". What do others think? Benwing2 (talk) 09:13, 16 August 2016 (UTC)

I have no strong opinion on this. The categories are useful but I haven't seen similar examples for naming them. --Anatoli T. (обсудить/вклад) 10:25, 16 August 2016 (UTC)
For the L3 header, I'd just call it a verb. We don't seem to have entries for parallel things in English like -ceive, but I did make an entry for Old Irish ·icc and call it a verb. —Aɴɢʀ (talk) 14:36, 16 August 2016 (UTC)
What would make the most sense would be to call them reconstructions: *бавить ‎(*bavitʹ), *ключить ‎(*ključitʹ). But for some reason I don't like that idea. I also don't like the hyphens in -бавить ‎(-bavitʹ) and -ключить ‎(-ključitʹ). I would say we should do what Angr did with ·icc and put them at бавить ‎(bavitʹ) and ключить ‎(ključitʹ). --WikiTiki89 15:01, 16 August 2016 (UTC)
Would they survive an RFV? —CodeCat 15:31, 16 August 2016 (UTC)
It would be an RFD question, because the claim would be that they are attested as part of their derivations. --WikiTiki89 15:49, 16 August 2016 (UTC)
That would make them like ceive or ject, which I doubt would survive. RFV demands attestation of the lemma itself, it doesn't allow for such exceptions as far as I know. —CodeCat 16:05, 16 August 2016 (UTC)
Adding a hyphen onto them doesn't suddenly make them any more or less attestable than they were before. This issue is about what the entry name should be and not about attestation. For some languages, like Arabic, we don't indicate prefixes and suffixes with any sort of hyphen in entry titles. For Sanskrit, all our noun lemmas are actually suffixless stems that don't really exist on their own. This isn't much different. --WikiTiki89 16:27, 16 August 2016 (UTC)
-ceive and -ject aren't quite parallel to -бавить and -ключить because (among other reasons) all words containing the former morphemes were borrowed from French and Latin with those morphemes in them, similar to Russian -инг. And we do have entries for things like Latin -bulum that can't be attested on their own so I don't see how the RFV issue applies. Benwing2 (talk) 00:10, 17 August 2016 (UTC)
What about the case of verdwijnen? There is no verb dwijnen, at least not in current Dutch. The point is that we have essentially an unattested verb that it might be desirable to have an entry for. In Latin, we have opted to go for a reconstructed entry for the unattested base verb, as linked in the etymology of abdō, ēdō, dīdō and others. —CodeCat 00:22, 17 August 2016 (UTC)

User:Wyang is edit warring again

I have tried to explain the situation to him on his talk page, but he doesn't seem to want to understand that he can't just change common practice regarding transliteration to suit his own personal tastes. Big changes to common established practice like this need discussion and consensus, and I consider this a big enough change to require a vote, but I am having a hard time getting him to actually do so and wait for consensus. Instead he edit wars over it to try and force his change through, since he thinks he is right, anything is warranted and any opposition is apparently shortsighted and Eurocentric and therefore it's ok to ignore consensus. Can someone else please try explaining it to him and try to get him to stop messing with the modules? The only thing I can do is continue to revert him. Thank you. —CodeCat 02:01, 17 August 2016 (UTC)

It has been very frustrating interacting with User:CodeCat - unreceptiveness to suggestion, poor participation in discussions at the Beer Parlour, blocking wilfully, replying with completely irrelevant comments, and impetuous reverts without any input to the topic at hand. The word being thrown around is consensus, when there is not even one to begin with. I repeatedly asked for consensus for treating romanisation and transliteration as equivalent in Module:links, but User:CodeCat's response is plain simple - evasion, evasion and evasion. Without any clear and thorough discussion showing your edit is consensus, why are you throwing around the consensus as if there is one? If you are not willing to discuss, you should not be making any changes, let alone reverting impetuously. Disappointing that such blatant bullying is condoned. Wyang (talk) 02:11, 17 August 2016 (UTC)

Proposal: "Description" section for symbols

I've been using the Etymology section to place descriptions like these for some symbols.

Proposal: I'd like to use a "Description" section instead.


  • These are descriptions, not etymologies.
  • Maybe this would discourage definitions that are merely the Unicode description of the symbol, which would be a good thing.

Template:editnotice-exotic symbols says: "When creating this entry please make sure you give the symbol a proper definition, preferably with attestation. Mere Unicode code point name does not constitute a definition. Symbol entries without proper definitions may be deleted." Related discussion: Wiktionary:Beer parlour/2015/January#Is documenting all Unicode characters within the scope of Wiktionary?.

If someone creates an entry like that, (using the code point name as the definition, I mean) I was hoping we would be able to say: "The definition is not the place to describe the symbol, use the Description section instead. The definitions are for real meanings that can be attested."


As I said, my idea is to use the Description section for symbols like 💾 and the others above, but if we agree about allowing the section for some symbols, it raises the question of whether common letters, numbers, punctuation, etc. as well should have a Description, too. I'm not sure about whether they should. I'm leaning towards allowing it, but I'd like to know what others have to say. I thought of a few examples for consideration:

  • A = "An upside-down V (two symmetrically opposed diagonal lines meeting at the top-middle point) with a horizontal line in the middle, from one diagonal line to the other. (also mention about the appearance of "A" in handwriting)"
  • ! = "A dot below a vertical line."
  • + = "A vertical and a horizontal line, crossing in the middle."
  • ¨ = "Two horizontally-aligned dots, to be placed above a letter."

I think there may be reasons not to want a "Description" section for all characters of all scripts. Han compounds like "秋 = compound of 禾 ‎+ 火" are real etymologies and descriptions too. Correct me if I'm wrong, but for Han compounds, I believe Etymology is enough and they don't need a "Description" section. Other scripts might have other considerations. --Daniel Carrero (talk) 03:02, 17 August 2016 (UTC)

  • Agree I think that describing symbols and how they may appear "in the wild" with actual usage is a valuable resource. I don't know that we need to describe "A"--if you can read English, you already will recognize this character. —Justin (koavf)TCM 05:44, 17 August 2016 (UTC)
  • Unsure. This would add a special case/section just for unicode characters and might be confusing. The entry layout is already complicated enough. I suspect that "Description" might get used outside of the unicode/symbol context. – Jberkel (talk) 11:19, 17 August 2016 (UTC)
"An upside-down V" would be a terrible way to describe something that wasn't actually derived from the letter V. Very misleading. Equinox 15:29, 17 August 2016 (UTC)
Fair enough. --Daniel Carrero (talk) 22:28, 19 August 2016 (UTC)
IMO "a V-shape" or "a V-like shape" is fine, though. I think "an upside-down V-shape" or "the shape of an upside-down V" would probably be OK. - -sche (discuss) 01:10, 20 August 2016 (UTC)

Is it better to put in usage note instead? --Octahedron80 (talk) 06:11, 18 August 2016 (UTC)

Maybe not, it would sound odd to me. Concerning "⚤", the text "Interlocked male and female symbol." is not how to use the symbol. It is how to draw the symbol, or what to expect in Unicode fonts.
I thought of having a separate Description section, also because it is a repeated, specific type of information that many symbols would have. The Usage notes section is for miscellaneous usage information. --Daniel Carrero (talk) 18:33, 20 August 2016 (UTC)
Maybe the Description section would be useful for someone to know what a certain symbol looks like, without installing the right Unicode font. I'm also willing to consider the hypothesis that a textual description of a symbol or letter would be useful to blind people. Maybe also for creators of fonts, I don't know.
I seem to remember that a certain Unicode character was sometimes depicted as a cross and sometimes depicted as a full church. To me, this sounds like something we should mention somewhere. --Daniel Carrero (talk) 11:36, 21 August 2016 (UTC)

I created Wiktionary:Votes/2016-08/Description. --Daniel Carrero (talk) 16:08, 22 August 2016 (UTC)

I worked closely with a blind person for two years. She used Braille (on paper) and screen-reader software (on the computer) and she had no reason whatsoever to know or care about the shapes of letters, let alone obscure mathematical symbols. Let's not create worthless rubbish for no reason please. Equinox 17:17, 22 August 2016 (UTC)
Thanks for the info. I removed "A separate hypothesis, although unproved, is that it would be useful for blind people to know what the symbol looks like, too." from the vote rationale. --Daniel Carrero (talk) 17:54, 22 August 2016 (UTC)

Image availability

Are some images not available for use in Wiktionary? I tried using this one, which is used in Wikipedia, but couldn't get it to work [11]. DonnanZ (talk) 09:34, 17 August 2016 (UTC)

It's not hosted at the Wikimedia Commons. (Note the lack of "View on Commons" tab at the top of the page) —suzukaze (tc) 09:39, 17 August 2016 (UTC)
That's a shame, it's a great image. Is it possible to rectify that? DonnanZ (talk) 09:51, 17 August 2016 (UTC)
It seems like it's been a candidate to be copied to the Commons since February 2012 (see the Licensing section, which also includes detailed information on the moving process). —suzukaze (tc) 09:56, 17 August 2016 (UTC)
I'm not skilled in doing that. I wouldn't like to try! DonnanZ (talk) 10:05, 17 August 2016 (UTC)
@Donnanz: There is a guide at w:Wikipedia:Moving files to Commons. You already have an account at Commons just by virtue of having one here, so you don't need to do anything new. If this guide is too confusing, let me know by typing {{Ping|Koavf}} and respond here. Thanks for being so eager to learn and help us! —Justin (koavf)TCM 14:04, 17 August 2016 (UTC)

I transferred it. It's now called "File:London United Tramways tram in front of its tram-shed, Kew Road, Richmond, UK - c 1900.jpg". — SMUconlaw (talk) 15:24, 17 August 2016 (UTC)

@Smuconlaw: Ah, wonderful. Thanks a lot (and also to Koavf). The result can be seen at tramshed. DonnanZ (talk) 18:10, 17 August 2016 (UTC)

Quotation questions

I recently added a quotation to mouth-breather, Gamergater, alt-right, and bluestockings. The quotation captures a number of colorful words, so I added it to several entries. The quotation was originally:

  • 2016, Ross Douthat, "A Playboy for President," The New York Times, 14 Aug.
    "But the cultural conflict between these two post-revolutionary styles — between frat guys and feminist bluestockings, Gamergaters and the diversity police, alt-right provocateurs and 'woke' dudebros, the mouthbreathers who poured hate on the all-female 'Ghostbusters' and the tastemakers who pretended it was good — is likely here to stay."

Note two things:

  1. The external hyperlinks are in the original article on the NYT website.
  2. My addition was to what I perceived to be the "main" page for mouth-breather, and intentionally not to mouthbreather or mouthbreathers. I viewed the latter two as secondary because they identified themselves as an alternate spelling and a plural, respectively, had no definition or quotations, and directed the reader to the "main" page.

Equinox and I discussed several things, and it was proposed I bring them here:

  1. Is it appropriate to preserve hyperlinks at all? (I don't believe search engines are affected by such relinks in a wiki article.)
  2. If so, should it only be on the entries where they satisfy two criteria: they are in the original source and they clarify the author's meaning of the entry in question. Here that would mean they would stay for Gamergater and alt-right, but not for mouthbreather or bluestocking.
  3. What page should quotations appear on when alternate spellings are in play?
  4. If the alternate spelling has no entry, should it be created and the quotation moved?

--Flex (talk) 21:54, 17 August 2016 (UTC)

1. You should be using {{quote-news}} with the url parameter. 2. I believe quotations should go with the exact spelling of the word, unless the word is exceptionally hard to attest. DTLHS (talk) 22:04, 17 August 2016 (UTC)
One distinction: regarding the word-form where citations go: while I believe in using the exact form (i.e. mouthbreather rather than mouth breather, if there is no space in the citation — since citations need to attest and provide evidence for a specific form existing), I don't think that has to apply to grammatical inflections. In other words, I think it's okay to put a mouthbreathers (plural) citation at mouthbreather, but not at mouth breather. Equinox 22:15, 17 August 2016 (UTC)
Regarding the preservation of hyperlinks in cited text, my other points when talking with Flex were (i) they are something like formatting (e.g. font colour) and not really vital to showing attestation of the word, (ii) our linking might have annoying semantic repercussions (e.g. Google strengthening its PageRank between the linked text and its target, whereas we don't want to reinforce the writer's opinions — though I believe there are technical ways around this, e.g. nofollow), and (iii) Web links die very frequently anyway, and we often have to remove them from entries (plus dead links often benefit subsequent cybersquatters). Equinox 22:22, 17 August 2016 (UTC)
My responses:
  1. It is a common practice for websites to add hyperlinks to previous articles on their own website and, perhaps less frequently, links to external websites. I'm leaning towards the omission of such links in quotations, particularly since those links may well go dead if not archived in some way. (It would theoretically be possible in some instances to archive them at Archive.org, but indicating the archive URLs would be cumbersome. Consider the following: "But the cultural conflict between these two post-revolutionary styles — between frat guys and feminist bluestockings, Gamergaters [archived from the original on 9 August 2016] and the diversity police, alt-right provocateurs [archived from the original on 15 August 2016] and 'woke' dudebros, [] ")
  2. The quotation should appear under the main lemma, not the variant spelling. (This is the practice adopted by the OED.) It would be easier to gauge the vintage of a word that way, as all the quotations would appear on one page. Thus, an 18th-century quotation containing the obsolete form stowadore should appear together with more modern quotations from the 19th to 21st centuries where the lemma is spelled stevedore.
SMUconlaw (talk) 22:24, 17 August 2016 (UTC)
(It seems like Wikimedia wikis have nofollow on by default for external links. —suzukaze (tc) 22:24, 17 August 2016 (UTC))
Smuconlaw: I know it's OED practice, but we need a better rationale than "some experts do it that way". We don't know how they store their data internally; we only see the finished product, the dictionary. If we put all cites at the main form then it becomes very hard to see whether alt forms are supported or not. Equinox 22:27, 17 August 2016 (UTC)
[Edit conflict. I added a reason before seeing your latest comment. — SMUconlaw (talk) 22:29, 17 August 2016 (UTC)]
I think this is more of a user-interface issue. Wouldn't you agree that it makes sense to store citations with their exact attested form internally, but to show them all together to users who view an entry? (This would need further development work.) Equinox 22:32, 17 August 2016 (UTC)
Yes, if this can be technically achieved I see no issue with that. — SMUconlaw (talk) 10:31, 18 August 2016 (UTC)
We could use section transclusion to maintain the quotation in one place and include it in multiple other places. - TheDaveRoss 20:28, 15 September 2016 (UTC)
I think quotations should all go at the main form, but citations should be on the citations page of the exact form they exemplify. Andrew Sheedy (talk) 18:39, 22 August 2016 (UTC)
What if the main form has more than one sense? Should all the citations for the alternative form be grouped together even if they have different senses? DTLHS (talk) 22:40, 17 August 2016 (UTC)
I think it's logical (and our current practice, as far as I can tell) for quotations to be grouped by sense. — SMUconlaw (talk) 10:31, 18 August 2016 (UTC)
I think we should remove the hyperlinks, because what we are actually quoting is the durably archived version of the article, i.e. the version printed on paper (that will eventually end up on microfilm in libraries), and that version doesn't have hyperlinks. —Aɴɢʀ (talk) 19:04, 22 August 2016 (UTC)
I disagree. I think we're archiving the text the author actually published. In the case of a purely electronic medium, the links can be very relevant to the author's intent (whom does he consider an exemplar of "alt-right provocateurs"? Sarah Palin? Donald Trump? Richard John Newhouse? Ann Coulter? Specifically, he cites Milo Yiannopoulos.) The links in question are to reputable sources (Time and the NYT) which are less likely to expire than most.
Counter argument to myself: the print edition of this article did not have the hyperlinks, so at least in this case, it seems legitimate to remove them. That doesn't answer the general question of whether they should be included when possible. --Flex (talk) 21:17, 23 August 2016 (UTC)
Yes to linking. We definitely should include links--even if they expire sometime, we still have the original citation. In fact, we should include more links to Internet Archive and WebCite as archive links. —Justin (koavf)TCM 21:54, 23 August 2016 (UTC)

I'm having trouble detecting consensus here. What should I do on the two points in question? If this is not the place to look for consensus, where should I go? --Flex (talk) 21:17, 23 August 2016 (UTC)

wheel warring between User:Wyang and User:CodeCat -- not cool

Something has to be done here. I'm not following this issue closely but I did notice that Wyang blocked CodeCat for 1 day for edit warring, when (a) almost certainly Wyang was equally guilty, (b) it is absolutely not OK for an involved admin to block someone they're involved in a dispute with, esp. another admin. I get the feeling both are equally guilty and deserve to be blocked. Wikitiki actually did block Wyang, who somehow managed to unblock himself (??), something else that's definitely not OK. My first instinct is to block Wyang again for his bad behavior, but instead I'm just going to unblock CodeCat since this particular block should not have been put in in the first place. What do others think? Benwing2 (talk) 02:06, 18 August 2016 (UTC)

The main reason this has escalated to this point is that Wyang has shown no willingness to find a consensus for his proposed changes to Wiktionary practice, and my previous call for help on the matter was completely ignored. Since Wyang is also an admin, my ability to enforce rules and common practice is limited, and reverting the contested changes while trying to reason with him is all I can do. Please advise what can be done in the future in dealing with a rogue administrator without making myself a guilty party. —CodeCat 02:46, 18 August 2016 (UTC)
To answer your "??": admins have permission to unblock themselves. Obviously if it gets to that point they should hopefully be trying to generate some consensus with the blocking admin or the community. Equinox 03:44, 18 August 2016 (UTC)

Wheel War- Action Taken

The conflict between User:CodeCat and User:Wyang has gone on long enough. They've been edit warring over an absolutely critical module used by huge numbers of entries. I'm not sure what that's doing to the edit queue- but it can't be good.

Both deserve to be blocked, but that would render them unable to contribute in discussions over the issue. It's also true that their misbehavior has been limited to editing protected modules and blocking each other.

Therefore, I have temporarily desysopped both of them, which will prevent them from editing the modules in question. I intend to restore them in one week, or when this is resolved- whichever comes first.

If edits need to be made to protected modules before then, I would appreciate it if our more-knowledgeable admins would make themselves available to help out- perhaps User:Wikitiki89 or User:DTLHS?

I hope we can resolve this conflict quickly and get back to building a dictionary.

I would appreciate your feedback on my actions, since such things should only be done with community consensus.

Thanks! Chuck Entz (talk) 05:49, 18 August 2016 (UTC)

  • I can't think of any other action that would have been more appropriate. SemperBlotto (talk) 06:10, 18 August 2016 (UTC)
I think the desysopping was appropriate. I would even strongly propose that community consensus be obtained before reinstating the tools. CodeCat and Wyang have wheel-warred before, and each has blocked the other at least once, among other questioned actions. CodeCat, Wyang: you two are knowledgeable contributors to our content, and you are valuable contributors to our technical infrastructure, but you've both long (and not necessarily in equal measure) shown a tendency towards using your abilities to implement faits accomplis and get your way on e.g. module and entry layout or on treatment of Chinese, respectively. For instance, although on this page CodeCat calls on Wyang "to find a consensus for his proposed changes to [what she asserts is] Wiktionary practice", mere days ago Benwing called her out for again using her bot to create many new entries inconsistent with our existing entries and practices. Wyang, in turn, has threatened a few times to take his ball and go home if we don't agree with an action or, long ago, the unification of Chinese. These attitudes have driven away other editors; for instance, User:Mkdw just recently left after calling out Wyang's use of admin tools in the BP, while User:Ruakh has been largely inactive since earlier disputes with CodeCat over modules (as noted e.g. here) and the presentation of module errors (then and still now I agreed with CodeCat that module errors should generate a visible error message, but the dispute cost us a knowledgeable technical editor). This particular wheel-war seems especially excessive because the dispute seems to be not over whether there should be an automatic translit feature for complex non-European scripts like Thai, but over where it is most elegant to put that feature. - -sche (discuss) 06:49, 18 August 2016 (UTC)
  • As, what it feels like anyway, the only non-admin reading these discussion boards, I express my consensus and agree with -sche that the stripping should not be time-bound but powers should only be restored when the community is convinced that the issue is done with in such a way that neither will have any incentive to do something which sparks it up again. I also repeat my conviction that no party of an edit war (as defined by me above as a conflict where two reversals of an edit have happened) should have the right to block or unblock any participant. Korn [kʰũːɘ̃n] (talk) 10:14, 18 August 2016 (UTC)
  • I pretty much agree with you on everything (not unusual, by the way). We have here two equally stubborn and overbearing people who have met their match- if it weren't for the stakes and the damage done, it might be satisfying to see both get their comeuppance. As for duration, I was careful to say "I intend", because the week was just an arbitrary time picked out of the air, and I was hoping we could come up with something better. Right now both are responding with stereotyped "talking points" about the failings of the other, which shows both are still dealing with this on a strictly emotional level. The truth is, both are basically right about each other for the most part, but it's irrelevant. We need to come up with a solution that makes sense and that both can live with. Chuck Entz (talk) 14:16, 18 August 2016 (UTC)
    I think the desysopping has to continue until the matter is resolved. As long as there is no sulking, the project will continue to benefit from their contributions. I hope that the project does not suffer from lingering bad feelings once these valued contributors regain their sysop status. DCDuring TALK 13:10, 19 August 2016 (UTC)
  • I support the emergency temporary desysopping of both editors made by Chuck Entz on account of interminable wheel-warring. I believe a bureaucrat is authorized to take such temporary measures to eliminate this kind of wheel-warring, without a vote. --Dan Polansky (talk) 11:44, 21 August 2016 (UTC)

To be honest, I am not expecting any functional input from User:CodeCat regarding the topic at hand, based on her bullying behaviour and unwillingness to engage in discussions in the past few days. Her only argument has been that her edit was based on "consensus", which is obviously nowhere to be found, even when requested again and again.

Treating romanisation as equivalent to transliteration is clearly erroneous (since romanisation = transcription + transliteration), but she keeps reinstating this misinterpretation, with total disregard for the infrastructure of languages which make the distinction between transcription and transliteration on a romanisation level. For example, Module:th-translit does not even describe what it does after her edits, and she is apparently nonchalant about these languages ("It's a misnomer, but that's the way it is.").

This lack of regard for correctness, coupled with her previous heedless deletion of the indispensable code in Module:links (which precipitated all this), are acts of admin sloppiness. Her one-line response of "So, what happens now? Can we please get rid of the Thai code from Module:links now, or do we need some more edit warring?" to my detailed rationales for putting transcription support in the central modules exemplifies her utter apathy towards the actual topic ("would rather fight not explain") and disrespect to people.

This second episode was perfectly bound to happen, and bound to end tragically, when all that one side of the dispute cares about is "getting rid of the Thai code from Module:links now", even if she has to use "some more edit warring" for that. Yet, there are people cheering for her. Wyang (talk) 10:38, 18 August 2016 (UTC)

I support this action and wish that someone had done something sooner. I called for help above, but nobody responded, so I was very unsure what to do as I didn't feel like I had any options left, and it was all up to me. I'm sad to see that the community only cares when there is edit warring going on but is unwilling to help in solving the problem outside of that. At least now, people's attention is finally here so I can't complain too much.

As far as the dispute goes, I can summarise what I see:

  • Wyang, in principle, believes that transliteration modules should only be used for transliteration in the strict sense: letter-by-letter conversion.
  • Consequently, the Thai transliteration module does literal transliteration, which makes it pretty much useless for Thai.
  • This goes counter to how the term "transliteration" is generally used on Wiktionary; we use the term to refer to transcription, transliteration and romanization in general. Transliteration modules perform all of these functions, and the tr= parameter that is present on many templates is frequently provided with something that is not strictly transliteration, but rather adheres to the Wiktionary usage of the word. Our policies with respect to the use of these parameters and modules are labelled "transliteration" as well, as evidenced by WT:RU TR, WT:EL TR and WT:JA TR for example. None of these transliteration policies describes transliteration in the strict sense (av rather than ay for Greek, ō rather than ou for Japanese, etc.).
  • Because the transliteration module for Thai is useless by Wyang's own choice, Wyang decided that the best way around this was to insert special-purpose code into Module:links, a widely-used general-purpose module, to transliterate Thai correctly by using code present in another module, Module:th.
  • This was disputed by me, arguing that such special language-specific code does not belong in a general purpose module, especially not when it can easily be put into the existing transliteration module and have everything work just fine.
  • User:Wikitiki89, in the last war, did just this: he moved the code over to the transliteration module, where it belongs. This was immediately reverted by Wyang however, and his special code in Module:links reinstated despite it already having been disputed. My efforts to reapply Wikitiki's edits were repeatedly reverted by Wyang.
  • Fast forward to now, when I once again noticed Wyang's special purpose code in Module:links, and got frustrated that the issue was never solved. I therefore once again moved the code to the Thai modules. This again resulted in a revert war.
  • I attempted to explain on Wyang's talk page that in order for his alternative interpretation of transliteration, which involved creating separate modules and infrastructure for transliteration versus transcription/romanization modules, to be accepted, he would have to find a consensus with the community for it and seal it with a vote.
  • Wyang showed no intention of doing this, instead arguing on the merits of his views as if to convince me that separating the two was the right way to go. In my view, this missed the point as it wasn't me he was supposed to convince, but the community at large. Thus, I ignored his arguments and instead tried to focus on stopping him from edit warring and trying to get community consensus first.
  • Wyang refused to create a vote, instead telling me to create a vote for him. Two other editors also called for a vote, and even offered to make one if Wyang didn't. I welcomed this, but nothing has been done in this regard yet, and Wyang continued his edit war, rather than waiting on the outcome of the vote.
  • I called for help on the Beer Parlour regarding the matter, hoping that other users would be better capable of solving the issue and, especially, to stop Wyang from reverting me each time and get him to wait for consensus. This call for help was entirely ignored, and thus the warring continued.

CodeCat 14:21, 18 August 2016 (UTC)

It seems the issue is a bit more complicated than that. Wyang seems to want to have both transliterations and transcriptions for Thai, used in different places. This is something that goes against the status quo and should need a vote before being implemented. Wyang has refused to draft this vote claiming that the consensus among Thai editors is enough. However, this impacts not only Thai editors, but our readers as well who may be confused by having two different romanization systems in different places. As long as Wyang continues to refuse to draft a vote, I don't think we should allow his system to be put in place. My personal opinion is that there should be one default romanization system, whether it be strictly a transliteration, or a transcription, and if it is necessary to use a different system in etymologies, this should either be done manually with tr= parameter, or potentially with a dedicated Thai template that would allow choosing a different automatic romanization. In either case, all the automatic Thai romanization code, both transliteration and transcription, should be located in Module:th-translit. --WikiTiki89 14:35, 18 August 2016 (UTC)
I'd like Thai to follow the pattern we've already established for Burmese: one automatically generated transliteration system used everywhere outside of Thai entries (translation sections, etymology sections, etc.), and Thai entries with additional transliteration systems (both spelling-based and sound-based). Ideally the automatically generated one should be ISO 11940-2 or at least based on it. —Aɴɢʀ (talk) 15:05, 18 August 2016 (UTC)
@Angr Burmese entries are nowhere near the level of current Thai entries. The current Burmese transliteration is much closer to the spelling, which doesn't help users much with the pronunciation. Ideally, we should have a system created for Thai - with phonetic respellings but for that we need more native knowledge or reliable data available. With Thai, we are luckier - we have native speakers, phonetic respellings from some dictionaries and "Paiboon" or other transcriptions from published dictionaries sometimes can help reverse-engineer the phonetic respelling (for non-natives). I'd like to see the same methods used for Burmese and Tibetan. --Anatoli T. (обсудить/вклад) 02:32, 21 August 2016 (UTC)
Transliteration is not concerned with representing the sounds of the original, only the characters, ideally accurately and unambiguously. (Wikipedia)
Romanization, in linguistics, is the conversion of writing from a different writing system to the Roman (Latin) script, or a system for doing so. Methods of romanization include transliteration, for representing written text, and transcription, for representing the spoken word, and combinations of both. (Wikipedia)
Transliteration is not the same as romanisation. Romanisation = transliteration (script conversion by spelling) + transcription (script conversion by pronunciation). It is quite embarrassing that we as a dictionary are getting this basic concept wrong in some places, and seem proud of propagating and not rectifying the error. The contrast between transliteration and transcription is a fundamental concept in basic linguistics (and I do not even have a linguistics background). The distinction is strictly adhered to when one discusses romanisation schemes for languages which have a noticeable script-pronunciation discordance, i.e. languages which distinguish between transliteration and transcription on a romanisation level. If one wishes to talk about conversion to the Latin script using any mapping, it is romanisation. There are some places on Wiktionary which have confused these two concepts - for example WT:RU TR - which is precisely why people have complained that "this [i.e. the "translit" system] looks like a mess of transliteration and transcription". If we search for "transliteration of Russian" (or other languages) on Wikipedia, we will quite sensibly be redirected to "romanisation of Russian".
So far the confusion has been largely non-disastrous, since (1) most content has been for Latin-script languages, and (2) for languages which do not use the Latin script and have had romanisation systems devised for them on Wiktionary, the difference in the romanisation outcomes generated by transliteration and transcription is comparatively minimal, for example Russian. Then we deal with languages of the East, which are renowned for keeping the spelling forms used hundreds or thousands of years ago, and therefore have a high script-pronunciation discordance. If we go back to the comparison table of various languages in transliteration and transcription outcomes that I created in June, we can see that transliteration and transcription are two visibly dissimilar modes of romanisation, and this confusion of transliteration and transcription is destined to have disastrous consequences.
People may not be aware of this, but this distinction has been faithfully adhered to when we designed the module infrastructure for Oriental languages, until this incident. From the aforementioned table, we can see that transliteration as a concept is inherently impossible for Chinese and Japanese since there is no script-to-script mapping, and appropriately there is no Module:zh-translit or Module:ja-translit on Wiktionary. What we have in place for Chinese and Japanese is Module:zh/Module:zh-pron and Module:ja/Module:ja-pron which help interpret auxiliary or native phonetic representations of these languages to generate romanisations in a transcriptive manner. On the other hand, transliteration is possible and contrastive with transcription for Tibetan and Burmese, therefore we have Module:bo-translit and Module:my-translit to generate transliterations and Module:bo/Module:bo-pron to generate transcriptions. For Thai, the distinction has also been faithfully observed until the incident: we have Module:th-translit which deals with transliteration, and Module:th/Module:th-pron dealing with transcription of Thai. It has been customary practice to devise module infrastructure for languages observing the transcription-transliteration distinction prudently, and this includes naming modules the way they should be named, to avoid future complications.
Why do we have to be prudent in devising the module infrastructure for these languages, and what are the complications of imprudent and misnomeric handling of languages observing this transcription-transliteration distinction? I explained this in detail in the previous discussions (discussion 1, discussion 2). In short, the two divisions of romanisation (transliteration and transcription) symbolise the two polarising ends of the romanisation spectrum in a dictionary-building context:
etymology (transliteration) ———— pronunciation (transcription).
The reasons we use romanisations on Wiktionary are different in various parts of the project. In translation sections, the purpose of romanisations is to inform readers how the word in another language is pronounced. In etymological comparisons, the purpose of romanisations is to inform readers how the term is spelled, i.e. how it used to be pronounced. Among languages which observe the transliteration-transcription distinction, there is variation in how acceptable it is to approximate one romanisation with the other and use it in all places. This really needs to be decided on a language-by-language basis. For some languages (e.g. Tibetan, Burmese), it is not advisable to use one mode of romanisation in all places. Again using the Tibetan example of བརྒྱད (transliteration: brgyad; transcription: gyaew): It makes no sense to say:
བརྒྱད (gyaew) is a cognate of Old Chinese (OC *preːd)
when the word is actually spelt as brgyad; and it similarly makes no sense to put:
བརྒྱད (brgyad)
as the Tibetan translation of eight. The next day after the faithful Wiktionary user downloads our app, he/she is found at a Lhasa stall, trying to bargain by whipping out their phone and awkwardly pronouncing /brgjad/.
The transcription-transliteration distinction is a pan-linguistic phenomenon not just limited to Thai lacking support in the core module system, and more consideration and acknowledgement that many script-pronunciation discordant languages use two methods of romanisation needs to go into the infrastructure. A system is never perfect (e.g. Module:links and Module:translations already contain language-specific adaptations), but changes are gradual and have to be initiated at the correct end. A step in the wrong direction may precipitate amplified counterproductivity before eventual rectification takes place, like the misinterpretation of transliteration here. Wyang (talk) 00:41, 19 August 2016 (UTC)
@Angr: ISO 11940-2 is a transcription system. Wyang (talk) 00:43, 19 August 2016 (UTC)
@Wyang: You're starting to sound like a broken record. We all understand that in linguistics there is a distinction between transliteration and transcription, both of which can be called romanization (when this is done into the Latin alphabet). But the issue here is not of terminology. Yes, we use transliteration incorrectly according to the linguistic definition (although the common non-linguistic definition would include phonetic transcriptions as transliterations), but if we "corrected" ourselves and replaced the word transliteration with romanization everywhere that we misuse it in our templates, modules, "about" pages, etc., you still would not be happy. Why? Because your problem is that you want to use two different automatic romanization systems (one a "transcription" and the other a "transliteration"), when our templates only support using one automatic romanization system. So let's talk about that issue and not the terminology. --WikiTiki89 01:09, 19 August 2016 (UTC)
Well, I have to sound like a broken record because much of this has already been said two months ago in discussions poorly tended to by User:CodeCat (aside from the one-liner). The core issue is the confusion of transcription and transliteration by people who designed Module:links and the consequent awkwardness in its support for transcriptions. If transliteration = romanisation = transliteration + transcription in the central infrastructure, then where does transcription fit? It merely becomes a transliteration2, which it is not and should instead be contrasted with. The technical side is easy to fix - the shorthand "tr" is perfect already. We only need to store the transliteration and transcription modules as separate in language_data, and turn on the transcription modules at the appropriate point. For example, my revision at Module:links. Obviously there are more rigorous ways, but the approach has to be central to start with, not by confusing this concept even further in languages which truly distinguish them. Wyang (talk) 01:30, 19 August 2016 (UTC)
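As a rough illustration of the storage idea in the comment above, the sketch below keeps transliteration and transcription as separate entries per language and selects one by mode. It is written in Python purely for illustration; it is not the code of Module:links or of the language data modules. The table name `LANGUAGE_DATA`, the function `romanise`, and the toy lookup tables standing in for real romanisation modules are all assumptions.

```python
# Toy stand-ins for real romanisation modules (hypothetical data).
# The Tibetan pair is the example used in this discussion.
TOY_TRANSLIT_BO = {"བརྒྱད": "brgyad"}
TOY_TRANSCR_BO = {"བརྒྱད": "gyaew"}
TOY_TRANSLIT_RU = {"да": "da"}

# Hypothetical per-language data: the two kinds of romanisation are
# stored as separate entries instead of sharing one "translit" slot.
LANGUAGE_DATA = {
    "bo": {
        "translit": TOY_TRANSLIT_BO.get,
        "transcription": TOY_TRANSCR_BO.get,
    },
    "ru": {
        # No separate transcription entry for this language.
        "translit": TOY_TRANSLIT_RU.get,
    },
}


def romanise(lang, text, mode="translit"):
    """Romanise text in the requested mode.

    Falls back to the transliteration entry when the language has no
    entry for the requested mode, and returns None when the language
    is unknown or the toy table has no mapping for the text.
    """
    entry = LANGUAGE_DATA.get(lang, {})
    func = entry.get(mode) or entry.get("translit")
    return func(text) if func else None
```

Under these assumptions, a caller asks for `romanise("bo", "བརྒྱད", "transcription")` where pronunciation matters and for the default mode elsewhere. Which mode should be the default is exactly the question the thread leaves open.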
Ok, so let's say that we do this. Now, which of the two modules is called when our modules need a romanization to be auto-generated? And what use is the module that does not get called?
Also, aside from all this, why have you never thrown up a BP discussion or vote to discuss this proposal? Why did you edit war to put it in place instead? —CodeCat 01:37, 19 August 2016 (UTC)
Personally I think it would be hella confusing to display brgyad in one place and gyaew in another place when referring to the same word. If we are to make a systematic distinction between transliteration (in the proper sense) and transcription, we should include both forms consistently. Perhaps we write བརྒྱད (brgyad ・gyaew) where the dot in the middle links to a page explaining what the two romanizations mean. Mind you, I'm not convinced it's worth the trouble, but if we are to do it, something like this would be the way. Benwing2 (talk) 01:50, 19 August 2016 (UTC)
And in fact that suggestion is already possible without Wyang's changes to Module:links. --WikiTiki89 01:53, 19 August 2016 (UTC)
"བརྒྱད (trlit. brgyad; trscr. gyaew)"? (a bit more explicit; the blue dot in headwords is not terribly intuitive IMO) —suzukaze (tc) 02:08, 19 August 2016 (UTC)
This is fine with me, and I agree is more intuitive. Benwing2 (talk) 06:01, 19 August 2016 (UTC)
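The labelled display suggested above can be sketched as a small formatting helper. This is only an illustration in Python, not an existing Wiktionary template or module; the function name `format_term` and its parameters are hypothetical, and the labels "trlit." and "trscr." are taken from the suggestion in this thread.

```python
def format_term(term, translit=None, transcription=None):
    """Render a term followed by its labelled romanisations.

    Each romanisation that is supplied gets an explicit label, so a
    reader can tell a spelling-based form from a sound-based one.
    """
    parts = []
    if translit:
        parts.append(f"trlit. {translit}")
    if transcription:
        parts.append(f"trscr. {transcription}")
    if not parts:
        # Nothing to show: return the bare term without parentheses.
        return term
    return f"{term} ({'; '.join(parts)})"


# Example with the Tibetan word discussed above.
print(format_term("བརྒྱད", translit="brgyad", transcription="gyaew"))
```

Keeping the label even when only one romanisation is present is a design choice of this sketch; a language with a single system could just as well omit it.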
CodeCat, all of this was in the original discussions (discussion 1, discussion 2). "Why did you edit war to put it in place instead" - this is irresponsible and unnecessary accusation. Please have a look at the page history of Module:links; the first revert was your heedless revert which paralysed the Thai entries.
བརྒྱད (brgyad ・gyaew) in translations is too confusing for newcomers. The technical support for transcriptions is not difficult to put in place. A simple parallel function of Language:transcribe can be added in Module:languages. This function can be called by Module:links (i.e. to turn on transcription support) unconditionally for language A, or conditionally for language B (e.g. only when Module:links is called by Module:translations, or unless Module:links is called by Module:etymology). Wyang (talk) 04:26, 19 August 2016 (UTC)
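The conditional behaviour described in the comment above can be sketched as follows, again in Python for illustration only and not as the actual Module:languages or Module:links code. The class `Language`, the function `auto_romanise`, the policy tables, and the caller names "translations" and "etymology" are assumptions made for this example.

```python
class Language:
    """Hypothetical language object with two parallel romanisers."""

    def __init__(self, code, transliterate, transcribe=None):
        self.code = code
        self._transliterate = transliterate
        self._transcribe = transcribe

    def transliterate(self, text):
        return self._transliterate(text)

    def transcribe(self, text):
        # Parallel to transliterate(); falls back to transliteration
        # when the language has no transcription support.
        if self._transcribe is None:
            return self._transliterate(text)
        return self._transcribe(text)


# Hypothetical policy tables. A language listed in TRANSCRIBE_ALWAYS
# would be transcribed everywhere; the other table lists the callers
# for which a language is transcribed.
TRANSCRIBE_ALWAYS = set()
TRANSCRIBE_FOR_CALLERS = {"bo": {"translations"}}


def auto_romanise(lang, text, caller):
    """Pick transcription or transliteration depending on the caller."""
    callers = TRANSCRIBE_FOR_CALLERS.get(lang.code, set())
    if lang.code in TRANSCRIBE_ALWAYS or caller in callers:
        return lang.transcribe(text)
    return lang.transliterate(text)


# Toy Tibetan data from this discussion.
bo = Language("bo", {"བརྒྱད": "brgyad"}.get, {"བརྒྱད": "gyaew"}.get)
```

With this policy, `auto_romanise(bo, "བརྒྱད", "translations")` takes the transcription path and `auto_romanise(bo, "བརྒྱད", "etymology")` takes the transliteration path, which is the split the replies below object to as potentially confusing.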
What about suzukaze's suggestion? Do you still think it's too confusing for newcomers? IMO displaying different romanizations in different places is far more confusing than displaying both and I would be strongly against that. Benwing2 (talk) 06:01, 19 August 2016 (UTC)
What about བརྒྱད (gyaew [brgyad])? We already do this for Akkadian, for example: 𒆍𒀭𒊏𒆠 ‎(bābili [KA2.DINGIR.RAKI]). (Although I don't understand why gyaew is even needed, none of the IPA transcriptions at the page look anything like it; they all look much more like brgyad.) --WikiTiki89 09:20, 19 August 2016 (UTC)
gyaew is the Lhasa pronunciation: gy /c/, ae /ɛ/, w /˩˧˨/. Frankly, I would be quite confused by the Akkadian word if I saw it in translations (I still am after reading the entry, especially the etymology). It may be less unsatisfactory for Akkadian, as people may be less interested in the spoken aspects of a dead language. I don't think putting transliteration in translations is a good idea for any of the non-small living languages with a high level of script-language discordance. {{bo-pron}} has more examples of transliteration-transcription correspondences in Tibetan. Wyang (talk) 09:40, 19 August 2016 (UTC)
The fact that you are confused by our Akkadian romanizations is not really a problem. We shouldn't necessarily expect people to automatically understand these things. We need to have appendix pages explaining our romanization scheme for each language, just like any other dictionary would do. Such an appendix would explain to you that bābili is the transcription and KA2.DINGIR.RAKI are the names of each character in the word, named by their usual phonetic value, with capital letters indicating Sumerian logograms (Sumerograms; kind of like Kanji) and superscript indicating determinatives. --WikiTiki89 12:26, 19 August 2016 (UTC)
I would love to read some statistics regarding the traffic of our help pages - I have always been under the impression that very few people are able to navigate to our Wiktionary:About... pages, since we do not have an obvious or subtle link on the entry itself linking to the language help page. We do not have a "translate!" tool alongside the search box that helps a reader check if translations of word A in language X exist (i.e. a simple interface with two fields "word" and "language" (dropdown by #speakers), which parses through the content of the entry A to see if it has the translation of any sense in language X), and prompt the user to suggest that we add this translation if there is none. We also do not have a fuzzy search function, or a reverse transcription search, and many other things. Personally, the reason I look up translations is because I want to know how to say the equivalent in another language. Like the common phrase "How do you say ... in the ... language?", not "How do you spell?". I imagine most readers are expecting to find out the pronunciation of a foreign non-Latin-script word on the translation page itself, which is why I'm suggesting simple, straight-to-the-point phonetic transcriptions inside translation boxes. Wyang (talk) 13:07, 19 August 2016 (UTC)
I'm all for making the "about" pages more easily accessible. As for pronunciation, you're supposed to click on the entry and not simply look at the table. The entry should have all the pronunciation information. Someone unfamiliar with Tibetan will not know how to pronounce gyaew anyway. Someone who knows a little bit about Tibetan would realize that the word might not be pronounced brgyad in Lhasa and click on the entry for further pronunciation information. I don't know why you're bringing up search features, they do not seem relevant to this discussion. --WikiTiki89 13:29, 19 August 2016 (UTC)
Yes, people are supposed to click on them, but people (especially casual visitors) often don't. People may not know how to pronounce gyaew initially, but if the display in translations consistently uses transcription and people are pointed to the correct help page, they are more likely to become regular users and use the translation functionality more frequently. The point about the search features was to lament that our user friendliness is (excuse me) crap... and yet, we are here arguing whether or not we should give support to transcriptions which prominently contrast with transliterations in many languages, and whether or not it is worthy to improve user experience with more consideration. Wyang (talk) 14:24, 19 August 2016 (UTC)
Giving a user a piece of unexplained information without a link or even a name for that information, thus effectively blocking the user from figuring out what that information is, is a problem. Because that means you have not given the user any information at all, you just blurted some nonsensical text. Korn [kʰũːɘ̃n] (talk) 15:11, 19 August 2016 (UTC)
Perhaps all of our transliterations should automatically link to a description page, like this: обезья́на (obezʹjána (key), “monkey”)? --WikiTiki89 15:53, 19 August 2016 (UTC)
That's actually how I handle it for Middle Low German grammar. Though I'm not sure it needs to happen for plain transliteration, which should be more intuitive than Sumerograms. In case of doubt, better safe than sorry, though. Korn [kʰũːɘ̃n] (talk) 16:08, 19 August 2016 (UTC)
The only downside to that idea is that it puts too much emphasis on the transliteration, rather than on the word itself. Another idea I've always contemplated was to just get rid of all transliterations in links and have them only in entries and etymologies, but that's a very radical change. Another idea I just had is what if we have links to transliteration keys after the language name in translation tables and at the top of each language header. --WikiTiki89 16:43, 19 August 2016 (UTC)
What if we just made the transliterations themselves the links to the keys? (Languages where the transliterations have entries (e.g. Gothic) could continue to link to those entries, since they contain, or link to the main entries which contain, much the same information as the key would.) - -sche (discuss) 17:15, 19 August 2016 (UTC)
I did consider that. My first thought is that it would look weird for all transliterations to be colored as links. Also, would the reader know what he would get from clicking the transliteration? But maybe it's not such a bad idea. We should limit this to link templates, though. Usage examples and other such things probably don't need to have their transliterations linked. --WikiTiki89 17:54, 19 August 2016 (UTC)
Strong oppose for any move to remove transliterations / romanizations from links. That would greatly reduce the usability of all Japanese entries. ‑‑ Eiríkr Útlendi │Tala við mig 00:42, 20 August 2016 (UTC)
A lot of online dictionaries have significantly better interfaces than us. Some use hover over for all links to show a sneak peek of the linked-to entry; examples are Moedict, CantoDict, Thai-language. These are all impressive tricks which we can potentially implement to greatly improve the user experience. The link in translations can be turned into a hover-over link which previews the pronunciation and first sense of the term, and on mobiles it can be a simple link with transcription following it in parentheses. The point is we need to suitably name and record our utility modules, so that we can easily call on them and not come to the realisation we have mixed up all the transliteration and transcription modules when there is a need to use transcriptions. Wyang (talk) 00:46, 20 August 2016 (UTC)
But at what point has there ever been a need to choose between them or display them both? If we have both a transcription and a transliteration module, would they ever both be used for anything? —CodeCat 01:10, 20 August 2016 (UTC)
In translations. The purpose of having romanisations in translation sections is to inform readers how to say something in another language. Transcription modules, if they exist, should be preferentially called upon when romanising terms in translation sections. Wyang (talk) 02:08, 20 August 2016 (UTC)
I think presenting both romanisations simultaneously in translations is confusing - readers are unlikely to understand what the difference between transliteration or transcription is, or the difference between Wylie transliteration and Tibetan Pinyin. I would prefer presenting the information in the entry itself, and presenting only what is necessary in translations, e.g. བརྒྱད (pr. gyaew). Wyang (talk) 09:13, 19 August 2016 (UTC)
You can always make the words give a one line explanation of the difference on hover over. Korn [kʰũːɘ̃n] (talk) 09:37, 19 August 2016 (UTC)
We have to be careful with using hover over though - it does not seem to be well-supported on mobile devices. Wyang (talk) 10:01, 19 August 2016 (UTC)
  • I'd be fine with making the de-syspopping of CodeCat permanent. This is the latest in a series of abuses of the tools, ranging from bad blocks to making major changes without community consensus. Purplebackpack89 18:59, 18 August 2016 (UTC)
IMO, any action like this needs to be by formal vote. (Note that there was already a vote to desysop CodeCat, which failed.) Benwing2 (talk) 20:22, 18 August 2016 (UTC)
There should be no double standard. Either both Wyang and CodeCat have their sysop powers restored upon resolution of this problem, or they both have to reapply and be voted on. I do not understand, however, why CodeCat's edits are no longer autopatrolled. That should be fixed as soon as possible. —Aɴɢʀ (talk) 21:55, 18 August 2016 (UTC)
I overlooked that detail. Fixed. Chuck Entz (talk) 02:11, 19 August 2016 (UTC)
I can always trust you to make everything be about you and your grievances, no matter the subject. That type of attitude is a large part of what caused this mess in the first place- we need less of it, not more. Chuck Entz (talk) 02:11, 19 August 2016 (UTC)
I oppose CodeCat's recent desysop. First off, where is the formal vote? Second of all, I've not really had any problems with her. I think her intentions really are good, but she may have made a mistake, just like all of us have. Jeez if I had a penny for every mistake I've made on the internet, I'd have like 10 bucks (which is a lot of pennies!). I feel like it's only if a person continues to make such mistakes somewhat consistently over a long period of time, or do something really bad (like delete the main page, for instance), that they should be desysopped because of behavior. I'd be willing to put up a vote to get her resysopped (hey look I made up a new word!) if necessary. Philmonte101 (talk) 22:33, 18 August 2016 (UTC)
A vote would be required for a permanent desysop, but in this case, the desysopping was temporary in order to stop an ongoing edit war. Normal users would have received a temporary block for this, but admins can unblock themselves, making such a block useless if the admin is determined to circumvent it (and both of them did so in this case, before they were desysopped). Thus, I think the temporary desysopping was justified. --WikiTiki89 23:36, 18 August 2016 (UTC)
It seems like you completely misunderstand the entire situation. The desysop was the emergency countermeasure to a serious edit war, not "a mistake". —suzukaze (tc) 01:09, 19 August 2016 (UTC)

This topic must not die again. How are we going to set up the transcription/transliteration infrastructure? —suzukaze (tc) 00:39, 21 August 2016 (UTC)

Agreed. I think most people are in agreement that the status quo of one single transliteration is OK, and it's also OK to display two transcriptions/transliterations for languages like Tibetan and Burmese where the pronunciation and written script are far from each other and where the written form carries important etymological information that's missing from the modern pronunciation. This potentially could be done for Thai and Khmer as well although here I think it's less useful, as the difference between the two isn't so much, and the extra information in the written form is mostly only present in Sanskrit loanwords, which are fairly unproblematic etymologically. The main issue here is that Wyang disagrees and wants to impose a system where we show transcriptions in some places and transliterations in others, but I think pretty much everyone else is opposed to this so it won't fly. We could vote on this but Wyang has to be willing to accept the result, since he seems to be the main one who would implement it. Benwing2 (talk) 01:07, 21 August 2016 (UTC)
My main points were: 1) transliteration and transcription should not be confused; 2) for languages which can both be romanised with transliterations and transcriptions, the functional modules should be distinguished and named appropriately; 3) using multiple romanisations is very confusing in translations and readers will not understand the difference; and 4) translation sections should use transcriptions to romanise terms, if transcriptions are contrastive with transliterations. I do not believe I am the only one who is in favour of this. Discussions should involve effective argumentation, not merely accusing others of being outlandish. Wyang (talk) 01:36, 21 August 2016 (UTC)
Eh, I find it reasonable to display only the relevant romanization to reduce clutter. The entry itself could show which is a transcription and which is a transliteration. —suzukaze (tc) 01:55, 21 August 2016 (UTC)
I prefer to see transcriptions as is currently done by the Thai module. Transliterations or symbol sequence can still be found in Thai entries. --Anatoli T. (обсудить/вклад) 02:32, 21 August 2016 (UTC)
I suggest recording the transcription modules in language_data, creating a parallel Language:transcribe function in Module:languages, and making Module:links call on this function unconditionally or conditionally for certain languages. Wyang (talk) 01:26, 21 August 2016 (UTC)
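The proposal above can be illustrated with a minimal sketch. Wiktionary's modules are written in Lua; the Python below is only a stand-in, and every name in it (`Language`, `LANGUAGE_DATA`, `romanise_link`, the `prefer_transcription` flag) is hypothetical rather than taken from Module:languages or Module:links. The lookup tables hold only the Tibetan and Greek examples quoted in this discussion; a real module would compute romanisations instead of looking them up.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

# A romaniser maps native-script text to Latin script, or None if it cannot.
Romaniser = Callable[[str], Optional[str]]

EIGHT = "བརྒྱད"
EAR = "རྣ་བ"
BURMA_GREEK = "Μπούρμα"

# Toy data taken from examples in this thread (assumption: a real module
# would derive these algorithmically).
WYLIE = {EIGHT: "brgyad", EAR: "rna ba"}             # transliteration
TIBETAN_PINYIN = {EIGHT: "gyaew", EAR: "naf-waf"}    # transcription
GREEK = {BURMA_GREEK: "Boúrma"}                      # single scheme only


@dataclass
class Language:
    code: str
    translit: Optional[Romaniser] = None   # recorded transliteration module
    transcr: Optional[Romaniser] = None    # recorded transcription module

    def transliterate(self, text: str) -> Optional[str]:
        return self.translit(text) if self.translit else None

    def transcribe(self, text: str) -> Optional[str]:
        # Parallel to transliterate(); None when no module is recorded.
        return self.transcr(text) if self.transcr else None


# Hypothetical registry playing the role of the language data tables.
LANGUAGE_DATA: Dict[str, Language] = {
    "bo": Language("bo", translit=WYLIE.get, transcr=TIBETAN_PINYIN.get),
    "el": Language("el", translit=GREEK.get),
}


def romanise_link(lang: Language, text: str,
                  prefer_transcription: bool = False) -> Optional[str]:
    """Pick a romanisation for a link, falling back to transliteration."""
    if prefer_transcription:
        result = lang.transcribe(text)
        if result is not None:
            return result
    return lang.transliterate(text)


print(romanise_link(LANGUAGE_DATA["bo"], EIGHT, prefer_transcription=True))
print(romanise_link(LANGUAGE_DATA["bo"], EIGHT))
```

In this sketch a language without a recorded transcription module behaves exactly as it does today, so the distinction only has an effect for languages that opt in.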
Thanks for summarizing your position. Benwing2 (talk) 01:41, 21 August 2016 (UTC)
Note that you haven't answered whether you will accept the community's consensus if it goes against yours. Benwing2 (talk) 01:41, 21 August 2016 (UTC)
Fine. Bye bye. Wyang (talk) 01:42, 21 August 2016 (UTC)
Christ. I was trying to play mediator but seem not to have been successful. Wyang, I do hope you will reconsider. No one wants to see you leave. Benwing2 (talk) 03:53, 21 August 2016 (UTC)
Repeatedly using imagined “consensus” (your opinion) as majority tyranny to intimidate others is hardly mediation. I am perplexed how the above discussion could be interpreted as me spewing out nonsense and needing to be brought under control. I elaborated my various points in the discussion and there isn't really any opposing argument regarding either using transcriptions in translations or separating transliteration and transcription utilities for certain languages. Then there was your “summary” which identified the need to smother me without providing any counterarguments whatsoever. To my technical proposal, instead of commenting on the feasibility or reasonableness of this, you again tried to smother me by labelling whatever you believe in as “consensus” and coercing me to accept it. This is opposing for the sake of opposing, without bringing in any intelligent arguments to the discussion. This is bullying. How is བཀྲ་ཤིས་བདེ་ལེགས (zhacf-xih-dev-leh [bkra shis bde legs]) not confusing as the Tibetan translation of hello? It is frustrating to try to have people think sensibly and analytically about topics with the future in mind on Wiktionary. Look at how long it took for the community to come to senses with the Chinese merger and now this; time and time again, it is regression led by the unfamiliar majority, without critically analysing proposals for what they are. Wyang (talk) 02:16, 22 August 2016 (UTC)
Wyang, I am sorry things have gotten to the point that you think I am smothering you, bullying you, tyrannizing you, etc. It was not my intention to do any of these things, and I apologize for giving the wrong impression. How about we simply hold a vote on what is the best way to handle this? This is the Wiktionary way of doing things, and will more clearly reveal the consensus. Are you willing to lead that vote? Benwing2 (talk) 03:12, 22 August 2016 (UTC)
Thank you. Wikipedia:Polling is not a substitute for discussion; Wikipedia:What_Wikipedia_is_not#Wikipedia_is_not_a_democracy has more arguments why decision making should be achieved by discussion and consensus, not votes. It is not sensible to expect voting to be the most suitable method of decision making, when the great majority of eligible voters are uninvolved and perhaps have no prior familiarity with how the transliteration-transcription distinction manifests itself in the romanisation of certain languages. It is akin to believing that User:Wyang will be a responsible voter on the topic of Akkadian romanisation; quite the contrary I have no previous experience with this and any stance I take regarding Akkadian romanisation could be very unwise for the project's future. In this discussion we should be appraising whether the preferential use of transcriptions in translations is favourable over transliterations for certain oriental languages (Tibetan, Burmese, etc.; example below), and consequently whether transliteration and transcription modules should be kept separate for these languages. Wyang (talk) 04:27, 22 August 2016 (UTC)
Romanisation | “eight” | “ear” | “hello”
Transcription | བརྒྱད (pr. gyaew) | རྣ་བ (pr. naf-waf) | བཀྲ་ཤིས་བདེ་ལེགས (pr. zhacf-xih-dev-leh)
Transcription (without tone letters) | བརྒྱད (pr. gyae) | རྣ་བ (pr. na-wa) | བཀྲ་ཤིས་བདེ་ལེགས (pr. zhac-xi-de-le)
Transliteration | བརྒྱད (brgyad) | རྣ་བ (rna ba) | བཀྲ་ཤིས་བདེ་ལེགས (bkra shis bde legs)
A vote is a good way out. The above speaker seems to underestimate the ability of voters to inspect and evaluate evidence and to consider arguments presented. Instead, he seems to commit what looks like an authority fallacy, the erroneous notion that only those already familiar with Thai can make a sound judgment about Thai romanization, be it transcription or "transliteration" narrowly construed.
How consensus, mentioned in the above post, could ever be anything different from the result of a vote is beyond me. Consensus is a general agreement, even if not unanimity, and I fail to see how a passing vote could ever show anything other than consensus. --Dan Polansky (talk) 08:02, 22 August 2016 (UTC)
A vote, where the topic is unfamiliar to most of the eligible voters, can easily produce ill-advised consensus (“collective stupidity”). And yes, only those familiar with how the transliteration-transcription distinction manifests itself in the romanisation of certain languages can make a sound judgment about the issue. Calling for a vote is not the way out if that side shows no willingness to engage in discussions and present counterarguments to reach consensus. Cases of collective intelligence are when those familiar and knowledgeable about the topic critically appraise the arguments for and against the proposal to attempt to reach consensus, not when the proposal is relayed to a vote to see which side is more numerous. Wyang (talk) 10:22, 22 August 2016 (UTC)
You have the option to present "how the transliteration-transcription distinction manifests itself in the romanisation of certain languages". In fact, you have just done that in a table above. I trust most of the voters to consider such presentation in their voting decision. Discussion alone is not a mechanism of decision making; indeed, in this dispute, both parties think they are right and that they have presented the right arguments. Strength of argument is not a mechanism of decision making since there is no simple mechanism to assess strength of argument. --Dan Polansky (talk) 11:13, 22 August 2016 (UTC)
Having information presented can never, ever supplant being knowledgeable about a topic. For instance, if I were provided with a comprehensive overview of the Akkadian language, I would still feel that I am in no position to provide any judgment on Akkadian romanisation. In fact, Wiktionary has witnessed many lessons learnt from having the unfamiliar collectively voice opinions and make decisions. The merger of Chinese is one - it was only adopted in 2014, more than 10 years after the launching of Wiktionary. So much more work could have been done in the meantime, and so much work still remains to be done to rectify the initial step in the wrong direction. The misinterpretation of transliteration is another one. All of these resulted from the lack of intelligent decision making from people who are familiar and knowledgeable about the topics. Discussion is a mechanism of decision making, and arguably the most important one, and I would argue that no decision making should be achieved without any substantial discussion. In this dispute, there has been a paucity of argumentation from one side throughout, and a paucity of active discussion of the topics at hand (whether the preferential use of transcriptions in translations is favourable over transliterations for certain oriental languages, and consequently whether transliteration and transcription modules should be kept separate for these languages). Wyang (talk) 12:05, 22 August 2016 (UTC)
If you submit that after being presented the table, I still don't appreciate the difference between letter or character based transliteration and pronunciation based transcription, you are drastically underestimating my capacity to understand very simple things. Nor do I think other readers have failed to appreciate the distinction. Your fallacy is grave. --Dan Polansky (talk) 12:58, 22 August 2016 (UTC)
Basically, you're fighting against any way for other editors to disagree with you. Is there any way Wiktionary editors could decide on a course that you disagree with that you'd accept?--Prosfilaes (talk) 13:43, 22 August 2016 (UTC)
Transliteration / transcription is about providing a Latin-script handle for people who don't read the script. Anywhere but the entry for the word itself we should be using one consistent Latin-script version, and there we can and should provide every transcription/transliteration version now or once in standard use.--Prosfilaes (talk) 09:57, 22 August 2016 (UTC)
Romanisation is indeed about providing a Latin-script handle for people who don't read the script. Nonetheless, the reasons we want to use romanisations are different in various parts of the project. It could be to show how the foreign-script word is pronounced (for example in translation sections), or how it is spelt literally (in etymological comparisons). At the moment, Tibetan is romanised with a transliteration method (Wylie transliteration), which is 100% automatable and is fantastic in etymological comparisons, as it faithfully represents how the word is spelt. However, there is no point showing brgyad as the Tibetan translation of eight, as readers will automatically assume the romanisation in translations is the word's pronunciation and attempt to pronounce it as such when communicating with locals. It makes more sense to simply use transcriptions to inform readers of the pronunciation in translation sections. Wyang (talk) 10:22, 22 August 2016 (UTC)
Romanization is not for showing how words are pronounced. That's what the pronunciation section in the entry is for. If you're using Wiktionary's translation tables as a pronunciation key for communicating in that language, epic fail. In any case, showing བརྒྱད shows six rather different pronunciations; giving me gyaew instead of brgyad doesn't really help me pronounce the word.--Prosfilaes (talk) 13:43, 22 August 2016 (UTC)
Are you sure?! You should then talk to Japanese editors and tell them they should transliterate こんにちは as "konnnichiha" They've been doing it wrong all these years! Also, get in touch with some other dictionary publishers and tell them their Korean and Thai transliterations are wrong. --Anatoli T. (обсудить/вклад) 13:49, 22 August 2016 (UTC)
Sarcasm aside, the rest of my statement still stands. If you want to know how a word is pronounced, look at the pronunciation key, not the translation table. Readers who "automatically assume the romanisation in translations is the word's pronunciation" are going to be consistently lost, and I fail to see how gyaew is going to help any reader who doesn't know Tibetan figure out the pronunciation is /cɛʔ¹³²/ or /bɡjat/ or /dʑɛʔ⁵³/ or /dʑed/ or /wɟjal/ or /hdʑal/. I in fact feel that any reader who could use gyaew to derive the correct pronunciation probably knows enough to figure out that brgyad isn't the pronunciation transcription they were looking for.
Romanize as you will, but the value of having one romanization throughout Wiktionary and giving readers a consistent Latin-script name for a word outweighs the value of having different romanizations in different places.--Prosfilaes (talk) 15:53, 22 August 2016 (UTC)
Now that CodeCat and her supporters have successfully driven away Wyang from the project, someone has to take over all the work he has been doing. Congratulations! I am disgusted with the community's reaction to the problem. --Anatoli T. (обсудить/вклад) 02:21, 21 August 2016 (UTC)
Anatoli, what do you think should have been done (or should be done) differently? Benwing2 (talk) 04:17, 21 August 2016 (UTC)
I don't think it's anyone's fault but Wyang's, given that the leaving was in response to "Note that you haven't answered whether you will accept the community's consensus if it goes against yours."--Prosfilaes (talk) 13:07, 21 August 2016 (UTC)
I know the technical subject is not the subject of this thread but anyway: could not the naming disagreement be solved by placing CodeCat code in Module:th-transcr rather than Module:th-translit? Then, the misnomer argument would no longer apply, and other argument against CodeCat's solution would have to be sought. --Dan Polansky (talk) 08:06, 22 August 2016 (UTC)
The code was originally placed in Module:th (function getTranslit). Either Module:th or Module:th-transcript is fine, though either way the transcription module needs to be recorded in addition in Module:links or Module:languages/data2, as translit_module is a misnomeric parameter. Wyang (talk) 10:22, 22 August 2016 (UTC)
A further question: are the modules currently present in Category:Transliteration modules in general transcription modules or are they overwhelmingly transliteration modules in the narrow sense, transcribing on the letter or character level? --Dan Polansky (talk) 08:13, 22 August 2016 (UTC)
Just call it all 'Romanisation modules' and be done with it. Korn [kʰũːɘ̃n] (talk) 10:57, 22 August 2016 (UTC)
I oppose calling those modules "Romanisation modules". There are elements of "transcription" (more or less) in many languages, most of them are standard transliteration. Here are examples of transliterations with elements of transcription, "the translit" shows more graphical transliterations of the same word (the actual spelling):
Arabic: عربى ‎(ʿarabiyy), translit: "ʿrbā", vocalised Arabic: عَرَبِيّ ‎(ʿarabiyy)
Greek: Μπούρμα ‎(Boúrma), translit: "Mpoúrma"
Russian: легкого ‎(ljóxkovo) (phonetic respelling: "лёхково"), translit: "legkogo", spelling with "ё": лёгкого ‎(ljóxkovo)
Japanese: こんにちは ‎(konnichiwa), translit: "konnichiha"
Korean: 십육 ‎(simnyuk) (phonetic respelling: "심뉵"), translit: "sibyuk"
Hindi: फिल्म ‎(film), translit: "philma", spelling with "nuqta": फ़िल्म ‎(film)
One can argue that abjad languages like Arabic, Persian, Urdu, Hebrew, etc. can't be transliterated but romanisations are still called transliterations. Persian and Urdu are seldom fully vocalised, so their graphical transliterations would be completely useless for someone wanting to know how to pronounce Persian or Urdu words. Some irregularities are handled by transliteration modules, for some terms manual (hard-coded) transliteration is required. If someone accuses Wyang of making up transliterations for Thai, check Paiboon dictionaries for terms like ชาติ ‎(châat) (graphical transliteration: "châa-dtì") and see how these terms are transliterated there. --Anatoli T. (обсудить/вклад) 13:18, 22 August 2016 (UTC)
I see no problem with calling any rendering of a non-Latin word in Latin script a romanisation as a hypernym and only referring to it as a transliteration/transcription specifically when it's important to underline the difference. (Maybe leave a note in the documentation.) If a module is made which does both, as some parties propose, or if only one is reasonable for a language, why not go with an indiscriminating 'romanisation' so you can categorise them all easier and don't have to waste debate time on naming conventions? Korn [kʰũːɘ̃n] (talk) 14:37, 22 August 2016 (UTC)
  • A propos of the voting/consensus matter, I am particularly well qualified to contribute as one of those ignorant of most aspects of the matters under discussion.
The rationales for requiring a consensus of more than those knowledgeable about the languages in question are that it might interfere with the module architecture as currently designed, that the translation tables might become cluttered, and that some users (including those not intending to learn the languages and scripts in question) might be confused/overwhelmed/put-off by the transliteration-transcription distinction.
What makes sense for entries in the languages in question is a matter best left to the contributors in those languages IMO. If our module architecture did not anticipate the need for a transliteration-transcription distinction, then so much the worse for the architecture. We cannot have the module architecture unreasonably preventing contributors from contributing in the manner that is best for the languages in question by their lights. IOW, we should not have the tail wagging the dog. How to apply this principle is left as an exercise to the reader.
I can only beg that the translation table matters do not make the tables cluttered and confusing for all to deliver a questionable benefit to some. DCDuring TALK 12:40, 22 August 2016 (UTC)
@DCDuring I agree! But wait, maybe CodeCat is eager to make a new module for Khmer or Burmese language and apply their "best practice" there? Well, Wyang has started, somebody can make those modules perfect!
Seriously, I perfectly understand Wyang's frustration. He created a WORKING SOLUTION for complex Asian languages nobody even attempted before. Now, someone starts changing modules without any discussion with him. I would be very upset if someone tried to change my work without first checking with me. Why do people even think they should both be blocked? How would YOU feel if you were in the same situation? I don't want CodeCat blocked but I think she is absolutely wrong here. Yes, location of the code can be reviewed and discussed, agreed first and only THEN changed, if the agreement is reached.--Anatoli T. (обсудить/вклад) 13:29, 22 August 2016 (UTC)
They were both blocked/desysopped because they both used their admin powers to continue an edit war. It's not a punishment for being on the wrong side of the argument, it's a method for suppressing disruptive behaviour. Korn [kʰũːɘ̃n] (talk) 14:43, 22 August 2016 (UTC)
@Dan Polansky Dan, can you create a vote to short-circuit endless arguing? The vote should have two choices: (1) Continue the current situation where Module:links enforces the constraint that a single romanization (which may be a two-part transcription/transliteration romanization, on a language-specific basis) is used for all types of links; (2) Modify Module:links to allow different romanizations for different types of links (e.g. etym links vs. translation links). The former is User:CodeCat's position, the latter is User:Wyang's. Set the discussion period and vote start/end dates however you think most appropriate. Benwing2 (talk) 15:34, 22 August 2016 (UTC)
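The practical difference between the two vote options can be sketched as follows. This is an illustration only, written in Python rather than the Lua that Wiktionary modules use; the function names and the `link_type` values are hypothetical, and the display formats are copied from examples posted in this discussion (the two-part form with square brackets, and the "pr." form).

```python
# Example data for Tibetan "hello", as quoted in this thread.
HELLO = {
    "script": "བཀྲ་ཤིས་བདེ་ལེགས",
    "transcription": "zhacf-xih-dev-leh",
    "transliteration": "bkra shis bde legs",
}


def render_option_1(entry: dict) -> str:
    """Option 1: one romanisation for every link type.

    Shown here in its two-part variant, transcription followed by the
    transliteration in square brackets.
    """
    return (f"{entry['script']} "
            f"({entry['transcription']} [{entry['transliteration']}])")


def render_option_2(entry: dict, link_type: str) -> str:
    """Option 2: the romanisation depends on the kind of link."""
    if link_type == "translation":
        # Translation tables show how the word is said.
        return f"{entry['script']} (pr. {entry['transcription']})"
    if link_type == "etymology":
        # Etymologies show how the word is spelt.
        return f"{entry['script']} ({entry['transliteration']})"
    raise ValueError(f"unknown link type: {link_type!r}")


print(render_option_1(HELLO))
print(render_option_2(HELLO, "translation"))
print(render_option_2(HELLO, "etymology"))
```

Under option 1 the same string appears in translation tables and etymologies alike; under option 2 the caller has to say which kind of link it is rendering.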
I agree with what User:DCDuring said above. My proposal is to keep transliteration and transcription utilities modules separate in Module:languages/data2 and similar modules, for languages possessing two contrastive sets of romanisation schemes. Notable examples include Tibetan (Wylie transliteration vs Tibetan Pinyin), Burmese (MLCTS vs BGN-PCGN), Thai (ISO 11940 vs Paiboon) and Korean (Yale vs RR). The rationale is that the module infrastructure should anticipate the need for a transliteration-transcription distinction in certain languages, and not unreasonably prevent contributors of these languages from contributing in a manner that is best for the languages in question by their lights. I am in no position to singlehandedly advocate that language X should use romanisation Y for a certain purpose without meticulous discussion having taken place surrounding language X, which needs to happen in separate language-specific discussions. Still, there is a lack of adequate in-depth discussion concerning the issue, especially from arguments against - why do the harms outweigh the benefits if we keep the transliteration and transcription modules separate for these languages? Wyang (talk) 23:56, 22 August 2016 (UTC)
If they are kept separate, then there needs to be a functional reason. The distinction needs to have a consequence in how our modules work and treat each one, and where each one appears. I don't think it particularly desirable to have multiple romanization schemes in different parts of Wiktionary, this just confuses users. The system we have now, with a consistent representation across Wiktionary, is just fine. We don't need two systems when one suffices. —CodeCat 00:13, 23 August 2016 (UTC)
  • The whole point of Wyang's argument is that two systems are already in broader use for certain languages, with each system used for specific purposes. I.e., one romanization scheme doesn't suffice, for certain specific languages. ‑‑ Eiríkr Útlendi │Tala við mig 00:40, 23 August 2016 (UTC)
  • The question is “why do the harms outweigh the benefits if we keep the transliteration and transcription modules separate for these languages”? It does not make sense to use multiple romanisation schemes for Greek, Russian, Georgian, Armenian, etc., but the languages in question are languages which contrast these modes of romanisation prominently. Does it make sense to keep transliteration and transcription modules separate in the module infrastructure for these languages? Yes. Many editors of these languages have been conscious of the need to use the appropriate romanisation in certain contexts. See for example, how User:Angr changed the romanisation of the Burmese word to a transcription at elephant. Our Korean romanisation scheme is the transcriptive Revised Romanisation scheme, which is official in South Korea (also hidden under the misnomer Module:ko-translit). User:Visviva, our first prolific Korean contributor, created the entry 미끄럽다. Note the differential use of a transcriptive romanisation in the main text (mikkeureopda) and a transliterative romanisation in etymology (muys.kulepta). Considerations of the arguments for and against need to be made in the context of these script-pronunciation-discordant languages. Provided the romanisation is well-annotated, such as 믯그럽다 (Yale: muys.kulepta) at 미끄럽다, the appropriate, purpose-oriented use of romanisations is hugely beneficial to dictionary building, for these languages. Wyang (talk) 00:54, 23 August 2016 (UTC)
I will just add, Wyang, that you keep claiming that the people opposed to you are giving no real reasons for doing so, but you yourself have given no reasons why consistently using a dual romanization scheme, like I've suggested as a compromise between your view and CodeCat's, is unacceptable, other than the unsupported claim that it's confusing for new users. Benwing2 (talk) 01:09, 23 August 2016 (UTC)
  • FWIW, my bias is to using the more-phonetic romanization (I recognize this is not IPA-grade, but it *is* generally closer to how an English speaker would say something) in translation lists and similar locations, and using the strict transliteration (i.e. letter-for-letter) romanization in etymology sections and other discussions of the term's etymology and historical development. I.e., I disagree with Benwing's suggestion, and I don't want to see both systems used in all cases. This would be similar to what is already in practice for Korean. ‑‑ Eiríkr Útlendi │Tala við mig 01:18, 23 August 2016 (UTC)
  • Exactly. The information presented in a dictionary should be succinct; dual romanisation in translations is infoxication. The reason users look up translations is to answer their questions of “how do you say ... in ...?”, and romanisation in translations should cater to the need of users. Let's look at how other translation dictionaries do this: the only previewable English-Tibetan dictionaries on Google Books are 1, 2 and 3 and all are using transcriptions only. Transcriptions answer the users' questions directly, without additional romanisations to complicate their information processing (below). Wyang (talk) 02:57, 23 August 2016 (UTC)
“hello”: བཀྲ་ཤིས་བདེ་ལེགས (pr. zhacf-xih-dev-leh) vs བཀྲ་ཤིས་བདེ་ལེགས (zhacf-xih-dev-leh [bkra shis bde legs])
“birthday”: འཁྲུངས་སྐར (pr. chungf-gaaf) vs འཁྲུངས་སྐར (chungf-gaaf ['khrungs skar])
“brain”: ཀླད་པ (pr. laef-baf) vs ཀླད་པ (laef-baf [klad pa])
Do you have any evidence that users look up translations solely for the pronunciations of things? The main reason that the system with multiple romanisations is confusing is that not a single one of them is explained.
“brain”: ཀླད་པ (pr. laef-baf, sp. [klad] [pa]) Korn [kʰũːɘ̃n] (talk) 10:08, 23 August 2016 (UTC)
There is plenty of evidence for that. Many transliterations are the work of a few people who took part in their development over a period of time. Apart from Wyang and myself, you can ask people like Eirikr, Haplology (Japanese), TAKASUGI Shinji (Japanese and Korean), Aryamanarora (Hindi), Benwing2 (Russian) and Saltmarsh (Greek) why transliterations are the way they are. They use some phonetic elements, but they are not IPA and are not supposed to convey the pronunciation accurately. --Anatoli T. (обсудить/вклад) 10:26, 23 August 2016 (UTC)
Anatoli, I don't understand what you are trying to tell me with your comment. Wikipedia says that Tibetan Pinyin does not mark tone. Our pronunciation sections use the label 'Tibetan Pinyin' but according to Wyang, the superscript letters we see are tone marks. What kind of system is that? It seems in need of relabeling. Korn [kʰũːɘ̃n] (talk) 10:34, 23 August 2016 (UTC)
Korn: See Tibetan pinyin#References. This is the modified version of the official Tibetan Pinyin, with tone letters. The Wylie transliteration scheme is the gold standard of Tibetan romanisation; it and its variants are used in almost all scholarly publications. But it is interesting that all of the three previewable English-Tibetan dictionaries on Google Books use transcriptions only to romanise their Tibetan translations. Wyang (talk) 10:56, 23 August 2016 (UTC)
My point was that for each language, the interested and knowledgeable editors decided how to handle transliteration. I could bring up a series of discussions about Korean; Wyang implemented most of it. The phonetic transliteration (RR), which is officially recommended in South Korea, was adopted. There were long, relevant discussions, and decisions were made. Now, with the argument between Wyang and CodeCat, everybody has joined in with their opinions, but few cared when the actual problems were being discussed and solved. --Anatoli T. (обсудить/вклад) 11:08, 23 August 2016 (UTC)
Wyang, there should be a note about what the symbols mean on About: Tibetan, including what the tone marks mean; this information is not easily retrievable. Anatoli, I assume the reason for that is that now the community is forced to take note of the situation because it's brought to the Beer Parlour whereas before it was discussed amongst editors of the language. Korn [kʰũːɘ̃n] (talk) 14:55, 23 August 2016 (UTC)
Absolutely; have to work on it later. Wyang (talk) 21:16, 23 August 2016 (UTC)


Hello. I am here because there's nowhere else to post. It's about az.wiktionary (Azeri, aka Azerbaijani). The problem is that a user (perhaps an admin) who is a native speaker changed the verb categories from fel to feil last year. Later, another user, also a native speaker, rejected his changes. There are not many people there to discuss this. I hope this community has some knowledge of Azeri and can decide which name to use for the categories. --Octahedron80 (talk) 06:02, 18 August 2016 (UTC)

PS If you want to make changes, please do so at az.wiktionary directly. I am off. --Octahedron80 (talk) 06:19, 18 August 2016 (UTC)

(calling @Aabdullayev851) A third spelling for verb is fe'l. I think there are slight differences in fel / feil / fe'l, but I am not sure what the differences are. —Stephen (Talk) 15:12, 22 August 2016 (UTC)
Hello Octahedron80 and Stephen. I am a new admin on the Azerbaijani Wiktionary. Thank you for your attention to Azerbaijani. There is no fe'l in the Azerbaijani language; I am sure about that. But fel or feil? Some sources use fel, and some use feil. I am also confused about these words. How can we decide which is correct? --Aabdullayev851 (talk) 17:00, 22 August 2016 (UTC)
OK. Fe'l was used in Azerbaijani in Soviet times. After that, fel came into use, and after 2004, feil. So the current version is feil. You can see the same changes in şe'r - şer - şeir. --Aabdullayev851 (talk) 17:31, 22 August 2016 (UTC)
Let's use the modern spelling. Feil, maybe. --Octahedron80 (talk) 00:33, 23 August 2016 (UTC)
Ok Octahedron80 --Aabdullayev851 (talk) 07:25, 24 August 2016 (UTC)

Proposal: 3-revert-rule

Wikipedia has a 3-revert-rule which exists explicitly to stop endless edit wars. I think Wiktionary should have a similar policy.

However, I don't know if it should be identical in every detail to Wikipedia's version. For example, Wikipedia gives both users three reverts, which means that once all three are used up, the edit is kept rather than reverted. On Wiktionary we follow the principle that disputed edits should be reverted first and then discussed for consensus. So it would make more sense if, when both parties have exhausted their reverts, the final state of a page is the original state rather than the state with the disputed edit. I therefore think that the editing party should get only 2 reverts.

Another detail that should be worked out is how to enforce it, especially when two admins are involved. Wikipedia has a special 3RR noticeboard, which makes sense because it happens so often there. But here it's pretty rare, so there's probably not that much need for a special forum. Would WT:VIP do? I feel that the Beer Parlour is insufficient, as I found out myself when I tried to report an incident and it was ignored. Is it possible at all to make a rule against sysop apathy? —CodeCat 22:06, 18 August 2016 (UTC)

I really don't want us to start enshrining common sense as policy or to start having highly regulated drama-fests like Wikipedia does. Editors shouldn't revert-war and wheel-war because it's disruptive; we shouldn't need an explicit policy stating what everyone already knows. As for returning a page to the "original state", who decides what the "original state" is? If today I revert something another editor did back in 2013, and 15 minutes later he reverts my revert, what's the "original state"? I like how unregulated and uncomplicated Wiktionary is compared to Wikipedia, and I want to keep it that way. —Aɴɢʀ (talk) 22:15, 18 August 2016 (UTC)
I don't see a way to enforce any of our policies. Renard Migrant (talk) 22:23, 18 August 2016 (UTC)
The enforcement mechanisms are social pressure and the action of bureaucrats. Sadly, some seem largely immune from social pressure. DCDuring TALK 02:09, 19 August 2016 (UTC)

PIE root

--Daniel Carrero (talk) 11:17, 21 August 2016 (UTC)

Wiktionary:Table of votes

I created Wiktionary:Table of votes. It is automatically generated using Lua.

Feel free to discuss the new page. If there are any suggestions, I can make the changes in the Lua code.

My idea was to let people know when they voted and when they didn't vote. I was hoping that maybe this would increase the turnout in all the vote pages. --Daniel Carrero (talk) 13:58, 22 August 2016 (UTC)

I also added an "expand" link, at the bottom of the vote box, pointing to the new table of votes. --Daniel Carrero (talk) 21:06, 22 August 2016 (UTC)

Unfortunately, the page is too wide. I wonder if I should implement a list of abbreviations for all users. Examples:

--Daniel Carrero (talk) 14:47, 23 August 2016 (UTC)

I will object to any foreshortening of User:I'm so meta even this acronym which is not "ISMETA." - TheDaveRoss 18:08, 23 August 2016 (UTC)
Oops! I typed "IMSOMETA" but I fixed it now. I also removed "META". Those were mistakes. --Daniel Carrero (talk) 18:43, 23 August 2016 (UTC)
Would it be too hard to read the names of the votes if you reoriented the table so that the usernames were in a vertical column? As long as the text of the vote names was wrapped, I think that might be an improvement. Andrew Sheedy (talk) 20:18, 23 August 2016 (UTC)
I agree. There's more voters than votes, so the lower number should be arranged horizontally. —CodeCat 20:42, 23 August 2016 (UTC)
@Andrew Sheedy, CodeCat: Done. Does it look better now? --Daniel Carrero (talk) 21:48, 23 August 2016 (UTC)
Much! —CodeCat 21:54, 23 August 2016 (UTC)
Definitely—it fits on my screen now! :) Andrew Sheedy (talk) 21:58, 23 August 2016 (UTC)
It should fit on everyone's screens now: "The time allocated for running scripts has expired" doesn't take up much room at all ... Chuck Entz (talk) 09:03, 31 August 2016 (UTC)
@Chuck Entz: You mean you wanted the table to fit your screen and display some actual content? Some people are never happy!
Just kidding. I split the vote into 3 separate tables yesterday: WT:TOV, WT:TOV2, WT:TOV3. The pages seem to be displayed like they should ... most of the time, but sometimes the module error reappears. I used the "hard purge" button to fix it when it happened. You can edit the pages to change the number of votes that appear in each table if you want. --Daniel Carrero (talk) 22:59, 1 September 2016 (UTC)

Voting policy etc.

Wiktionary:Votes had a lot of important text hidden in collapsible divs. I moved it all to:

--Daniel Carrero (talk) 12:33, 23 August 2016 (UTC)

Vote: Making usex the primary name in the wiki markup

FYI, I created Wiktionary:Votes/2016-08/Making usex the primary name in the wiki markup. Let us extend the vote as much as discussion requires. --Dan Polansky (talk) 17:32, 23 August 2016 (UTC)

New PIE root categories

Per comments at WT:RFDO#Template:PIE root, I made {{inh}}, {{der}} and {{bor}} populate categories like Category:Czech terms derived from the PIE root *swep- when the current term is derived from PIE.

This caused a number of redlinked PIE root categories to appear in Special:WantedCategories.

Is that okay with everyone? Is there any problem with these categories, or can they all be created normally? If there is no objection, can someone create all the categories automatically? --Daniel Carrero (talk) 19:07, 23 August 2016 (UTC)

I removed the code that did that, because it was creating all kinds of unwanted categories. Essentially, all PIE terms were being given categories, not just the roots. That said, it's impossible for these templates to determine what is or isn't a root, and besides that, there's tons of etymologies which refer to invalid or badly-formed roots as well. —CodeCat 19:40, 23 August 2016 (UTC)
Maybe it would be better to give the templates a parameter so users can manually make them opt in to categorizing, e.g. |PIErootcat=1 or something like that. —Aɴɢʀ (talk) 09:58, 25 August 2016 (UTC)

Proposal: Request categories with longer names

This concerns Wiktionary:Votes/2016-07/Request categories. See also the vote talk page for further discussion: Wiktionary talk:Votes/2016-07/Request categories. The vote has not started yet, and is under construction.

Consider all these categories:


Rename all categories, with longer, more accurate names, with proper English grammar/syntax and "requests" in all names. Details are to be discussed below. (Feel free to suggest different names for the categories if you want.)

Proposed names:








Rationale and notes:

  • A more consistent naming style proposed to be used in all request categories.
  • ("requests" vs. "needing") These categories track where something was manually requested, not where it is needed.
  • I attempted to propose names with correct English grammar/syntax. As opposed to "English requests for example sentences", for example. As discussed before, there are no "English requests" anywhere.
  • I'd like to replace "needing attention" and "to be checked" by "review". If we are making a request to do something, we are asking people to review the entries.
  • A minor reason: Some of the proposed category names match the request template. {{rfap}} = "request for audio pronunciation", {{rfe}} = "request for etymology".

--Daniel Carrero (talk) 09:21, 24 August 2016 (UTC)

@Daniel Carrero Thanks for directing me over here. I very much like this proposal. It solves the problem of the current names, which imply that entries in a category are the only ones that need this or that to be added to them or worked on, and the problem of the earlier proposal implying that the requests themselves are in a particular language. It might also be more understandable to newcomers.
I'm curious, though: "requests for review" sounds fine to me as an American English speaker, but is it acceptable British English as well? I recall from Harry Potter that "revising" is the verb denoting what I would call "reviewing". But that was in the context of homework assignments, so I'm not sure. — Eru·tuon 01:24, 26 August 2016 (UTC)
IMO, "revise" does not look as good as "review". Then again, I'm from Brazil, rather than USA or England. Let me know if you would suggest any change.
I started the vote: Wiktionary:Votes/2016-07/Request categories. --Daniel Carrero (talk) 01:21, 28 August 2016 (UTC)

Upcoming 5 million entries milestone

Should be reached in around one or two months max. Maybe an occasion to celebrate a bit and do some events / communication around the project? A WMF guest blog post with some stats / data visualisations / stories? I'm still surprised about the low profile Wiktionary has, many people have not even heard about it, or confuse it with Wikipedia. Or worse, it gets treated as inferior. At the recent WikiConvention francophone I heard the remark (paraphrased from memory, probably meant as a joke) "Wikipedia is where the real work gets done, Wiktionary is for scrabble players". Time to change this perception! – Jberkel (talk) 10:08, 24 August 2016 (UTC)

The speaker was probably referring to fr.wikt. [;-}] DCDuring TALK 10:33, 24 August 2016 (UTC)
To this day at Wikipedia there are people who consider Wiktionary to be Wikipedia's trashcan. I still see "Transwiki to Wiktionary" at deletion discussions all the time there, even for terms we already have an entry for, and even when our entry is superior to the Wikipedia article up for deletion. —Aɴɢʀ (talk) 15:09, 24 August 2016 (UTC)
I wish that we had a real encyclopedia as a sister project. DCDuring TALK 17:04, 24 August 2016 (UTC)
We are their WT:LOP, and WT:LOP's WT:LOP is Urban Dictionary. Equinox 17:28, 24 August 2016 (UTC)
Deleting WT:LOP would improve our overall quality. --Daniel Carrero (talk) 22:54, 24 August 2016 (UTC)
WT:LOP was formerly the only place to put words that we now call "hot words". It also served as a means of handling good-faith new entries that is less hostile than deletion. I think we are better off looking like the work in progress that we are rather than pretending that we are at all close to being a finished product in whole or in part. DCDuring TALK 00:33, 25 August 2016 (UTC)
Yeah, LOP serves to satisfy/distract (some of the) contributors who would otherwise keep trying to add their neologisms to the mainspace, which makes it useful. 'Cause on our end, we can just ignore it... - -sche (discuss) 08:40, 25 August 2016 (UTC)
Something to be mentioned in whatever news release goes out: we have words from about one-third of the world's languages, according to conventional estimates of how many languages are spoken in the world. As of when WT:STATS was updated, we were up to 2535 languages with entries, and I expect we are at least over 2600 by now, if not higher. We include codes for 7960 languages, and given how many languages we've identified as needing codes, which I am steadily adding, I expect that figure will reach 8000 soon (by which time I expect to have passed the one-third mark of 2667 languages with entries, since most languages I am adding codes for I am also adding entries in). - -sche (discuss) 08:40, 25 August 2016 (UTC)
At ~4,845,000 pages, we're pretty close to Wikipedia's pagecount (~5,223,000). I expect that we'll overtake WP as time goes on, because we define at least 200,000 English base words (number of entries minus number of form-of definitions = 368,098, but many are variant spellings, so I conservatively guess 200,000 base words), and have the potential to include that many entries in several thousand languages (assuming poorly-attested languages on one side and highly agglutinative or inflected languages on the other side will make a wash), which is several hundred million entries. (Even just that many entries in the 500 most common of the languages we include would be 100,000,000.) - -sche (discuss) 08:50, 25 August 2016 (UTC)
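The figures quoted in the two comments above can be checked with a few lines of arithmetic. This is only a sketch: the variable names are made up for illustration, and it assumes the "one-third mark of 2667" is measured against a round 8000 language codes.

```python
# Figures quoted in the two comments above.
languages_with_entries = 2535
language_codes = 7960
base_words_per_language = 200_000  # -sche's conservative guess for English

# Assumption: the "one-third mark" is one third of 8000 language codes.
one_third_of_8000 = round(8000 / 3)
share_with_entries = languages_with_entries / language_codes
top_500_total = 500 * base_words_per_language

print(one_third_of_8000)            # 2667
print(f"{share_with_entries:.1%}")  # 31.8%
print(f"{top_500_total:,}")         # 100,000,000
```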
Interesting, puts things into perspective, but also hints at how much there is still left to do. I'm planning to work on a visualisation which can somehow demonstrate this diversity and connectedness of languages. – Jberkel (talk) 16:19, 30 August 2016 (UTC)

Proposal: make headword templates for some languages automatically categorise phrasal verbs

The "phrasal verbs" category is pretty well populated for English, but not so much for other languages. There are also, presumably, many more missing from the English category. I therefore propose that

  • The English headword module be modified so that when the page name contains a space, then the phrasal verbs category is automatically added. I'm not sure if phrasal verbs should also be put in the plain "verbs" category.
  • This change also be applied to the modules of other languages, where this is desirable or applicable.
  • This change be applied to other parts of speech, if desired.

It would also be possible to implement this directly in Module:headword, and then it would apply automatically for all languages. However, I don't know if this would be desirable. If everyone else thinks it's fine, we can do that instead. —CodeCat 16:26, 24 August 2016 (UTC)

I don't think that everything that meets the condition presented is a phrasal verb, within the normal meaning of the term, eg, break wind. I don't believe that every entry for a verb followed by a preposition or particle is a phrasal verb, eg, go to hell or go to in that phrase. DCDuring TALK 17:10, 24 August 2016 (UTC)
I wasn't aware of that definition. I thought it just meant any verb that is a phrase (i.e. the SoP meaning of phrasal verb itself). —CodeCat 17:14, 24 August 2016 (UTC)

Proposal: automatically categorise palindromes in Module:headword

Recently, logic was added to categorise entries if they have unusual characters in them. We can also do other "analysis" of words automatically in the module, including palindromes. Therefore I propose that we add this feature to the module so that the categories don't have to be added manually anymore. —CodeCat 16:31, 24 August 2016 (UTC)

Would it be undesirably computationally expensive to do anagrams this way too? Equinox 17:27, 24 August 2016 (UTC)
Modules have no way to see what entries are in a category, so they are not able to go over each one and see if a term is an anagram of the current term. —CodeCat 17:39, 24 August 2016 (UTC)
  • This is a good idea, and (much unlike anagrams) I can't think of too many language-specific issues, besides the fact that it's not relevant for certain scripts. —Μετάknowledgediscuss/deeds 22:40, 24 August 2016 (UTC)
  • I agree, this is a good idea. --Daniel Carrero (talk) 22:53, 24 August 2016 (UTC)
  • I also like this idea. Will it be implemented such that periphrastic palindromes ("Madam, I'm Adam") would be allowed? You'd have to remove spaces and punctuation, lowercase the string, and pass it through the sort key logic to get reliable results. —JohnC5 00:44, 25 August 2016 (UTC)
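A minimal sketch of the normalisation described in the comment above, written in Python purely for illustration. The real implementation would be Lua in a Scribunto module, the sort-key step is left out, and the function name `is_palindrome_loose` is hypothetical.

```python
def is_palindrome_loose(text: str) -> bool:
    """Check whether text reads the same both ways, ignoring case, spaces and punctuation."""
    # Keep letters and digits only, and fold case so that "M" matches "m".
    letters = [ch for ch in text.casefold() if ch.isalnum()]
    return len(letters) > 0 and letters == letters[::-1]


print(is_palindrome_loose("Madam, I'm Adam"))  # True
print(is_palindrome_loose("Madam, I'm Eve"))   # False
```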
What is the minimum length string we will consider a palindrome? 3 characters? DTLHS (talk) 00:44, 25 August 2016 (UTC)
1 character, it seems. Both a and I are in Category:English palindromes. --Daniel Carrero (talk) 00:53, 25 August 2016 (UTC)
I definitely disagree with categorizing single characters as "palindromes". DTLHS (talk) 01:01, 25 August 2016 (UTC)
Appendix:English palindromes has palindromes with a minimum length of 2 characters. There are a few two-letter palindromes in the category, too: ee, oo, BB. Can abbreviations, such as AAA, be considered normal palindromes? --Daniel Carrero (talk) 01:09, 25 August 2016 (UTC)
Those may be better in a category Repeated character than palindromes. — Dakdada 13:21, 25 August 2016 (UTC)
Suppose we define "palindrome" for these purposes as words containing multiple different characters. This would effectively exclude all two letter palindromes, which only repeat the same character, and things like WWW and ooo, but would include things like Ana and oro. bd2412 T 14:40, 25 August 2016 (UTC)
That rule should not apply to non-strictly-alphabetic scripts, such as abjads, abugidas, syllabaries, and logographic scripts. In these scripts repetition of the same letter is more than a "repeated character", and thus two-character palindromes and even three-same-character palindromes should be counted as palindromes. --WikiTiki89 14:06, 29 August 2016 (UTC)
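The "multiple different characters" rule and the exemption for non-alphabetic scripts could be combined roughly as follows. This is a sketch in Python, not the module's code: the function name and the `alphabetic_script` flag are hypothetical, and requiring at least two characters in the exempt case is an assumption, since the thread does not settle single characters for those scripts.

```python
def counts_as_palindrome(word: str, alphabetic_script: bool = True) -> bool:
    """Apply the 'multiple different characters' rule discussed above."""
    chars = list(word.casefold())
    if not chars or chars != chars[::-1]:
        return False
    if alphabetic_script:
        # Alphabetic scripts: reject strings made of one repeated character.
        return len(set(chars)) > 1
    # Other scripts: repetition still counts (assumed minimum length of 2).
    return len(chars) > 1


for word in ["Ana", "oro", "WWW", "ee", "a"]:
    print(word, counts_as_palindrome(word))
```

Under this rule Ana and oro qualify, while WWW, ee and a do not.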
I'd like to implement this, but I don't have the sysop rights required to do it. —CodeCat 16:24, 25 August 2016 (UTC)
One other consideration, should we only do this for languages in the Latin script? Or are there scripts we should specifically exclude? DTLHS (talk) 00:00, 26 August 2016 (UTC)
Alphabetic scripts like Greek and Cyrillic also have palindromes. We also have Appendix:Chinese palindromes, and Category:Arabic palindromes. Sanskrit, despite its inherent vowels, also apparently has palindromes. It seems like all scripts can have palindromes. - -sche (discuss) 00:14, 26 August 2016 (UTC)
Is a vowelless Hebrew palindrome still a palindrome when you add the vowels? DTLHS (talk) 00:15, 26 August 2016 (UTC)
Based on the examples in Category:Hebrew palindromes and w:Palindrome, yes. - -sche (discuss) 00:34, 26 August 2016 (UTC)
In Hebrew and Arabic, vowels do not count as part of the written word. In Hebrew, you also have to make sure to count final letters as equivalent to their non-final forms (which, as I see at מום, already works). In Arabic, there is another little bit of trickiness: classically, ي and ى counted as one letter, as did ه and ة, and hamzas did not count as letters. Thus, the following words should be palindromes: يرى, أنا, آباء, همة, يجئ, as well as وضؤ if we had entries for such things. I don't know how much of this carries over to Persian or other languages. --WikiTiki89 13:14, 29 August 2016 (UTC)
I implemented Arabic. --WikiTiki89 14:06, 29 August 2016 (UTC)
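The Hebrew and Arabic equivalences listed above can be sketched as a translation table plus removal of vowel marks. This is an illustrative Python sketch, not the code that was added to Module:palindromes. The names are hypothetical, and two points are assumptions read off the example words: a hamza on a carrier letter reduces to the carrier, and alef with madda is treated as plain alef.

```python
import unicodedata

# Letters treated as equivalent for palindrome purposes.
EQUIVALENTS = str.maketrans({
    "\u0649": "\u064a",  # Arabic alef maksura -> yeh
    "\u0629": "\u0647",  # teh marbuta -> heh
    "\u0623": "\u0627",  # alef with hamza above -> alef
    "\u0625": "\u0627",  # alef with hamza below -> alef
    "\u0622": "\u0627",  # alef with madda -> alef (assumption)
    "\u0624": "\u0648",  # waw with hamza -> waw
    "\u0626": "\u064a",  # yeh with hamza -> yeh
    "\u0621": None,      # standalone hamza is not counted as a letter
    "\u05da": "\u05db",  # Hebrew final kaf -> kaf
    "\u05dd": "\u05de",  # final mem -> mem
    "\u05df": "\u05e0",  # final nun -> nun
    "\u05e3": "\u05e4",  # final pe -> pe
    "\u05e5": "\u05e6",  # final tsadi -> tsadi
})


def strip_marks(word: str) -> str:
    # Vowel points and other combining marks have category Mn.
    return "".join(ch for ch in word if unicodedata.category(ch) != "Mn")


def is_palindrome_semitic(word: str) -> bool:
    letters = strip_marks(word.translate(EQUIVALENTS))
    return len(letters) > 1 and letters == letters[::-1]


# The example words from the comment above, plus the Hebrew entry it cites.
for word in ["يرى", "أنا", "آباء", "همة", "يجئ", "وضؤ", "מום"]:
    print(word, is_palindrome_semitic(word))
```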
It seems Korean palindromes are graphical, by hangeul syllables, e.g. 적극적 (jeokgeukjeok) 적 + 극 + 적, not by jamo. 적극적 wouldn't be a palindrome if decomposed into ᄌ ᅥ ᆨ ᄀ ᅳ ᆨ ᄌ ᅥ ᆨ. --Anatoli T. (обсудить/вклад) 13:26, 29 August 2016 (UTC)
Also, Scribunto doesn't have any built-in way to reverse a string by character instead of by bytes. If you could write out your implementation, I'd be happy to edit the module. DTLHS (talk) 00:11, 26 August 2016 (UTC)
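The difference between comparing syllable blocks and comparing jamo can be checked mechanically. A small Python sketch, with hypothetical function names, using Unicode normalisation to compose (NFC) or decompose (NFD) the hangul:

```python
import unicodedata


def is_syllable_palindrome(word: str) -> bool:
    # NFC gives one code point per precomposed hangul syllable block.
    blocks = unicodedata.normalize("NFC", word)
    return len(blocks) > 1 and blocks == blocks[::-1]


def is_jamo_palindrome(word: str) -> bool:
    # NFD splits each syllable block into its conjoining jamo.
    jamo = unicodedata.normalize("NFD", word)
    return len(jamo) > 1 and jamo == jamo[::-1]


word = "적극적"
print(is_syllable_palindrome(word))  # True
print(is_jamo_palindrome(word))      # False
```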
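To illustrate why byte-wise reversal is a problem for non-ASCII text, here is a Python sketch (not Scribunto code; the function names are hypothetical). Reversing the UTF-8 bytes breaks multi-byte characters, while reversing code points keeps them intact.

```python
def reverse_bytes(text: str) -> str:
    # Reversing the raw UTF-8 bytes scrambles any multi-byte character;
    # undecodable sequences are shown as U+FFFD replacement characters.
    return text.encode("utf-8")[::-1].decode("utf-8", errors="replace")


def reverse_characters(text: str) -> str:
    # Python strings are sequences of code points, so slicing reverses by character.
    return text[::-1]


word = "\u00e9t\u00e9"  # French "été", with precomposed e-acute
print(reverse_characters(word) == word)  # True
print(reverse_bytes(word) == word)       # False
```

For plain ASCII the two approaches agree, which is why the issue only shows up with accented letters and non-Latin scripts.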
There's a working function now, it's in Module:palindromes but I imagine it will be inserted into Module:headword eventually. There's a test at Module:palindromes/testcases, where I put in the lyrics of "Bob", a song consisting entirely of palindromes, to test it. It probably needs to ignore more types of characters though. —CodeCat 02:17, 26 August 2016 (UTC)
palindrome: "[...] sometimes disregarding punctuation, capitalization and diacritics". It should probably also remove diacritics: réifier is listed as a French palindrome. --Vriullop (talk) 08:43, 26 August 2016 (UTC)
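One way to disregard diacritics is to decompose the string and drop the combining marks before comparing. A Python sketch under that assumption (the function names are hypothetical, and whether every language would want all of its diacritics ignored is left open):

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    # NFD separates base letters from combining marks, which are then dropped.
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def is_palindrome_ignoring_diacritics(word: str) -> bool:
    letters = strip_diacritics(word.casefold())
    return len(letters) > 1 and letters == letters[::-1]


print(is_palindrome_ignoring_diacritics("réifier"))  # True
```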
@CodeCat I have added this to Module:headword. We should only get false negatives, not false positives, so it should be safe. If anyone wants to add more language specific rules just edit the data table in Module:palindromes. DTLHS (talk) 23:44, 26 August 2016 (UTC)
I don't really agree with passing the language code to the function, as a string. The normal practice in the modules is to pass the language object itself, and let the function fetch the code from it when it needs it. —CodeCat 23:53, 26 August 2016 (UTC)
Also, it's usually a bad practice to include other modules at the top of a module. This is an unconditional load; the module gets loaded by the other module no matter what. But it is often more efficient to load the module in-place, where you use it. That way, the module doesn't get loaded unless it's actually needed. —CodeCat 23:55, 26 August 2016 (UTC)
But we can't know whether the module is needed without including it. DTLHS (talk) 23:56, 26 August 2016 (UTC)
You include it when you need it. Not the entirety of Module:headword needs the module, but only that one piece of code you added in. So you can require it there, on the spot. You don't even need a variable for it, just require("Module:palindromes").is_palindrome(. —CodeCat 23:59, 26 August 2016 (UTC)
Now the module just needs to be categorised. Any idea where it fits? —CodeCat 00:16, 27 August 2016 (UTC)
non-: maybe we shouldn't remove hyphens from the start or the end of the word? I'm not sure this should be categorized as a palindrome. DTLHS (talk) 00:31, 27 August 2016 (UTC)
Fixed now. —CodeCat 00:38, 27 August 2016 (UTC)
I'm no longer able to edit the module. Can this be fixed please? —CodeCat 12:18, 27 August 2016 (UTC)
It is transcluded in the main page, so it has cascading protection. DTLHS (talk) 14:06, 27 August 2016 (UTC)
That cascading protection has repeatedly caused issues like this, with no discernible benefit that can't be obtained better in another way with less collateral damage. (And discussions have shown apathy from admins towards the question.) I've turned it off so that the protection is now only local, and laid out my reasoning in more detail below, with links to some previous discussions. - -sche (discuss) 18:40, 27 August 2016 (UTC)
There are now a bunch of new palindrome categories in Special:WantedCategories if anyone would like to go over them for mistakes. DTLHS (talk) 15:02, 29 August 2016 (UTC)
Now with even more Proto-Malayo-Polynesian palindromes! (nice work) Jberkel (talk) 02:59, 31 August 2016 (UTC)
That does seem odd... @CodeCat, @JohnC5, should we disable it for reconstructions? DTLHS (talk) 03:04, 31 August 2016 (UTC)
That wasn't sarcastic. The only dictionary which would ever have something like it. Although the usefulness of reconstructed palindromes is indeed questionable. Jberkel (talk) 03:10, 31 August 2016 (UTC)
I don't think we should have them for reconstructed terms, whether in reconstructed languages or attested ones. Reconstructions aren't really orthographic forms, but rather representations of the scientific method that reconstructs them. It is, therefore, much more arbitrary. The very same word might be a palindrome to one linguist and not one to another, all depending on which notation they happen to prefer. Attested languages have no such leeway. —CodeCat 13:04, 31 August 2016 (UTC)
While I agree with you that reconstructed terms should not have palindrome categories, your reasoning is a little off. There are many words in attested languages that have alternative spellings, one being a palindrome and the other not, and it is not uncommon for people who are trying to create palindromic sentences to choose a rarer spelling because it happens to fit. My point is that the fact that "the very same word might be a palindrome to one linguist and not one to another" is not the real reason to exclude reconstructions. The real reason is simply that the orthography of reconstructions is artificial and thus meaningless, and there's no more to it than that. --WikiTiki89 13:15, 31 August 2016 (UTC)
User:DTLHS has just disabled this feature for languages whose type is "reconstructed". This brings up a question: Should reconstructed words in an attested language (for example, Vulgar Latin) be categorized as palindromes? I think not, thus the disabling should depend on the namespace rather than the language type. --WikiTiki89 14:55, 31 August 2016 (UTC)
That's fine with me. DTLHS (talk) 14:58, 31 August 2016 (UTC)
What's the best way to check the namespace? DTLHS (talk) 15:08, 31 August 2016 (UTC)
Like this. --WikiTiki89 15:11, 31 August 2016 (UTC)
Actually, I just moved this check to Module:headword. It makes more logical sense to me to have it there. --WikiTiki89 15:15, 31 August 2016 (UTC)

Unicode 9.0

Can someone please update Appendix:Unicode and subpages?

The appendices cover the characters up until Unicode 8.0. Unicode 9.0 was introduced in June, apparently.

List of new Unicode 9.0 characters: http://www.unicode.org/charts/PDF/Unicode-9.0/

--Daniel Carrero (talk) 16:38, 25 August 2016 (UTC)

Already updated. But Appendix:Unicode/Tangut block cannot be displayed. It will need a different approach. --Octahedron80 (talk) 02:26, 27 August 2016 (UTC)
By the way, the Unihan database has still not updated the readings of the CJK extended blocks. --Octahedron80 (talk) 02:54, 28 August 2016 (UTC)

Trademarks, again

I'm very much opposed to the idea that Wiktionary has to indicate trade marks in any way. We've recently had an editor who seems to be adding trademark nonsense to dodge bow, a common term not at all connected to any particular trademark, or at the very least a genericised one. They've gone to User talk:JohnC5 and argued that their trademark deserves the same recognition as we give to Nike. What is the policy on this? Why is the trademark indicated for Nike in the first place, and how might this case differ from others? —CodeCat 21:36, 25 August 2016 (UTC)

Per WT:TM, we do not indicate trademark status. The talk page of that page has links to the discussions that led up to it, in which a WMF staff member and an[other?] intellectual property lawyer participated. I have removed the "trademark" context label from Nike. (I wonder if we should upgrade that page from think tank to a higher status.) - -sche (discuss) 21:46, 25 August 2016 (UTC)
Yet, we do explicitly define "trademark" as a recognised label. Even though it's not even a usage context. Should we get rid of it? A possibility we could even consider is to include logic in Module:labels and its data modules to explicitly forbid certain labels. —CodeCat 21:58, 25 August 2016 (UTC)
I had been leaving it in the module because it gets a couple new uses a year, in addition to the existing uses we haven't finished clearing out (update: now done), and having them all categorized is useful. It's tempting to imagine forbidding it, but people would find ways around that, like manually writing (trademark) or using "[as] a trademark", so leaving it (or at least not forbidding it) might be better, in that it makes it easy(er) to find and fix uses. - -sche (discuss) 22:17, 25 August 2016 (UTC)
Leaving it in so that you can take it out... that's wonderfully sneaky. —suzukaze (tc) 03:36, 26 August 2016 (UTC)
Leave it in so you can move it, not remove it. Leave it in so you can find such entries to format them correctly. Trademarks should be indicated elsewhere, IMO as a usage note if not in the etymology. DAVilla 09:57, 3 September 2016 (UTC)
If something was coined as a trademark, that's relevant etymology. If it merely has been trademarked, at some point (possibly the present) in some country, that's not relevant. We previously got some requests from companies which insisted that our entries on terms they trademarked (in one case apparently including a term that predated the trademark) should be indicated as trademarked; we noted that a large number of common words are trademarked (eagle, crest, tide), especially in smaller countries; a WMF representative asked us to formulate a document the WMF legal team could point companies to, and with input from an intellectual property attorney we arrived at this approach. - -sche (discuss) 17:27, 3 September 2016 (UTC)
I'm surprised that so many people dislike the trademarks. At the very least it's good to say that a word like "kleenex" originated as a marketing coinage: to some extent, this explains why it's spelled in that strange way. But per the vote it seems to be something to put in the etymology only. Equinox 13:32, 26 August 2016 (UTC)
We may be unreasonably hostile to them, but it is not unreasonable to avoid asserting or denying the legal status of any particular trademark. I do think that our determination that a given trademark has become generic would often be the same as a court's determination, but IANAL. DCDuring TALK 13:59, 26 August 2016 (UTC)
Indicating that something originated as a trademark / brand name is OK. The issue is with companies that want to indicate that they currently hold a trademark on some word ("off", etc). - -sche (discuss) 17:32, 3 September 2016 (UTC)

Austro-Asiatic and Mon-Khmer

Austro-Asiatic was traditionally divided into Mon-Khmer and Munda, but more recent classifications have made Mon-Khmer synonymous with Austro-Asiatic. On Wiktionary, we still have mkh (Mon-Khmer) as a subfamily of aav (Austro-Asiatic). Where this becomes a problem is that it prevents Munda terms like दाः from referencing their Proto-Mon-Khmer i.e. Proto-Austro-Asiatic ancestors. How should this be addressed? - -sche (discuss) 21:54, 25 August 2016 (UTC)

In case anyone wants to, the same fallback we did with Uralic vs. Finno-Ugric should be possible: make Proto-Mon-Khmer an etymology-only language, so that all mkh-pro mentions link to the corresponding aav-pro reconstruction page. In other words, there will be no separate entries for mkh-pro, but any Austroasiatic words not attested in Munda can still be referred to as "Mon-Khmer".
In this particular case though, mkh-pro *ɗaak should simply be moved to aav-pro *ɗaak, as it appears to indeed have Munda descendants. --Tropylium (talk) 20:23, 26 August 2016 (UTC)

Voting in "borrowing, borrowed, loan, loanword → bor"

About this vote: Wiktionary:Votes/2016-07/borrowing, borrowed, loan, loanword → bor. I am the vote creator and one of the supporters.

The vote is scheduled to end in a few days (on August 30). If the vote ended right now, it would pass. Current results: 11-5-2 (68.75% supporting votes, 31.25% opposing votes). But that is only a little above the two-thirds majority required to pass, which is about 66.7% supporting votes.

I think it would be a good idea for more people to cast their votes here if they haven't already. If more people voted, the result would hopefully show a clearer consensus. It is not clear whether it would pass or fail if a few more people voted. --Daniel Carrero (talk) 03:28, 26 August 2016 (UTC)
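
As a rough illustration of the arithmetic above, the following sketch recomputes the support share for an 11-5-2 tally and checks it against a two-thirds threshold. It assumes, as the figures in the post imply, that abstentions are left out of the total. The function names and the "at least two-thirds" comparison are illustrative assumptions, not a statement of voting policy.

```python
from fractions import Fraction

# Assumption: a vote passes when support / (support + oppose) >= 2/3.
# Abstentions are ignored, matching the 68.75% / 31.25% split quoted above.
THRESHOLD = Fraction(2, 3)


def support_share(support: int, oppose: int) -> Fraction:
    """Exact share of supporting votes among support + oppose."""
    return Fraction(support, support + oppose)


def passes(support: int, oppose: int, threshold: Fraction = THRESHOLD) -> bool:
    """True if the support share meets the (assumed) threshold."""
    return support_share(support, oppose) >= threshold


share = support_share(11, 5)
print(float(share) * 100)   # 68.75
print(passes(11, 5))        # True
print(passes(11, 6))        # False: one more oppose vote gives 11/17
```

Under these assumptions a single additional oppose vote (11-6) would bring the share to 11/17, roughly 64.7%, which is below two-thirds.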

Vote: Enabling different kinds of romanization in different locations

FYI, I created Wiktionary:Votes/2016-08/Enabling different kinds of romanization in different locations. Let us postpone the vote as much as discussion requires, if at all. --Dan Polansky (talk) 08:32, 26 August 2016 (UTC)

Dan Polansky for admin

See Wiktionary:Votes/sy-2016-08/User:Dan Polansky for admin. Thank you. --Daniel Carrero (talk) 12:52, 26 August 2016 (UTC)

What, again? Renard Migrant (talk) 12:59, 26 August 2016 (UTC)
@Renard Migrant: Dan has rejected sysop nominations in the past. Now he accepted this nomination. --Daniel Carrero (talk) 23:25, 26 August 2016 (UTC)

Request/proposal: Show preferred languages in translation tables as a translation, not in the header

The tool that shows translations of certain selected languages in the header of the translation table is useful, but quite limited. You can only select a small number of languages before the header gets too big. I therefore propose the following change to how it's presented. Instead of showing it in the header, it's shown in a minimal version of the translation table itself. This table would have two columns and display translations the same way they appear in the full table. When the table is expanded, the minimal table is replaced with the full table. This is similar to how some recent inflection tables work, like the Dutch verb tables (see zijn for a working example).

This change requires some good knowledge of JavaScript, which I don't really have. So please volunteer if you're able to make this change. —CodeCat 19:08, 26 August 2016 (UTC)

Am I correct in assuming that:
  1. all the JS that is common slows down entry downloading for everyone and
  2. this would be in common JS?
If so, this and other items that are not likely to be used by most users, including the non-contributing anons, should not be in common. Instead they can be added to the user JS files. DCDuring TALK 23:57, 26 August 2016 (UTC)
What are you talking about? This is a feature we already have in our JS. I'm just asking to make it look nicer. —CodeCat 00:01, 27 August 2016 (UTC)
So it's already contributing to sluggish downloading, non-appearance of icons, etc? DCDuring TALK 00:33, 27 August 2016 (UTC)
That's not relevant to my request. —CodeCat 00:36, 27 August 2016 (UTC)

User:Babel AutoCreate

This user has created several questionable Babel categories: Category:User roa-Tara, Category:User zh-Hans, Category:User zh-Hant. DTLHS (talk) 01:06, 27 August 2016 (UTC)

My understanding is that it is not a user, or even a bot, it's a WMF script that is part of the cross-site Babel extension (the Babel you get if you use '#' in front of the Babel codes, whereas if you don't use '#' you just get our local templates). It merely takes the form of an account, presumably to make its page-creations 'neater'. (I notice that it has, hilariously, received a mass-message warning that it would be renamed.) As I wrote when I unblocked it after someone blocked it, it "mostly adds valid cats, [and] when invalid cats are added, it's a sign actual people have used those Babel codes, and the solution is to educate the people and protect the cat against recreation", as I have now done to the zh-Hans, zh-Hant categories. If we had one of our bots create every possible level of every possible Babel language — Category:User aaa-4 for Ghotuo, etc — then I guess we would no longer need the script to create things on-demand for us, and could block it. - -sche (discuss) 02:58, 27 August 2016 (UTC)
The fact that it's categorizing them in Category:language (when it doesn't know what the correct language is?) is concerning, maybe we should warn the operator about that. DTLHS (talk) 03:00, 27 August 2016 (UTC)

"Template editor" user group

At Wikipedia, they have a user group called "template editors" who can edit protected, high-visibility templates and modules. More information on how they handle it can be found here. This allows for the existence of users whom we don't need to trust as admins (so no blocking or deleting) but can use their technical skills to help the project. They could be confirmed much like rollbackers are, at WT:WL (or alternatively through a vote like sysops). Is there interest in having this user group here? —Μετάknowledgediscuss/deeds 06:16, 27 August 2016 (UTC)

I would be interested if the group allowed editing gadgets and all of the protected high-profile non-template pages (like MediaWiki:Common.css and MediaWiki:Common.js).--Dixtosa (talk) 09:50, 27 August 2016 (UTC)
This sounds like a good idea to me. Andrew Sheedy (talk) 01:28, 28 August 2016 (UTC)
This sounds like a great idea to me. We definitely need more permissions groups. Benwing2 (talk) 02:01, 5 September 2016 (UTC)

Poll: Description section

This concerns Wiktionary:Votes/2016-08/Description. The vote proposes adding a "Description" section with a visual description of symbols. The vote has not started yet.

See this short entry example, with a description of the symbol "🔇".


A speaker with a stroke. Sometimes, shown as a speaker with a prohibition sign (🚫).


# [[mute]]

Question: What should be the name of the description section, in your opinion?

This is a poll with no policy value.

Use "Description"

  1. Support --Daniel Carrero (talk) 15:34, 27 August 2016 (UTC)
    If someone said to me: "Describe the Venus symbol." (♀), I would probably say its shape.
    If someone said to me: "Describe the Eszett." (ß), I would probably say its shape, too.
    Correct me if I'm wrong: other things such as pronunciations are also "descriptive" in a sense, but visual descriptions are the first thing that comes to mind. --Daniel Carrero (talk) 15:36, 27 August 2016 (UTC)
  2. Support Korn [kʰũːɘ̃n] (talk) 15:57, 27 August 2016 (UTC)
  3. Support, per above. Andrew Sheedy (talk) 01:29, 28 August 2016 (UTC)

Use "Shape"

  1. Support --Daniel Carrero (talk) 15:38, 27 August 2016 (UTC)
    I prefer "Description". Shape is the second best, in my opinion. --Daniel Carrero (talk) 02:43, 28 August 2016 (UTC)
  2. Taking Equinox's good point about "description" (which I otherwise like) into account, this is possibly the best name, or perhaps "appearance" is. - -sche (discuss) 04:40, 28 August 2016 (UTC)

Use "Glyph"

  1. Oppose because this sounds like a Part of Speech section; compare "Letter", "Symbol". - -sche (discuss) 04:40, 28 August 2016 (UTC)
  2. Oppose per -sche. --Daniel Carrero (talk) 23:41, 29 August 2016 (UTC)

Use "Appearance"

  1. Support, but I prefer "Description". --Daniel Carrero (talk) 17:01, 28 August 2016 (UTC)

Use "Visual description"

  1. Support This will work, I think. Think about the topic 'shape', with someone putting "a triangle" ... very descriptive. lol. Octahedron80 (talk) 16:04, 27 August 2016 (UTC)
    Personally, I don't like "Visual description" because it is too long for my taste.
    Maybe the actual description of that character should be a little longer, like this: "A triangle pointing upwards." So it would not be confused with other triangles in Appendix:Unicode/Geometric Shapes. It is officially called "WHITE UP-POINTING TRIANGLE", but I don't think we need to mention the word "white". It seems to have a jargon-y meaning in Unicode: here, "white" means "symbol contour, not filled with the black color".
    Incidentally, delta (Δ) can also be described as "A triangle pointing upwards." --Daniel Carrero (talk) 16:30, 27 August 2016 (UTC)
    You are correct about Unicode's jargony use of "black" and "white". However, if we need to describe shapes, we can always say "solid", "filled", "outline", etc... Equinox 02:00, 28 August 2016 (UTC)
    Point taken. For △ ("WHITE UP-POINTING TRIANGLE"), we may want to choose between "A triangle pointing upwards." and "The outline of a triangle pointing upwards." --Daniel Carrero (talk) 02:14, 28 August 2016 (UTC)
    Some characters are described in color, usually emoji. For example, 💙 blue heart, 💚 green heart, 💛 yellow heart, 💜 purple heart, 🍎 red apple, 🍏 green apple, etc. Is there other way to explain these? --Octahedron80 (talk) 02:17, 28 August 2016 (UTC)
    Yeah, this is a very strange (and very new!) thing in Unicode. I can only speak for my own Windows/Linux rendering (I think Apple is more "colourful") but I see them as different shades and stripes. Whether Unicode should concern itself with colours at all is very arguable, but those arguments have been had, elsewhere, and one of the current big issues seems to be skin colour. (I can see why a set of purely white emoji would be a problem, but I've mostly seen smileys as yellow things. Oh well. Not going to touch that with a barge-pole.) I see this as a question of "our friend just jumped off a cliff, should we follow?". Equinox 02:27, 28 August 2016 (UTC)
    The hearts are all red on my screen, and both apples are blue. I assume they are supposed to actually match the colours used to describe them? :P Andrew Sheedy (talk) 02:56, 28 August 2016 (UTC)
    Yeah, you're seeing them essentially in normal monochrome, except that on Wiktionary a linked text is blue (if the target page exists) or red (if it doesn't). In some other contexts, especially phone chat, some clients render them in actual colour, like those weird bright yellow pictures of smileys that you get when you type :). Equinox 02:59, 28 August 2016 (UTC)
    From the unicode-faq: "Some of the characters from the core emoji sets have names that include a color term, for example, BLUE HEART or ORANGE BOOK. These color terms in the names do not imply any requirement about how a character must be presented, they are intended only to help identify the corresponding character in the core emoji sets." – Jberkel (talk) 23:30, 29 August 2016 (UTC)
    Re: "Is there other way to explain these?"
    I believe "A blue heart." is enough for the blue heart emoji. Same for others. --Daniel Carrero (talk) 03:02, 28 August 2016 (UTC)
    Kinda off-topic: All the different-colored hearts represent the same concept: a heart. For that reason, I would support redirecting "blue heart", "orange heart", etc. to a single main heart entry and letting the single heart entry explain that Unicode variations exist. We are currently using ♥ ("BLACK HEART SUIT") for all heart-related senses: love (generic); love (English verb); hearts suit; hit points; etc. --Daniel Carrero (talk) 23:40, 29 August 2016 (UTC)
  1. Oppose. As I said above, "Visual description" is too long for my taste. --Daniel Carrero (talk) 17:01, 28 August 2016 (UTC)
  2. Oppose. This name is too long, IMO. - -sche (discuss) 04:40, 28 August 2016 (UTC)

Use "Etymology"

  1. Oppose. I've been placing shape descriptions in the "Etymology" section because it is an allowable section and it's better doing that than leaving descriptions as definitions. But visual descriptions are not etymologies. I'd rather use an actual visual description section instead of the "Etymology" section. --Daniel Carrero (talk) 23:43, 29 August 2016 (UTC)

Use "Usage notes"

  1. Oppose. These are not actual usage notes. --Daniel Carrero (talk) 00:01, 30 August 2016 (UTC)

Use other name


  1. Oppose having this section, but if we do have it, I would favour "Shape". The entire entry is a description: a description of the meaning, a description of the sound, etc. Equinox 15:40, 27 August 2016 (UTC)
    @Equinox I added a short visual description in the Etymology section of these entries:
    Do you think the entry should have that information, or do you think we should remove it? I proposed using a "Description" section for the visual description. If we don't use a "Description" section (or "Shape" section, "Appearance" section, etc.), do you think we can use the Etymology section to place the visual description of these symbols? --Daniel Carrero (talk) 15:36, 29 August 2016 (UTC)
  2. Oppose The only potential purpose I see for this is to inform blind users of what the character looks like, which is not something I think we need to be concerned about. If our purpose is the more obvious purpose of helping users who do not have the font support to see the character, then a much better solution would be to just include an image of the character (and all its variants, if applicable), which is something we already do sometimes. --WikiTiki89 11:12, 29 August 2016 (UTC)
    @Wikitiki89 I added a short visual description in the Etymology section of these entries:
    Do you think the entry should have that information, or do you think we should remove it? I proposed using a "Description" section for the visual description. If we don't use a "Description" section (or "Shape" section, "Appearance" section, etc.), do you think we can use the Etymology section to place the visual description of these symbols? --Daniel Carrero (talk) 15:36, 29 August 2016 (UTC)
    The Etymology section should not simply be a place to put a visual description of the symbol, but if the description fits as part of the etymology, that would be fine. --WikiTiki89 15:42, 29 August 2016 (UTC)
    @Wikitiki89 In my opinion, we should have these shape descriptions somewhere in the entry. For now, I've been using the Etymology to place shape descriptions. See this diff of the hourglass symbol for an instance where I moved a simple, presumably unattestable "hourglass" definition into the etymology and added an actual sense.
    However, you said: "The Etymology section should not simply be a place to put a visual description of the symbol ..." . In your opinion, should we remove all these shape descriptions from the entries? --Daniel Carrero (talk) 23:55, 29 August 2016 (UTC)
  3. Oppose I'm also wondering what kind of value Wiktionary can get (or add) by describing emojis in detail. I find the definitions you've added to the entries a lot more helpful than the description itself. The meaning / gloss should be relatively stable, the images can change (maybe you have heard about the recent discussion around 🔫 ‎(pistol)). emojipedia now lists more than 10 different character sets, not counting OS sub-variants. If really needed, why not add a (short!) visual description after the definition, e.g.
    🔗: (Internet) Indicates a hyperlink (usually displayed as chain links).
    Jberkel (talk) 01:14, 30 August 2016 (UTC)
    @Jberkel: I tried to add useful definitions for the symbols. So, concerning the definitions, I feel flattered by your comment and for that I thank you: "I find the definitions you've added to the entries a lot more helpful than the description itself."
    Now let's get down to business:
    About "describing emojis in detail". I'm not actually interested in describing emojis in detail. My idea is specifically having short descriptions. I expect it to be apparent from the descriptions I have already written. Basically I just use the Unicode character name as the "Description" when appropriate.
    • = "A hourglass."
    • 🔫 = "A pistol."
    The following is an exception because the Unicode name would be just "link symbol", which is not that descriptive.
    Of course the pistol has a dozen variants! -- Emojipedia page: http://emojipedia.org/pistol/ -- Like I said about "lollipop" somewhere else, I don't want to describe every nook and cranny of the pistol. Just "A pistol." suffices. It conveys the whole idea. Actually, we might want to say in the "Description" section of 🔫: "A pistol pointing left.", if applicable.
    I oppose this:
    🔗: (Internet) Indicates a hyperlink (usually displayed as chain links).
    My reasons are:
    • Every symbol can have a short description and that's why I proposed having a whole section for it. (there may be important exceptions: I would probably oppose visual descriptions for Han characters until further discussion)
    • If a given symbol has, say, 6 definitions, where would we put the visual description? Only in the first sense? In all senses?
    • See this diff of the hourglass symbol for an instance where I moved a simple, presumably unattestable "hourglass" definition into the etymology and added an actual sense. (I used the "Etymology" but I'd prefer using a "Description" section.) In my opinion, definitions should be for actual semantics. There are too many symbol entries where the definition is a visual description instead of an actual, semantic definition. If we disallow visual descriptions in definitions completely, it should become easier to clean up symbol entries by moving visual descriptions up to the "Description" section.
    When you say: "If really needed, why not add a (short!) visual description after the definition," it is a different way of entertaining the possibility of having a visual description somewhere. My idea is to use a "Description" section specifically. --Daniel Carrero (talk) 02:00, 30 August 2016 (UTC)
    Regarding "if really needed": my point was that most symbols probably don't need it since they have a straightforward mapping from graphic depiction to sense, as in 🔫 ‎(pistol) and ‎(hourglass). For , the description could be its primary definition:
    1. (literally) hourglass
    2. (by extension) time
    3. (by extension, GUI) Indicates that the current application is busy performing an operation.
    The "literal" (not sure if that's a good term. "visual"?) sense would be the simplified visual description. If the pistol is pointing left or right is an implementation detail and should be left out. Symbols which denote abstract concepts such as 🔗 benefit most from the visual description, but I suspect that this is a small percentage of all symbols. Also I'm not sure if there even is a shared understanding of Unicode "meanings", since they are highly dependent on context and evolve rapidly (something we could help to document though). – Jberkel (talk) 08:07, 30 August 2016 (UTC)
    I agree with your point: "If the pistol is pointing left or right is an implementation detail and should be left out."
    I have something to say about this: "Symbols which denote abstract concepts such as 🔗 benefit most from the visual description, but I suspect that this is a small percentage of all symbols." I don't know if it's a small percentage (Unicode has more than 128,000 characters, so 1% of characters would be a high number) but if you want, I'm pretty sure I should be able to list 100 characters where the Unicode codepoint name is not a visual description, like "LINK SYMBOL" for 🔗.
    I don't like very much the idea of having this sense for :
    1. (literally) hourglass
    It may not be attestable. Is there any permanently recorded media including a sentence with denoting the actual object "hourglass"? I wonder if someone would write, in a book or Usenet: There is a hole in my , so the sand is spilling out! Mind you, we are not allowed to create the entry with only 1 sense: "A hourglass.", so why would it be acceptable to create with multiple senses, the first one being "A hourglass."? If the visual description is so special that we consider allowing a whole unattested sense for it, then in my opinion we may as well have the separate "Description" section for the visual description. --Daniel Carrero (talk) 15:01, 30 August 2016 (UTC)
    Just wanna point out that on my iPhone, the link symbol consists of two linked chains, oriented diagonally. By the way, in the etymology section you can write something along the lines of "The two or three chain links that make up the symbol symbolize the concept of connection." --WikiTiki89 15:09, 30 August 2016 (UTC)
    Thank you! I edited 🔗. --Daniel Carrero (talk) 15:46, 30 August 2016 (UTC)
    I like your counterexample :) probably falls into the similar category as 💾, 💿, ☎️ or 📠: the objects are no longer in (widespread) use. You're right, if the literal sense cannot be attested it probably should not be included. A search for on twitter shows that it mostly gets used to represent time or the concept of time running out (countdown). – Jberkel (talk) 16:56, 30 August 2016 (UTC)



I'm pinging everyone else who participated in the vote talk page or the previous BP discussion: @Andrew Sheedy, Koavf, Jberkel, -sche, Octahedron80. --Daniel Carrero (talk) 15:47, 27 August 2016 (UTC)

Forgive my intrusion, but what do you really mean by "description"? Isn't a definition supposed to describe? Also, what's better about "description" than "usage notes"? A description could mean lots of things. I have this inner feeling that it isn't appropriate, but I'm not voting yet until I better understand what "description" flatly means. Philmonte101 (talk) 23:28, 27 August 2016 (UTC)
@Philmonte101: "Description" is the shape of the symbol. In the example given, "🔇" is a speaker with a stroke. So, "a speaker with a stroke" would be the description. Note that it would be incorrect to define the character as "a speaker with a stroke". The actual meaning of the character is "mute", or "mute symbol". For more, see Wiktionary:Votes/2016-08/Description. --Daniel Carrero (talk) 23:44, 27 August 2016 (UTC)
All of the above have problems, as far as I'm concerned. This isn't really a description of anything, it's an explanation of what the glyph is supposed to represent. I think something along the lines of "Glyph representation" would be better, but that's still not quite right. Chuck Entz (talk) 02:41, 28 August 2016 (UTC)
On second thought, what about just "Glyph"? Chuck Entz (talk) 02:44, 28 August 2016 (UTC)
Maybe I'd support "Glyph". Let me think it over. For now, I added a Support "Glyph" option in the poll, in case you and/or other people want to use it. --Daniel Carrero (talk) 03:08, 28 August 2016 (UTC)

As we are explicitly warned when creating an entry, "mere Unicode code point name does not constitute a definition". What are we aiming for, here? A dictionary explains what a word means. The OED famously added the smiley face a year or two ago (how do they alphabetize it?) but we should not slavishly copy every little Unicode symbol just because we can represent it. CFI applies to them all. If this idea of describing shapes is a sneaky way to legitimize symbols that otherwise have no lexical value — or if it's a way to satisfy some mechanistic desire to create entries, without regard to whether anyone will ever get the slightest use out of them — then admit it. I might be getting old but I fail to see the use. We don't have to have an entry for every single Unicode code-point. Equinox 02:35, 28 August 2016 (UTC)

I'll repeat something I said in the vote talk page: "I used the Etymology section for [a few descriptions]. Do you support using the Etymology section for that information? Would you change something? In the vote, I argued that these are actually descriptions, not etymologies, so at least if we are using the Etymology section to keep a description, then we might as well use a Description section." We might also avoid having entries defined using the Unicode name of the character.
I also said in the vote page:
  • This is not part of the current vote, but rather something that can be discussed eventually: if we have the Description section, we can either: 1) keep striving to have only the attestable symbols, deleting all other symbols; or 2) if people want, we can try having a large Unicode database, with unattestable entries that have the Description section properly filled with a textual description, and a single definition along the lines of: "# Symbol not attested. This entry merely describes the Unicode character."
This is not a "sneaky way to legitimize symbols". If we wanted to have a large Unicode database, with many unattested symbols, then the "Description" section would help by keeping descriptions away from the part of the entry reserved for actual definitions. But I'm leaning towards keeping the status quo and only allowing attestable symbols. My idea is to have the "Description" for attestable entries of symbols. --Daniel Carrero (talk) 04:17, 28 August 2016 (UTC)
At some point in the future, I would like to suggest redirecting some "Roman numerals" entries like this: II. Also I'd redirect fullwidth letters, halfwidth katakana, etc. into the normal characters. I'd also redirect vertical-writing parentheses and parentheses pieces to the entries for the normal parentheses. In the past, I created Wiktionary:Votes/2011-06/Redirecting combining characters to redirect some combining characters. (it passed.) I also recently created a discussion about redirecting the components of some matched pairs like: and ⌈ ⌉. (Not to mention the discussions about completely nuking entries for letters and replacing them by appendices.) That is, sometimes, I like to propose merging some entries of symbols that seem to be redundant to each other, which is the opposite of actively looking for a way to create unattested symbols.
I like to try and hunt for meanings given to random symbols, too, though it's unclear if I can find 3 citations for those. I found one citation for each of those: , and .
Maybe in the future I'll find a reason to propose creating unattested symbols, I don't know. I happen to like the characters for dominoes, playing cards, mahjong and box drawing, and maybe they are really hard to attest, (I didn't check.) so in my heart there's some temptation to try and find a reason to create these symbols. But I acknowledge what you said: if this would be just a "mechanistic desire to create entries, without regard to whether anyone will ever get the slightest use out of them", it'd be foolish to create entries like these. (and who knows, they can be attestable if I search hard enough) --Daniel Carrero (talk) 05:23, 28 August 2016 (UTC)

(the discussion below was moved from the Support "Shape" section)

I prefer "Description" above, and consider "Shape" a good 2nd place. This word defines the concept in a straightforward way. It feels a little less professional than "Description", though. But it could be just me. --Daniel Carrero (talk) 15:38, 27 August 2016 (UTC)
Whose profession? Particularly since you originally wanted to justify this as "possibly useful to blind people" (paraphrase), I'd feel you ought to be more sensitive to people who are physically unable to use visual descriptions. Equinox 01:59, 28 August 2016 (UTC)
I apologize for my statement that you quoted: "possibly useful to blind people". I did not mean to be insensitive towards people who are physically unable to use visual descriptions. My statement was temporarily present in the vote rationale, and then I removed it a few days ago, when you commented about it in the previous BP discussion.
Re: "Whose profession?" Correct me if I'm wrong: In my opinion, "shape" sounds like a more basic English word and "description" sounds like a more normal-ish English word. I admitted it could be just me. I should have explained my opinion better; "It feels a little less professional" was poor wording on my part. --Daniel Carrero (talk) 02:19, 28 August 2016 (UTC)
Okay I don't want to massively derail this section and we can carry it to another page if it matters, but you are saying that "shape" sounds somehow stupid, or kiddy, or not adult enough? and "description" sounds more like what an adult would say? My main criticism is that they have totally different meanings... Equinox 02:23, 28 August 2016 (UTC)
Yes. In my opinion, "shape" sounds kiddy, not adult enough, and "description" sounds more like an adult would say. --Daniel Carrero (talk) 02:41, 28 August 2016 (UTC)
But they have totally different meanings!! To "describe" a word is to give its definition, its sound, its history, all kinds of things. Imagine if the headers were like "Description of its sound", "Description of its shape" (your new section), "Description of its meaning"... don't you see why that header is too broad? Equinox 02:43, 28 August 2016 (UTC)
If I give a "description" of a car to the police, that would involve its number-plate (licence/registration), its colour, its size, everything. It doesn't just mean shape. They don't say "give me a description" and I say "...oh, kind of hemispherical, with four wheels". Equinox 02:44, 28 August 2016 (UTC)
I'm beginning to feel inclined towards supporting the "Glyph" suggested above. Do you agree with me that "Shape" sounds childish? Wouldn't it be bad to use a childish section title?
I stand by my argument about ♀ and ß, that I said in #Support "Description". As I said elsewhere, I don't think that people are going to try adding definitions, etymologies, etc. in the "Description" section, or looking for definitions, etymologies, etc. in the "Description" section. --Daniel Carrero (talk) 03:20, 28 August 2016 (UTC)
To illustrate your point, you gave as examples: "Description of its sound", "Description of its shape", "Description of its meaning". Imagine if the entries actually had these exact same sections, without "description of". The entry would have: "Sound" (instead of Pronunciation), "Shape", and "Meaning" (for definitions). These all look like kiddy English to me, and they would make me feel uncomfortable to some extent, like "Shape" already does. (Again, it could be just me.)
-sche gave a good point against "Glyph", (see the #Support "Glyph" section) and I tend to agree with the point given. Maybe "Appearance" would be a good name. If all else fails, I stand by "Description" or I may agree with you on "Shape" if it's the best name available. --Daniel Carrero (talk) 05:38, 28 August 2016 (UTC)

Cascading protection of the Main Page

In the past, every language code was in its own template, every label was its own template, and languages had separate headword templates. This was inefficient, but also meant that cascading-protecting one page didn't usurp the local protection levels of and lock every single one of our thousands of labels and prevent the addition of new labels, lock each of the various things that feed into headword templates and prevent bug-fixing, etc. As we have centralized content into modules that are now called by most headword-line templates, by any use of a label template, etc, the cascading protection of the Main Page has become a problem — whenever a WOTD or FWOTD on the main page includes a label, for instance, it overrules the lower level of protection that is applied to e.g. Module:labels/data, and prevents longtime trusted users including e.g. fr.Wikt admin User:JackPotte from adding labels (see talk); it also apparently prevented users from editing peripheral things that feed into headword templates, per the most recent comments in this thread; etc.
The cascading protection option, which applies the protection level which the main page has to all pages it transcludes, is excessive. The Word-of-the-Day templates all seem to already be independently protected, so no anon can vandalize them, and the WOTD mainspace entries themselves aren't covered by the cascading protection, so it's of no benefit to them. If FWOTDs (for example) need to be protected, protect them specifically.
Last time this was briefly discussed by a few of us, someone suggested simply changing labels in WOTDs to wikicode as a workaround to avoid having the cascading protection lock the label module, but you can see that no-one has remembered to do that, since the main page currently uses {{lb|en|humorous}}.
In light of the cascading-protection-option's clear, recurring negative effects, and its apparent lack of positive effects that can't be obtained better in other ways with less collateral damage, ... and the general apathy the last discussion exhibited towards the narrow-seeming issue of whether autoconfirmed users should be able to edit things where the local protection level allows them but is usurped by the cascading protection, ... I have taken the initiative and protected the main page against non-admin edits or moves while turning off the cascading option. - -sche (discuss) 18:36, 27 August 2016 (UTC)

Cascading is a good thing. If I understand this correctly, the only problem is labels being used in WOTDs, and @Smuconlaw simply needs to avoid doing that. We already make sure not to put labels in FWOTDs. —Μετάknowledgediscuss/deeds 23:03, 27 August 2016 (UTC)
Er, why is it a good thing? In the past, cascading protection was removed several times without negative effect AFAICT, and simply reinstated by Liliana (who is now globally blocked), when she finally noticed its absence, only out of habit / "because it used to have this [cascading option turned on]". How the cascade usurps local protection settings, not just of the label modules but apparently also headword-related modules like the palindrome one discussed above, is clear. What is the benefit of it? The WOTDs are already protected. If there were vandalism against the current day's FWOTD template (the only one protected by cascading, while all the rest are apparently free to be vandalized), we could protect them specifically, presumably at the same autoconfirmed level as the WOTDs. Or perhaps, instead of the main page itself transcluding {{#ifexist:Wiktionary:Foreign Word of the Day/{{CURRENTYEAR}}/..., there could be a template with cascading autoconfirmed (not admin-only) protection applied to it which transcluded that code, and then that template could be transcluded by the main page. - -sche (discuss) 00:37, 28 August 2016 (UTC)
@-sche: Fair point. If you implement what you described in your last sentence, I'll be happy. —Μετάknowledgediscuss/deeds 01:49, 28 August 2016 (UTC)
Just wanted to say that there is nothing on "Wiktionary:Word of the day/Nominations" which says to avoid using {{l}} (or other templates) in WOTDs. If this was an important point, it should have been documented. — SMUconlaw (talk) 06:44, 28 August 2016 (UTC)
@Smuconlaw: To clarify, this wasn't an important point when WOTD was set up and the documentation drafted; it only became important recently when we centralized labels etc. into modules, and it was only noticed more recently than that. Also, I don't think {{l}} is a problem(?), it's {{lb}}s, the use of which causes the main page to overrule the protection settings of the label modules. - -sche (discuss) 01:20, 16 September 2016 (UTC)
OK, thanks. Anyway, I think the issue has been resolved. — SMUconlaw (talk) 12:22, 16 September 2016 (UTC)
Thanks for doing that! Feels weird to have to ask around in order to make changes. Regarding the WOTD, if we have to change the markup or can't use certain templates then something is clearly wrong. Hope we can find a better solution. The radical but safe approach would be to use a static copy during the period the word is featured. – Jberkel (talk) 08:52, 30 August 2016 (UTC)
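
The effect described in this thread can be illustrated with a toy model. This is only a sketch: the transclusion graph, the page names other than the Main Page and Module:labels/data, the protection levels assigned, and the function names are all assumptions made for illustration, and it is not MediaWiki's actual implementation. It shows how a cascading source can override a lower local protection level on anything it transcludes, directly or indirectly.

```python
# Toy model of cascading protection. Everything here is illustrative:
# the level names, the graph, and the helper names are assumptions.
LEVELS = {"none": 0, "autoconfirmed": 1, "sysop": 2}


def transitive_transclusions(page, transcludes):
    """Return every page reachable from `page` through transclusion."""
    seen = set()
    stack = [page]
    while stack:
        current = stack.pop()
        for child in transcludes.get(current, ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def effective_protection(page, local, transcludes, cascading_sources):
    """Local level, raised to the level of any cascading page that pulls `page` in."""
    level = local.get(page, "none")
    for source in cascading_sources:
        if page in transitive_transclusions(source, transcludes):
            source_level = local.get(source, "none")
            if LEVELS[source_level] > LEVELS[level]:
                level = source_level
    return level


# Hypothetical graph: the main page shows a WOTD that uses a label template,
# which in turn loads the label data module.
transcludes = {
    "Main Page": ["Template:WOTD"],
    "Template:WOTD": ["Template:lb"],
    "Template:lb": ["Module:labels/data"],
}
local = {
    "Main Page": "sysop",
    "Template:WOTD": "autoconfirmed",
    "Module:labels/data": "autoconfirmed",
}

with_cascade = effective_protection(
    "Module:labels/data", local, transcludes, ["Main Page"]
)
without_cascade = effective_protection(
    "Module:labels/data", local, transcludes, []
)
print(with_cascade, without_cascade)
```

In this model the module is editable by autoconfirmed users only while no cascading page pulls it in, which is the trade-off the thread is weighing.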


Discussion moved to WT:RFDO.

Renaming Scanian to Old Scanian

Discussion moved from Wiktionary talk:List of languages.

"Scanian" usually refers to the Swedish dialect spoken today in Scania, while the entries in Category:Scanian language belong to an archaic language similar to Old Danish and Old Swedish. After a quick googling, "Old Scanian" seems to already be in use to distinguish the archaic language from the contemporary Swedish dialect. Smiddle (talk) 08:16, 28 August 2016 (UTC)

If that situation is the case, I don't think renaming would help. If "Scanian" (Swedish) and the Scanian language are two different things, then renaming Scanian language to Old Scanian simply puts the confusion one step back in the chain and makes it sound like Old Scanian is the ancestor of Scanian Swedish - which should be Old Swedish. Korn [kʰũːɘ̃n] (talk) 09:41, 28 August 2016 (UTC)

Diacritic stripping in Breton

@Embryomystic and anyone else who knows Breton: we currently strip the following diacritics in Breton:

from = {"[âàä]", "[êèë]", "[îìï]", "[ôòö]", "[ûùü]", CIRC, GRAVE, DIAER},
to   = {"a",     "e",     "i",     "o",     "u"}},

But according to w:Breton_language#Alphabet, at least the following are actually used: â, ê, î, ô, û, ù, ü, ñ. Shouldn't we therefore be keeping all 8 of those, instead of only ñ? Breton Wikipedia uses at least ê and ù in its articles. (Here's a sentence from br:Backgammon: Dre m'eo berr amzer c'hoari pep den, hag a-benn digreskiñ lodenn ar chañs, e vez c'hoariet a-grogadoù alies, ha trec'h e vez disklêriet an den en deus dastumet ar muiañ a boentoù dreist 3 e-lec'h gortoz ma vije bet lamet an holl jedoueroù diouzh un tu.) So it looks like these diacritics are used in normal writing, not just in pedagogical texts and reference works. —Aɴɢʀ (talk) 14:15, 29 August 2016 (UTC)

In fact, I just noticed that we do use "ù" at least in article titles, e.g. anvioù, which is the plural of anv. But "anvioù" in the headline of anv points to "anviou" instead, so the link is broken. —Aɴɢʀ (talk) 14:19, 29 August 2016 (UTC)
@JohnC5 is the one who made the change to the module, about a month ago, so I'm pinging him too. —Aɴɢʀ (talk) 14:21, 29 August 2016 (UTC)
I'm honestly a little baffled why I made this change. Feel free to change it according to what is correct, and sorry for the inconvenience. —JohnC5 14:42, 29 August 2016 (UTC)
OK, I've unstripped the diacritics. —Aɴɢʀ (talk) 14:44, 29 August 2016 (UTC)
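
The broken link described above can be reproduced with a small sketch. It is written in Python rather than in the module's own language, and it assumes that a pattern in the "from" list with no counterpart in the "to" list is simply deleted; the function name and the NFC normalization step are also assumptions, not the module's actual code.

```python
import re
import unicodedata

# Rules as quoted in the thread; the last three patterns are the combining
# circumflex, grave and diaeresis (CIRC, GRAVE, DIAER in the quoted data).
FROM = ["[âàä]", "[êèë]", "[îìï]", "[ôòö]", "[ûùü]",
        "\u0302", "\u0300", "\u0308"]
TO = ["a", "e", "i", "o", "u"]


def strip_diacritics(text, from_patterns, to_strings):
    """Apply each pattern in order; patterns without a replacement are deleted
    (an assumption about how the shorter "to" list is interpreted)."""
    text = unicodedata.normalize("NFC", text)
    for i, pattern in enumerate(from_patterns):
        replacement = to_strings[i] if i < len(to_strings) else ""
        text = re.sub(pattern, replacement, text)
    return text


# With the old rules, the plural of "anv" is linked under the wrong title.
print(strip_diacritics("anvio\u00f9", FROM, TO))

# With no stripping rules, the link target keeps the diacritic.
print(strip_diacritics("anvio\u00f9", [], []))
```

With the old rules the computed page title loses the ù, so the link points at a page that does not exist; with the rules removed, the title is left as written.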

Proposal: Creating entries for Morse code characters

@Octahedron80 asked me here if they could create entries for Morse code patterns. I support the idea. This sounds like something natural to do, like we have for Braille A, for Braille B, etc. I'm opening this discussion to see if it's okay with other people.

Idea for characters: Use - ("HYPHEN-MINUS") for dash and · ("MIDDLE DOT") for dot. (there are many variants of dashes and dots in Unicode; these two feel the most "generic" to me, in a sense)

Please don't create Morse code entries until the discussion is over. Thank you! --Daniel Carrero (talk) 04:55, 30 August 2016 (UTC)

I also heard that there is a Japanese version of Morse code too, as well as codes for punctuation. --Octahedron80 (talk) 04:58, 30 August 2016 (UTC)
Oppose. Unlike Braille, morse code isn't even meant to be written. There are all sorts of encodings for letters, but if they aren't used to actually write the language, there is no point in including them. In my mind, this would be as silly as including entries for written-out Unicode codepoints (U+0020 = space, U+03B1 = α, etc.). --WikiTiki89 11:41, 30 August 2016 (UTC)
Well, we do have this, which certainly isn't meant to be written. I can see the morse proposal fit into our project. And should the debate come to the point that it is decided that we exclude all those representations of language which do not constitute a written form, sign language has to go too (and maybe all non-word symbols in general). I'm fine with either. Korn [kʰũːɘ̃n] (talk) 11:51, 30 August 2016 (UTC)
But sign language isn't merely an encoding of existing letters, it's a whole language in and of itself and for that reason it's worth documenting. The entry titles for sign language are not ideal, but there is no better way around that. For morse code, on the other hand, we already have entries for A, B, and C, so why do we need strangely encoded versions of them at ·-, -···, and -·-·? In any case, if we do include these, I think we should use periods for the dots, because that's the way it's traditionally done in typewriter and computer settings (.-, -..., and -.-.). More proper typesetting would probably use something like bullets (•) and n-dashes (–) anyway (or perhaps something better that I haven't found in Unicode), so I don't like the combination of mid-dots and hyphens. --WikiTiki89 11:54, 30 August 2016 (UTC)
Morse code isn't strictly just an encoding- it's also a sort of script. Just as Cyrillic and Latin Serbo-Croatian represent the same sounds with different scripts that aren't mutually intelligible, so does Morse code represent the same sounds as the Latin script does. It also has a very limited number of non-alphabetic prosigns and a very rudimentary set of grammar-like conventions. It's not a language like the various sign languages are, but it's not just an encoding like ASCII or ANSI or Unicode (or EBCDIC- does anyone remember EBCDIC?). Whether we decide to have entries for the letters or not I think mostly we should treat it like we treat IPA: include it where relevant as unlinked text, so people trying to learn it can see how Latin characters are represented in Morse code, but avoid creating entries for words. After all, no one is going to go to a dictionary to look up Morse code that they've heard. Chuck Entz (talk) 04:05, 31 August 2016 (UTC)
  • Can you provide examples of the use of Morse code in print outside of manuals and the like explaining how to use Morse code? bd2412 T 12:29, 30 August 2016 (UTC)
Do these count? [12] [13] --Octahedron80 (talk) 12:49, 30 August 2016 (UTC)
Support including Morse code. Though what would count towards attesting it? I imagine it's possible, as I've encountered Morse code in books before, but I imagine Morse code manuals wouldn't count. Andrew Sheedy (talk) 13:11, 30 August 2016 (UTC)
Support. In fact, I seem to remember that I added some a very long time ago and they got deleted. SemperBlotto (talk) 13:27, 30 August 2016 (UTC)
p.s. Any chance of semaphore? SemperBlotto (talk) 13:27, 30 August 2016 (UTC)
Support, go for it. --. ----- . --.. .-.. (my callsign) DonnanZ (talk) 14:52, 30 August 2016 (UTC)
Question: What characters should we use?
I suggested: - ("HYPHEN-MINUS") for dashes and · ("MIDDLE DOT") for dots.
Wikitiki89 suggested using . ("FULL STOP") for dots.
If all books that we find use the full stop, then I believe we should use the full stop in our entries.
If some books use the middle dot, and others use the full stop, I propose creating entries for both varieties, one of those being as an "alternative form".
-.-- = Morse code for Y.
-·-- = Alternative form of -.--.
More broadly, I think we should create whatever variations of Morse code dashes and dots are attestable, including "something like bullets (•) and n-dashes (–)" as Wikitiki89 mentioned. Probably the main entries would use hyphen-minus and full stop, because they are the easiest to type and probably can be found in more books. (I didn't check)
--Daniel Carrero (talk) 15:12, 30 August 2016 (UTC)
You have to find a way of displaying the dots and dashes in a straight line. DonnanZ (talk) 15:22, 30 August 2016 (UTC)
Do I? I initially chose using "MIDDLE DOT" and "HYPHEN-MINUS" to display the dots and lines in a straight line, but if published works use a "FULL STOP" like you used above, then it's not really a straight line. --Daniel Carrero (talk) 15:42, 30 August 2016 (UTC)
Note that on my computer in Arial Unicode MS font, the middle dot and hyphen-minus are not lined up (but in the Georgia font used in headings, they do line up). --WikiTiki89 16:04, 30 August 2016 (UTC)
Does this look better: –•–•–•–•–•–•–•? It is a combination of "EN DASH" and "BULLET".
If we want to use hyphen-minus + middle dot (which looks good to me), we could use a new script code, like sc=Morse, to apply Georgia and/or other fonts in Morse code headwords and links. --Daniel Carrero (talk) 16:12, 30 August 2016 (UTC)
Whether they line up is always going to depend on the font. To respond to Daniel Carrero, I think most printed material dedicated to morse code uses its own custom dots and dashes that do not necessarily correspond to any Unicode character. Morse code in more casual usage, such as within a fiction novel, is likely to use full stops and hyphens (and slashes between letters within a word, and spaces between words [or maybe I got that backwards?]). --WikiTiki89 15:32, 30 August 2016 (UTC)
Re: "slashes between letters within a word". Sounds like we can add, in the entry /, the sense: "In Morse code, used between different letters of the same word." Incidentally, the entry . ("FULL STOP") has the sense "In Morse code, the shorter of two marks (the dot)." since April 2016. --Daniel Carrero (talk) 15:39, 30 August 2016 (UTC)
It's better to be professional rather than casual. You need a font which displays in a straight line, also leaving a space between the dashes; ·-, -···, -·-·, --·· are nearly there but not quite good enough. DonnanZ (talk) 17:24, 30 August 2016 (UTC)
Then why are we using straight quotes and apostrophes (" and ') rather than curly ones (“”, and ’)? --WikiTiki89 17:39, 30 August 2016 (UTC)
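
The "main entry plus alternative forms" idea discussed above can be sketched as a normalization step: several typographic variants are folded into one canonical spelling. The choice of full stop and hyphen-minus as the canonical pair is only an assumption here (the thread has not settled it), and the function name is hypothetical.

```python
# Variants mentioned in the thread, folded into one canonical spelling.
# Canonical pair (FULL STOP, HYPHEN-MINUS) is an assumption for illustration.
DOT_VARIANTS = "\u00b7\u2022"    # MIDDLE DOT, BULLET
DASH_VARIANTS = "\u2013\u2212"   # EN DASH, MINUS SIGN

_TABLE = str.maketrans(
    {**{c: "." for c in DOT_VARIANTS}, **{c: "-" for c in DASH_VARIANTS}}
)


def canonical_morse(text):
    """Map dot and dash variants to full stop and hyphen-minus."""
    return text.translate(_TABLE)


# Middle dot + hyphen-minus spelling of Y, and en dash + bullet spelling of C.
print(canonical_morse("-\u00b7--"))
print(canonical_morse("\u2013\u2022\u2013\u2022"))
```

Under this approach an "alternative form" such as the middle-dot spelling would resolve to the same canonical string as the full-stop spelling.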
  • Oppose: Morse Code is probably best left as a Wiktionary Appendix rather than having entries in mainspace. If Morse Code is treated as a valid alternative script for the English language (which according to user WikiTiki89 above, it isn't), then every entry on Wiktionary would be eligible to have a Morse Code transliteration included. However, I would check first with the governing body for Morse Code (the International Telecommunication Union) on the validity of its written use. They should have something on their website, I know they have publicly published standards for other forms of communication. Nicole Sharp (talk) 17:33, 30 August 2016 (UTC)
    • From what I understand, Morse Code is ISO 15924 ZXXX, or an unwritten (e.g. in auditory tones or visual flashes) form of communication. Other languages which are communicated (have been communicated) as ISO 15924 ZXXX include protolanguages (such as Proto-Indo-European), the words for which have no historically-documented written forms and are also not allowed in mainspace, but are instead located in appendices. A Wiktionary Appendix of commonly-used words and expressions in Morse Code would definitely be helpful, but I think it would be difficult to include it in the Wiktionary mainspace. Nicole Sharp (talk) 17:46, 30 August 2016 (UTC)
      • Here is the official international standard for Morse Code, as published by its governing body, the International Telecommunication Union: https://www.itu.int/dms_pubrec/itu-r/rec/m/R-REC-M.1677-1-200910-I!!PDF-E.pdf There is no mention of Morse Code being used as a written or printed form of communication, only as an unwritten signal. They also use periods instead of middots and en dashes instead of hyphens for the code. For consistency with the official documentation, I would suggest following the typography used by the ITU for any Morse Code on Wiktionary. Nicole Sharp (talk) 17:57, 30 August 2016 (UTC)
        Actually, those are not en-dashes, but minus signs (U+2212). And also note that they put spaces between them. --WikiTiki89 18:16, 30 August 2016 (UTC)
        Actually I just noticed that they are inconsistent and sometimes use en-dashes (U+2013) and sometimes minus signs (U+2212). --WikiTiki89 18:19, 30 August 2016 (UTC)
    • I missed the comment above on allowing sign-language entries in Wiktionary. After reading the Wikipedia article as well for Morse Code (including "Morse Code as an Assistive Technology"), I think I am more ambivalent on the issue now and think that Morse Code could work out in mainspace, as long as a way of expressing it can be agreed upon. Having Morse Code transliterations available in Wiktionary could be very useful to not have to look up one letter at a time. I think in the long run it is better to not restrict Wiktionary to just written forms of communication, and that we should be open to as many forms of communication as possible (including communication via colors and symbols, e.g. the color red for "stop"), despite the technical and logistic challenges of presenting and organizing such unwritten forms of communication. Nicole Sharp (talk) 18:45, 30 August 2016 (UTC)
      • Any text can easily be translated (technically, "transliterated" or "encoded" would be more accurate) to Morse code with various automatic online translators (such as this one). In fact we can even easily create our own such tool and put it on an appendix page (Appendix:Morse code). The difference with sign language is that it is an actual language and not simply a way to encode English letters. --WikiTiki89 18:55, 30 August 2016 (UTC)
        • Given Morse Code is (or has been) used for actual communication, it deserves at least an appendix. Full entries don't seem like a problem to me. Renard Migrant (talk) 19:04, 30 August 2016 (UTC)
        • It's more like Braille really; a different representation for the same letters. We should treat it the same way. —CodeCat 19:04, 30 August 2016 (UTC)
        • The biggest difference though between Wiktionary and the majority of automated online translation/transliteration software is the license. Wiktionary is free and open-source (copylefted) whereas most of those are under commercial for-profit licenses (e.g. Google Translate). The license of the information source doesn't matter to some people, but to other people it does (e.g. Ubuntu Linux users versus Linux Mint users). I think creating a free open-source repository of Morse Code transliterations on Wiktionary isn't a bad idea, whether it is in mainspace or in an Appendix. The argument that the content is already available elsewhere doesn't really hold water (since most everything on Wiktionary is already on for-profit sites like Dictionary.com). If it is decided to be included in mainspace, then the simplicity of Morse works to our advantage, since a bot can probably be programmed to automatically add a Morse Code transliteration to each English-language entry on Wiktionary without human labor needed. I think a simpler start would be to program a Morse Code transliteration of the 2000 words needed for Basic English ("Appendix:Basic English word list," [14]) to be placed in a Wiktionary Appendix, which should be adequate. An Appendix of the 2000 words of Basic English in Morse Code also has the major advantage that it can be printed out or saved as a single document for offline reference. Nicole Sharp (talk) 19:31, 30 August 2016 (UTC)
          • Wiktionary is not copylefted. Copyleft, in my understanding, refers to parasitic licenses like the GPL that infect anything they touch. But that's beside the point. If we provide our own transliteration tool, it will be available under our own CC license. I don't see what problem we would be solving by creating thousands of duplicate entries in Morse code. --WikiTiki89 19:44, 30 August 2016 (UTC)
            • Wiktionary is copylefted. The Creative Commons ShareAlike licence is copyleft. All derivative works of Wiktionary must be licenced under the same terms. —CodeCat 19:50, 30 August 2016 (UTC)
              • Now that I think about it, you're sort of right. But Wiktionary isn't really meant to have "derivative works", but rather to have its content freely available to everyone, so the parasitism of copyleft doesn't really apply. --WikiTiki89 20:05, 30 August 2016 (UTC)
                • There are so many different licenses that it gives me a headache. I personally use the term "copyleft" to generically refer to any work that can be redistributed and modified without the permission of the author. Wikimedia falls under that. From what I understand, anyone can download a copy of Wiktionary and modify it however they like for (nonprofit) republication, as long as they keep the edit histories attached to each wikipage. Nicole Sharp (talk) 20:22, 30 August 2016 (UTC)
                  • You said: "anyone can download a copy of Wiktionary and modify it however they like for (nonprofit) republication".
                  • But Wiktionary is licensed under https://creativecommons.org/licenses/by-sa/3.0/ which explicitly states: "for any purpose, even commercially." If I want to publish and sell a copy of Wiktionary, I can do it, provided I give credit to Wiktionary. I can make changes and derivative works if I want, provided I state what the changes are. --Daniel Carrero (talk) 20:37, 30 August 2016 (UTC)
                • I think that this section though should be split into two subsections: proposals to include or bar Morse Code being within the Wiktionary mainspace versus proposals to add Morse Code as a Wiktionary Appendix outside of mainspace. The latter is less controversial than the former, and will hopefully be quicker to reach a consensus, while debate on the former can continue. Nicole Sharp (talk) 20:22, 30 August 2016 (UTC)
                  • Actually, I don't think anyone would object to an Appendix page, so it doesn't need to be discussed unless someone who opposes it brings it up. --WikiTiki89 20:32, 30 August 2016 (UTC)
                  • Speaking of licenses though, is International Morse Code public domain, or is it patented/copyrighted by the ITU? If it is not public domain, we may not even be able to use it on Wiktionary (versus Wikipedia being able to use it within an article under fair use). Nicole Sharp (talk) 20:42, 30 August 2016 (UTC)
                    • I believe the patent of the Morse code expired, because it's been around for a long time. --Daniel Carrero (talk) 21:15, 30 August 2016 (UTC)
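
The claim made in this thread, that text can be transliterated into Morse code mechanically, can be sketched in a few lines. This is a minimal illustration covering only A-Z and 0-9; the function names, the use of full stop and hyphen-minus, and the " / " word separator are assumptions, not an agreed convention.

```python
# International Morse code for letters and digits only.
MORSE = {
    "A": ".-", "B": "-...", "C": "-.-.", "D": "-..", "E": ".", "F": "..-.",
    "G": "--.", "H": "....", "I": "..", "J": ".---", "K": "-.-", "L": ".-..",
    "M": "--", "N": "-.", "O": "---", "P": ".--.", "Q": "--.-", "R": ".-.",
    "S": "...", "T": "-", "U": "..-", "V": "...-", "W": ".--", "X": "-..-",
    "Y": "-.--", "Z": "--..",
    "0": "-----", "1": ".----", "2": "..---", "3": "...--", "4": "....-",
    "5": ".....", "6": "-....", "7": "--...", "8": "---..", "9": "----.",
}
REVERSE = {code: char for char, code in MORSE.items()}


def encode(text):
    """Letters are separated by spaces, words by ' / ' (assumed convention)."""
    words = []
    for word in text.upper().split():
        try:
            words.append(" ".join(MORSE[ch] for ch in word))
        except KeyError as err:
            raise ValueError(f"no Morse code in this table for {err.args[0]!r}")
    return " / ".join(words)


def decode(morse):
    """Inverse of encode() for strings produced by it."""
    return " ".join(
        "".join(REVERSE[code] for code in word.split())
        for word in morse.split(" / ")
    )


print(encode("cat"))
print(encode("88"))
print(decode(encode("hw")))
```

Because the mapping is one-to-one, the same table serves for both directions, which is the property several participants point to when calling Morse an encoding rather than a language.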

Are we going to have only entries for Morse code letters and numbers, or will we allow words such as: -·-· ·- - = cat? At least, if there are attestable Morse code abbreviations we should include those, too. We have a Braille entry ⠁⠇⠍ ("alm") meaning "almost". --Daniel Carrero (talk) 19:29, 30 August 2016 (UTC)

@Daniel Carrero: I would be against creating a trivial example like encoding every conceivable word into Morse code (although providing a redirect would be fine or having a section called Encodings which has Braille, fingerspelling, Morse Code, and semaphore would actually be fine with me). I would be in favor of creating standardized contractions. I created ---.. ---... What does everyone think? —Justin (koavf)TCM 14:16, 1 September 2016 (UTC)
I like the entry ---.. ---... Actually, some users felt that abbreviations like these should be written in Latin script. We already have 88 for "hugs and kisses". I'm fine with keeping both 88 and ---.. ---.., one of those should be the "alternative form". Would anyone prefer to delete the Morse code entry altogether? (technically, "88" is not "Latin script" but this is not important)
I disagree with a few implementation details, but IMO we can create more entries like these now if we want, and the layout can be edited later. I'll assume that "88" is really English and not Translingual.
I don't like using "Contraction" as the POS for Morse code abbreviations (I also don't like using "Contraction" for Braille entries): "hugs and kisses" is an abbreviation, and the actual POS is a noun. In the entry 88, the POS is already "Noun". I edited 88 to add {{en-plural noun}}.
Also, we need to edit the headword of ---.. ---.. so it will properly display the proper Morse code images that Wikitiki89 created. I would create Morse code templates for English: {{en-morse noun}}, {{en-morse plural noun}}, {{en-morse adjective}}, {{en-morse interjection}}. --Daniel Carrero (talk) 15:34, 1 September 2016 (UTC)
Note that Morse code abbreviations are translingual, not English. In fact they were used in communication between people who did not understand each other's languages (see w:Morse code abbreviations#Informal Language Independent Conversations). --WikiTiki89 16:08, 1 September 2016 (UTC)
Support Encoding section in entries. Korn [kʰũːɘ̃n] (talk) 15:59, 2 September 2016 (UTC)

polls on the use of Morse Code

Poll: Allowing Morse code characters

Proposal: Allowing entries for Morse code letters (A-Z), digits (0-9), punctuation marks and possibly other symbols and characters, such as commercial at (@ = ·--·-·), addition sign (+ = ·-·-·), etc.

This is a poll with no policy value.

  1. Support. This sounds like something natural to do, like we have for Braille A, for Braille B, etc. --Daniel Carrero (talk) 21:32, 30 August 2016 (UTC)
  2. Weak Support, per Daniel. Putting all the letters in an appendix is also a possibility, although if any string which deserves an entry for another reason happens to also be a Morse code letter, I would certainly mention the Morse code letter somewhere in the entry. - -sche (discuss) 21:59, 30 August 2016 (UTC)
  3. Support. CodeCat 22:03, 30 August 2016 (UTC)
  4. Support. Also, some Japanese characters need 2 codes each. For example, for パ we need to send the codes of ハ and ゜ consecutively. --Octahedron80 (talk) 00:21, 31 August 2016 (UTC)
  5. Support for the basic character set. bd2412 T 15:43, 31 August 2016 (UTC)
  6. Support. Andrew Sheedy (talk) 17:07, 31 August 2016 (UTC)
  7. Support. Leasnam (talk) 17:25, 31 August 2016 (UTC)
  8. Support. Eru·tuon 03:33, 1 September 2016 (UTC)
  9. Support. Aɴɢʀ (talk) 14:14, 2 September 2016 (UTC)
  1. Oppose I think we should take the same approach to Morse Code as we do with unwritten protolanguages. We include the Morse Code transliteration in a subsection under the main entry, but with the individual Morse characters wikilinked to an Appendix, not as mainspace entries. See third poll response below. Nicole Sharp (talk) 07:04, 31 August 2016 (UTC)
  2. Oppose I see no reason that an Appendix page would be insufficient. --WikiTiki89 12:37, 31 August 2016 (UTC)
  3. Oppose Cruft, useless except to occupy contributors who might do even worse. DCDuring TALK 14:25, 1 September 2016 (UTC)
    See w:Wikipedia:Assume good faith. (I know it does not qualify as a Wiktionary policy, but it looks better than our Wiktionary:Assume good faith, I think.) --Daniel Carrero (talk) 14:37, 1 September 2016 (UTC)

Poll: Allowing Morse code abbreviations

Proposal: Allowing entries for Morse code-specific abbreviations, written in Morse code. Example: ···· ·-- ("hw") = how. See w:Morse code abbreviations for a list.

This is a poll with no policy value.

  1. Support. Absolutely. We also have entries for Braille-specific abbreviations, such as ⠁⠇⠍ ("alm") meaning "almost". --Daniel Carrero (talk) 21:32, 30 August 2016 (UTC)
  2. Support But I think they should be presented in Latin script, not Morse notation. Same for Braille abbreviations. —CodeCat 22:04, 30 August 2016 (UTC)
  3. Support "All words in all languages." Philmonte101 (talk) 22:26, 30 August 2016 (UTC)
  4. Support --Octahedron80 (talk) 00:21, 31 August 2016 (UTC)
  5. Support, but only to the extent that these can be cited/attested. The obvious example would be ··· --- ···. bd2412 T 15:45, 31 August 2016 (UTC)
  6. Support, provided they meet CFI. Andrew Sheedy (talk) 17:08, 31 August 2016 (UTC)
  7. Support Leasnam (talk) 17:25, 31 August 2016 (UTC)
  8. Support. Aɴɢʀ (talk) 14:12, 2 September 2016 (UTC)
  1. Oppose Same as above. See below. Abbreviations should be listed in mainspace in Roman script (e.g. HW), with the Morse Code transliteration for the abbreviation listed under the "Morse Code" subsection. Nicole Sharp (talk) 07:04, 31 August 2016 (UTC)
  2. Oppose Like Nicole Sharp, I think abbreviations should be entered in roman script (at HW, for example). --WikiTiki89 12:37, 31 August 2016 (UTC)
  3. Oppose Absolutely fucking ridiculous. Can you find any more productive way to waste your time? DTLHS (talk) 04:12, 2 September 2016 (UTC)
    @DTLHS: After you created an RFD for 2 Morse code entries, I pointed you to this poll. Were you insulting me specifically, or the person who gave the idea, the people who supported the proposal or the people who otherwise helped to edit the templates/entries? Anyway, I'd appreciate if we could abide by the (non-Wiktionary) policies w:WP:CIVIL and w:WP:AGF. --Daniel Carrero (talk) 04:20, 2 September 2016 (UTC)
    It applies equally to whoever came up with the idea and the people who supported it. DTLHS (talk) 04:26, 2 September 2016 (UTC)

Poll: Allowing Morse code terms

Proposal: Allowing entries for normal English terms written in Morse code, such as Morse for "cat" and "dog".

This is a poll with no policy value.

  1. Support: "all words in all languages". Regardless of how they're written. If it's attested, I say go for it. Philmonte101 (talk) 22:23, 30 August 2016 (UTC)
  2. I do not think there is any need to place Morse Code as mainspace entries. However, I think it would be very helpful if a Morse Code transliteration could be added to each English-language entry, perhaps next to the IPA spelling or under its own heading. If we have room in headings for trivia such as having a heading for the anagrams of each word then we can certainly make room for adding a handful of dots and dashes in each entry. Morse Code is simple enough that a bot can be programmed to automatically add a new section to each English-language entry with the Morse Code transliteration of the word. If someone wanted to reverse-search Morse Code, they can still type it into search and it would come up as under the entry content, without needing duplicate mainspace entries in Morse. Nicole Sharp (talk) 06:33, 31 August 2016 (UTC)
  1. Symbol oppose vote.svg Oppose, just as we don't have (full) Braille words, or AFAIK full (multi-character) Deseret words, etc. - -sche (discuss) 22:01, 30 August 2016 (UTC)
  2. Symbol oppose vote.svg Oppose. Morse is pretty much never written, it was never intended to be written. It maps one-to-one to the Latin alphabet, people don't think of Morse code as a separate alphabet, just an encoding of Latin letters. This makes it very different from sign languages, which are very clearly not English or Latin script (although they do have a mechanism to "spell" words, but then, we have methods to spell out signs in writing too). —CodeCat 22:15, 30 August 2016 (UTC)
  3. Symbol oppose vote.svg Oppose with extreme prejudice. Never in a million years. DTLHS (talk) 23:59, 30 August 2016 (UTC)
  4. Symbol oppose vote.svg Oppose per others. --Daniel Carrero (talk) 00:02, 31 August 2016 (UTC)
  5. Symbol oppose vote.svg Oppose Nope for vocabulary. Just characters and abbreviations. --Octahedron80 (talk) 00:22, 31 August 2016 (UTC)
  6. Symbol oppose vote.svg Oppose per DTLHS. DCDuring TALK 02:34, 31 August 2016 (UTC)
  7. Symbol oppose vote.svg Oppose. No one is going to go to a dictionary to look up a string of dahs and dits that they've heard somewhere. The only way they're going to find entries is from other entries, and there's nothing of any value in an entry that wouldn't already be in the text of the link. Chuck Entz (talk) 04:05, 31 August 2016 (UTC)
  8. Symbol oppose vote.svg Oppose strongly. --WikiTiki89 12:38, 31 August 2016 (UTC)
  9. Symbol oppose vote.svg Oppose - Morse code is not a language, it is just a means of transmitting existing languages. bd2412 T 15:41, 31 August 2016 (UTC)
  10. Symbol oppose vote.svg Oppose. Andrew Sheedy (talk) 17:09, 31 August 2016 (UTC)
  11. Symbol oppose vote.svg Oppose Leasnam (talk) 17:26, 31 August 2016 (UTC)
  12. Symbol oppose vote.svg Oppose Aɴɢʀ (talk) 14:13, 2 September 2016 (UTC)
Symbol abstain vote.svg Abstain until further discussion. Should we duplicate all our entries in Morse code, like we do for romanizations in some languages? I'm curious if we would allow all English terms in Morse code, (like we do for romanization entries) or just English terms that are attestable in Morse code text. --Daniel Carrero (talk) 21:32, 30 August 2016 (UTC)
Have you ever, ever, ever, EVER thought about doing stuff based on what human beings want and need?
Why that attitude? You even forgot your signature, Equinox, are you drunk again? --Daniel Carrero (talk) 00:06, 31 August 2016 (UTC)
I deliberately dropped the sig. Treat the question honestly, as it deserves. Nobody needs Morse code. It's just Unicode wank from someone who doesn't spend any time on creating useful content. If we could create an entry for every little pixel on the screen, you'd create 640x480 entries before we could do anything about it. Equinox 00:10, 31 August 2016 (UTC)
"someone who doesn't spend any time on creating useful content" is too harsh, and unfair. What about my new Portuguese and English entries, my time rewriting definitions, adding quotations, creating templates and modules and proposing new policies and edits to WT:EL, and helping people with questions and favors asked in my talk page?
I agree with @Nicole Sharp above: "I think in the long run it is better to not restrict Wiktionary to just written forms of communication, and that we should be open to as many forms of communication as possible (including communication via colors and symbols, e.g. the color red for "stop"), despite the technical and logistic challenges of presenting and organizing such unwritten forms of communication." You call it "Unicode wank". So what? Sometimes you vote oppose when I create a new proposal, but if enough people vote support, they pass, and it's not up to me to decide, or you. --Daniel Carrero (talk) 00:28, 31 August 2016 (UTC)
Indeed, I should start a vote, and a huge number of votes, so that the people who care are so overwhelmed they can't actually tell what is going on, to use Wiktionary to create entries for my personal family tree. An entry for my uncle. An entry for my aunt. Dude, I am sure you have done some good work but about 70% of your ideas are literally garbage. kisses. Equinox 00:31, 31 August 2016 (UTC)
Do you think I have too many active votes right now? Of the 14 active votes, I have 6 votes in the list. (one ended yesterday so make it 5 out of 13) I apologize because I flooded the list of votes at some point around February, and since then I've been trying not to do it again. Anyway, Morse code was not my idea but I support it for single characters and Morse-specific abbreviations like we do in Braille. (I wonder what is your position about Braille chars and abbreviations?) Apparently you are free to think my ideas are garbage and I'm not angry, but it was rude of you to say things like "He just wants to do it to fulfill some kind of nerdy, autistic bullshit." (said by you below) I'm not perfect but I like to create new things and I hope some are useful, even if sometimes I'm not up to your expectations specifically.
All this is because we've been discussing my idea about a "Description" section recently and I maintained my position despite your multiple objections? Please don't think that I'm just stubborn and refusing to admit that the "Description" section is an obvious garbage. I really tried to reply to all your questions in that conversation. (And you didn't answer some of my questions, but that's forgivable, we've been talking a lot.) --Daniel Carrero (talk) 00:59, 31 August 2016 (UTC)
Just remember that every new feature you want to add to Wiktionary will require time and effort from the community to assess it. Even worthwhile proposals have a certain cost, and the more you have going at the same time, the more these costs compound each other. I'd like to be able to just create and edit entries without having to be constantly dropping everything to express my opinion on, say, pig Latin entries (DON'T EVEN THINK ABOUT IT!!! ;-)). We can't just tune you out, because then you'll just go ahead and run with everything, including that non-negligible percentage of lame and otherwise awful ideas. Yes, it's nice to have a little water now and then, but when you open the floodgates, we have to stop and pump out the basement. Chuck Entz (talk) 04:05, 31 August 2016 (UTC)
@Chuck Entz: I'll have this in mind, thanks: "every new feature [...] will require time and effort from the community ...".
You made that comment, specifically, in the poll "Allowing entries for normal English terms written in Morse code.", so at least let me point out that I don't want terms in Morse code (I don't want Pig Latin terms either!), though I do support having Morse characters like "A" and other stuff. I just thought this poll would be a good idea in contrast to the other Morse polls, so it's clear that a lot of people oppose it.
Re: "you'll just go ahead and run with everything, including that non-negligible percentage of lame and otherwise awful ideas". I challenge you to make a list of my lame or otherwise awful ideas. Thank you. --Daniel Carrero (talk) 20:58, 2 September 2016 (UTC)

Oppose - Morse code

  • Oppose. This isn't in the right section but I can't even tell what's going on, and when I edit the page, it's confusing, and hard. Move it if you must. I oppose because I don't believe that the creator wants to create Morse code to help anyone. He just wants to do it to fulfil some kind of nerdy, autistic bullshit. It won't help anyone. It's worthless. Forbid it. If anyone ever comes here and says "we can't use Wiktionary because we need Morse code in order to read it", we should start using Morse code. This naturally won't happen because Morse code is from the telegraphy era and totally obsolete now. Grow up and start doing useful, meaningful entries. Equinox 23:35, 30 August 2016 (UTC)
    • I created a separate "Oppose - Morse code" section for your vote, if that helps. --Daniel Carrero (talk) 00:02, 31 August 2016 (UTC)
    • "He just wants to do it to fulfil some kind of nerdy, autistic bullshit." Isn't this pretty much all of us, Eq? Why else would we lavish so much time on Wiktionary? Don't be a hater for hate's sake. —Μετάknowledgediscuss/deeds 05:36, 31 August 2016 (UTC)
    • As an autistic woman myself, I find that kind of language to be derogatory, offensive, and hateful. Bashing a contributor's neurology or (dis)ability status has no place on Wikimedia. And Morse Code is not obsolete, go read "wikipedia:Morse Code." It is required knowledge for radio licenses in the USA, and also by USA Air Force personnel. Nicole Sharp (talk) 06:15, 31 August 2016 (UTC)
      • (Not to mention that we document obsolete writing systems too!) --Daniel Carrero (talk) 06:44, 31 August 2016 (UTC)
        • Speaking of obsolete writing systems, we should start a campaign to also add (asterisked) Runic transliterations to the Old English, Old Norse, and Proto-Germanic entries.  :-/ Nicole Sharp (talk) 06:50, 31 August 2016 (UTC)
  • Symbol oppose vote.svg Oppose I basically agree with Equinox. Morse code is still used as part of some rescue and disaster tropes in movies. DCDuring TALK 02:38, 31 August 2016 (UTC)

Abstain - Morse code

Polls on the typography for Morse Code

Poll: Morse code format: "FULL STOP", "HYPHEN-DASH"

Proposal: Naming Morse code using "FULL STOP", "HYPHEN-DASH", like this: .-.-.-.-.

This is a poll with no policy value.

  1. Symbol support vote.svg Support Do both and keep them as alternative forms of each other. Philmonte101 (talk) 22:25, 30 August 2016 (UTC)
  2. Symbol support vote.svg Support. If we create entries for Morse code characters, this should be the format of the entry titles. We can format the headword lines with images so that they line up appropriately. --WikiTiki89 12:41, 31 August 2016 (UTC)
  3. Symbol support vote.svg Support as the actual entry form, but display the header with en dashes if that option is not used as the main form. Andrew Sheedy (talk) 20:05, 31 August 2016 (UTC)
  1. Symbol oppose vote.svg Oppose --Daniel Carrero (talk) 21:32, 30 August 2016 (UTC)
    Pro: .-.-.-.- is easy to type. Con: single dot (letter E) and two dots (letter I) are impossible to use as normal entry titles and would have to be kept in Unsupported titles/Full stop and Unsupported titles/Double period, respectively.
    I'm opposing this because I prefer using "MIDDLE DOT", "HYPHEN-DASH", like this: ·-·-·-·-. Actually, I support using the full stop entries as "alternative form" entries, assuming they are attestable.--Daniel Carrero (talk) 21:32, 30 August 2016 (UTC)
  2. Symbol oppose vote.svg Oppose For the reasons Daniel gives. —CodeCat 22:16, 30 August 2016 (UTC)
  3. Symbol oppose vote.svg Oppose Technical limitation. But okay making them redirects. --Octahedron80 (talk) 00:24, 31 August 2016 (UTC)
  4. Symbol oppose vote.svg Oppose The official standard for Morse Code uses minuses, not hyphens. Nicole Sharp (talk) 06:53, 31 August 2016 (UTC)
Symbol oppose vote.svg Oppose, but create as redirects if possible. Andrew Sheedy (talk) 17:14, 31 August 2016 (UTC)

Poll: Morse code format: "MIDDLE DOT", "HYPHEN-DASH"

Proposal: Naming Morse code using "MIDDLE DOT", "HYPHEN-DASH", like this: ·-·-·-·-.

This is a poll with no policy value.

  • Symbol support vote.svg Support. There are many types of dots and dashes on Unicode, but those feel the most "generic" to me. We can introduce a new script code, such as |sc=Morse, to use the right fonts (Georgia would do, apparently) that make the dot and dash display properly aligned in a horizontal line. --Daniel Carrero (talk) 21:32, 30 August 2016 (UTC)
    I changed my vote to full stop + en dash. --Daniel Carrero (talk) 16:08, 31 August 2016 (UTC)
  1. Symbol support vote.svg Support, though I'm opposed to having Morse word entries in general. —CodeCat 22:17, 30 August 2016 (UTC)
  2. Symbol support vote.svg Support Do both of them as alternative forms of each other. Philmonte101 (talk) 22:26, 30 August 2016 (UTC)
  3. Symbol support vote.svg Support --Octahedron80 (talk) 00:25, 31 August 2016 (UTC)
  1. Symbol oppose vote.svg Oppose The official standard for Morse Code uses periods and minuses, not hyphens and middots. Nicole Sharp (talk) 06:54, 31 August 2016 (UTC)
  2. Symbol oppose vote.svg Oppose Let's use en dash per Nicole Sharp. Still agree with middle dot. --Octahedron80 (talk) 06:55, 31 August 2016 (UTC)
    • Actually, they are apparently minuses, not en dashes (both look identical on my screen). See analysis by user Daniel Carrero below. Nicole Sharp (talk) 07:16, 31 August 2016 (UTC)
  3. Symbol oppose vote.svg Oppose. If we want alignment, we can format the headword lines with images so that they line up appropriately. --WikiTiki89 12:42, 31 August 2016 (UTC)

Poll: Morse code format: "MIDDLE DOT", "EN DASH"

Proposal: Naming Morse code using "MIDDLE DOT", "EN DASH", like this: ·–·–·–·–.

(This option was not present in the poll before, it was proposed and added later.)

This is a poll with no policy value.

Symbol support vote.svg Support --Daniel Carrero (talk) 16:08, 31 August 2016 (UTC)
As discussed below, the official document http://www.itu.int/dms_pubrec/itu-r/rec/m/R-REC-M.1677-1-200910-I!!PDF-E.pdf mostly uses minuses and middle dots, but the minuses seem to be a mistake on their part and might not make sense. The en dash seems to be typographically identical to the minus and does not have the same mathematical meaning as the minus has. Hence let's use the en dash. --Daniel Carrero (talk) 16:08, 31 August 2016 (UTC)
You are mistaken, it uses full stops and minus (and in some places full stops and en dashes), but not middle dots. --WikiTiki89 17:13, 31 August 2016 (UTC)
Thank you for pointing out my mistake. --Daniel Carrero (talk) 17:18, 31 August 2016 (UTC)
  1. Symbol oppose vote.svg Oppose. --WikiTiki89 17:13, 31 August 2016 (UTC)

Poll: Morse code format: "FULL STOP", "EN DASH"

Proposal: Naming Morse code using "FULL STOP", "EN DASH", like this: .–.–.–.–.

(This option was not present in the poll before, it was proposed and added later.)

This is a poll with no policy value.

  1. Symbol support vote.svg Support --Daniel Carrero (talk) 16:08, 31 August 2016 (UTC)
    As discussed below, the official document http://www.itu.int/dms_pubrec/itu-r/rec/m/R-REC-M.1677-1-200910-I!!PDF-E.pdf mostly uses minuses and full stops, but the minuses seem to be a mistake on their part and might not make sense. The en dash seems to be typographically identical to the minus and does not have the same mathematical meaning as the minus has. Hence let's use the en dash. --Daniel Carrero (talk) 17:19, 31 August 2016 (UTC)
    Also note that the document puts spaces between each character and enlarges the font, like this: . – . – . – . – --WikiTiki89 17:21, 31 August 2016 (UTC)
    If we add |sc=Morse in all Morse code entries, we can make the CSS "font-size" larger. I prefer not adding spaces manually between all Morse characters, because I'd rather use CSS "letter-spacing" to make the kerning a bit wider. --Daniel Carrero (talk) 17:27, 31 August 2016 (UTC)
    Or we can enter them as .--. and display them in the headword line with an image. That would put the entry at the most commonly typed form and would resolve the problem of displaying them properly. --WikiTiki89 17:29, 31 August 2016 (UTC)
    Here is an example of what an entry could look like. The images I found are not 100% ideal, but we can deal with that if we decide to go with this option. Alternatively, we can just include the audio at P and be done with it. --WikiTiki89 18:22, 31 August 2016 (UTC)
  1. Symbol oppose vote.svg Oppose. I don't think that the document in question is implying that this is any sort of standard for the computerized encoding of written morse code. --WikiTiki89 17:22, 31 August 2016 (UTC)

Poll: Morse code format using other characters

Proposal: Naming Morse code using other characters for dot and dash. (Presumably, oppose and abstain votes can be given once there are any new proposals for dots and dashes.)

This is a poll with no policy value.

This is off topic, but since パ in the example above needs 2 codes, should we use a space, a slash, or space-slash-space to separate them? IMO I would use just a space. --Octahedron80 (talk) 00:36, 31 August 2016 (UTC)
  1. Symbol support vote.svg Support, but again, keep as alternative forms. Perhaps we may want to have "MorseBot", who can create all the alt forms automatically whenever a morse entry is created. Philmonte101 (talk) 22:27, 30 August 2016 (UTC)
  2. My suggestion would be to use the same typography used in the official standard for International Morse Code, the governing body for which is the United Nations International Telecommunication Union. The official standard from the United Nations for international use is given here: http://www.itu.int/dms_pubrec/itu-r/rec/m/R-REC-M.1677-1-200910-I!!PDF-E.pdf . They use the format of periods and minuses (not hyphens) in bold font, separated by spaces. Nicole Sharp (talk) 06:22, 31 August 2016 (UTC)
    • Actually, that does not seem to be the case. The PDF file you linked has a certain degree of inconsistency: it contains 114 occurrences of the − ("MINUS SIGN") and 22 occurrences of the – ("EN DASH"). For example, in the list "Punctuation marks and miscellaneous signs", almost all the Morse codes are using the minus sign, except the commercial at, which is using the en dash. --Daniel Carrero (talk) 07:06, 31 August 2016 (UTC)
      • Thank you for the analysis; en dashes and minuses display identically on my computer. I corrected above. Either way, they are not hyphens. Nicole Sharp (talk) 07:11, 31 August 2016 (UTC)
        • Minuses might not make sense. Let's use en dashes instead. --Octahedron80 (talk) 07:15, 31 August 2016 (UTC)
          • I do agree with that. Minuses should be reserved for mathematical use, per Unicode guidelines. The en dash would be the correct character for this use. Someone should email the ITU with their mistake. Nicole Sharp (talk) 07:19, 31 August 2016 (UTC)
            • I agree with using en dashes. Minuses seem to unavoidably imply the mathematical subtraction sense. (and "varieties" of the subtraction sense, such as the "negative number" sense and the "cathode" sense, as seen in: ) The en dashes seem to be typographically better because they are "generic" dashes, without an inherent meaning. --Daniel Carrero (talk) 07:23, 31 August 2016 (UTC)
  3. Symbol support vote.svg Support using en dashes and periods/full stops, with redirects for hyphens and periods and minus signs and periods. Andrew Sheedy (talk) 17:47, 31 August 2016 (UTC)
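The character count discussed above (minus signs versus en dashes in the ITU document) can be reproduced on any extracted text with a short script. This is only a minimal sketch in Python: the function name `count_marks` and the sets of candidate characters are assumptions made for illustration, and it presumes the PDF text has already been extracted to a string by some other tool.

```python
import unicodedata
from collections import Counter

# Candidate characters that can look like a Morse dash or dot.
# These sets are an assumption for illustration, not an exhaustive list.
DASH_LIKE = {"\u002d", "\u2010", "\u2013", "\u2014", "\u2212"}
DOT_LIKE = {"\u002e", "\u00b7", "\u2022"}
CANDIDATES = DASH_LIKE | DOT_LIKE


def count_marks(text):
    """Count dot-like and dash-like characters, keyed by Unicode name."""
    counts = Counter(ch for ch in text if ch in CANDIDATES)
    return {unicodedata.name(ch): n for ch, n in counts.items()}


# Hypothetical sample mixing a minus sign and an en dash.
sample = ". \u2212 . \u2013"
print(count_marks(sample))
```

Reporting the Unicode names rather than the glyphs avoids the problem mentioned in the thread, where the minus sign and the en dash display identically on some screens. Note that U+002D is named HYPHEN-MINUS in Unicode.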

Comment - Morse code

As Wikitiki89 says above, the standards document shouldn't be interpreted as setting a standard for representation in writing. My guess is that they used whatever was convenient, and whatever word processing app they wrote it in converted the dashes and dots into whatever Unicode characters its algorithms chose (remember Smart Quotes™?).

Whatever we choose should only be used for the entry name itself. For everything else, we should have a module that converts regular characters into their Morse code representation, and use it for everything. That would make the wikitext a lot easier to work with. The only reason to make an exception would be if there were more than one possible way of representing the same character in Morse code (not likely for the Unicode Basic Latin block, but who knows about all the other characters?). It also would mean that we could switch from hyphens to en-dashes to em-dashes to images to whatever we wanted by fiddling with a line or two of code and stop debating over standards that only affect appearance. Chuck Entz (talk) 02:49, 1 September 2016 (UTC)
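The converter idea in the comment above could look roughly like the following. This is a minimal sketch in Python rather than an actual Wiktionary module (which would be written in Lua); the names `MORSE` and `to_morse` and the parameters `dot`, `dash` and `letter_gap` are hypothetical, and only the letters A to Z and digits 0 to 9 are covered.

```python
# International Morse code for letters and digits, stored with ASCII
# "." and "-" as neutral placeholders for dot and dash.
MORSE = {
    "A": ".-", "B": "-...", "C": "-.-.", "D": "-..", "E": ".",
    "F": "..-.", "G": "--.", "H": "....", "I": "..", "J": ".---",
    "K": "-.-", "L": ".-..", "M": "--", "N": "-.", "O": "---",
    "P": ".--.", "Q": "--.-", "R": ".-.", "S": "...", "T": "-",
    "U": "..-", "V": "...-", "W": ".--", "X": "-..-", "Y": "-.--",
    "Z": "--..",
    "0": "-----", "1": ".----", "2": "..---", "3": "...--",
    "4": "....-", "5": ".....", "6": "-....", "7": "--...",
    "8": "---..", "9": "----.",
}


def to_morse(text, dot=".", dash="-", letter_gap=" "):
    """Convert letters and digits to Morse, one code per character.

    `dot` and `dash` choose the display characters, so switching from
    hyphens to en dashes is a change of arguments, not of stored data.
    Characters missing from MORSE raise KeyError instead of being dropped.
    """
    # translate() swaps both placeholders in one pass, so the chosen
    # dot and dash strings can never be confused with each other.
    table = {ord("."): dot, ord("-"): dash}
    return letter_gap.join(MORSE[ch].translate(table) for ch in text.upper())


print(to_morse("cat"))                               # ASCII placeholders
print(to_morse("P", dot="\u00b7", dash="\u2013"))    # middle dot + en dash
```

Because the table stores only placeholders, each display format debated in the polls (full stop, middle dot, hyphen, en dash) corresponds to a different pair of arguments.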

  • I created the Morse code entries for letters and numbers. See .-. It links to the other entries. Feel free to give suggestions, edit the entries, etc. --Daniel Carrero (talk) 08:17, 1 September 2016 (UTC)
    I much prefer this version. DCDuring TALK 14:31, 1 September 2016 (UTC)


FYI: I read all the votes between 2005–2016 and attempted to clean up / rewrite Wiktionary:Votes/Timeline completely. I added the missing votes.

--Daniel Carrero (talk) 06:22, 31 August 2016 (UTC)

Thank you! - -sche (discuss) 18:48, 1 September 2016 (UTC)
Very nice! — Eru·tuon 22:00, 1 September 2016 (UTC)
 :) --Daniel Carrero (talk) 07:56, 3 September 2016 (UTC)


Lately I've been referencing Proto-Turkic in Mongolian etymology sections, and I think we need to decide on a standardisation now before the usages grow too numerous and inconsistent. Here are the details we need to decide on:

  1. z or ŕ?
  2. ĺ or š or ş?
  3. y or ı?
  4. j or c or dž?
  5. c or č?
  6. y or j?
  7. ä and e or e and é or some other combination?
  8. Should length be indicated with a macron or with a colon?
  9. Should we write d- or ń- in cases of Common Turkic y- words that have been loaned with n- and d- into Mongolian and Hungarian?
    Or only an etymology section "From earlier..."?
  10. Should we write non initial syllable o/ö in cases where Brahmi script implies it and Mongolian loans have a/e?
  11. -d₂- or -ð- or -d- for Chuvash -r-, Common Turkic -y-, Khalaj -d-?
  12. Should we write d in places where Oghuz has it and other languages have t?
  13. Should we mark back and front vocalic k and g differently?
    ǵ and g, or g and ğ; ḱ and k, or k and q

@Anylai, @Madina, @Vahagn Petrosyan, not sure who else uses Proto-Turkic here, please ping them. —This unsigned comment was added by Crom daba (talk • contribs).

Let me repeat the ping, since it wouldn't have worked without a signature: @Anylai, @Madina, @Vahagn Petrosyan Chuck Entz (talk) 03:46, 1 September 2016 (UTC)
Rhotacism and zetacism along with lambdacism are still debated topics in Turkic. Clauson looks at the oldest records of Turkic and thinks it is rhotacism and lambdacism that happened in other languages, but the amount of external evidence in Hungarian, Mongolic languages and some Siberian languages suggests otherwise. Therefore, ŕ and ĺ are usually reconstructed for Proto-Turkic; for Common Turkic you can use z or š. Turkic loanwords in Mongolic are divided into 3 periods; for the first period of loanwords you will meet l and r in Mongolic, therefore use ĺ and ŕ. In fact, ĺ → l and ŕ → r are actually observed in modern Turkic languages too. I usually reference Starling, and for your questions they have:
č, j instead of y, ɨ instead of y or ı. Not too many reconstruct dž, at least not word-initially. ń is not reconstructed for Proto-Turkic in word-initial position; d is also debated because only Oghuz languages have it and it is not very consistent. Generally Altaicists reconstruct a secondary d instead of t where a word ends with ĺ, n, ŕ, which should again be inherited from Proto-Altaic t, because Proto-Altaic word-initial d is not equal to Proto-Turkic d for them. But we are talking about another debated topic, which is Altaic; some are assumptions that rely on the Altaic hypothesis, so most of the time you will in fact hear they are not loanwords.
Use macron for long vowels. As far as I know, at Proto Turkic level it is assumed that k and g were not yet split into q and gh. Starling prefers /ki/ instead of ḱ where Chuvash suggests it so, for example see *Kiār (snow). By the way it is also accepted that Proto Turkic had word final b instead of w. --Anylai (talk) 18:25, 1 September 2016 (UTC)
I don't think we should rely too much on Dybo-Starostin and Starling in general, Altaic hypothesis is controversial and reconstructions given there may be too dependent on a given Altaic etymology. As far as I'm concerned I'd prefer a convention near to Clauson (I wouldn't object to easily made changes such as ŕ instead of z, and changes that look prettier or make it easier for me to input the word such as ä/e instead of e/é and macrons instead of colons) Crom daba (talk) 00:56, 2 September 2016 (UTC)
Their Proto-Turkic reconstructions are not bad and do not rely on Altaic; only the semantics could be distorted. In fact they are better than Clauson, who only has a dictionary of pre-13th-century Turkic. Clauson makes a lot of assumptions too: if you read his books and the entries in his dictionary, he sometimes mentions "a 1st period loanword in Mongolian" etc. Here you meet l and r corresponding to Common Turkic š and z. He makes up an imaginary language, the language of the Tabgach (Touba), from whom the Mongols apparently borrowed these words before the earliest Turkic records (8th century), which already have z and š. For him, somehow the Tabgach also spoke a language close to Chuvash, undergoing the same assimilations in the same areas, and the original consonants were z and š. This is not convincing, much like most entries in Starling's Proto-Altaic database. You can use that source; there are references and comments below the etymology which you can also make use of: those who reject it, those who think the Mongolic one is a borrowing, etc. We cannot deny there are a lot of cognates between Turkic and Mongolic. Yes, there are very obvious loans, but also very dangerous words that even the most primitive community wouldn't need to borrow from a different language. Yes, according to Clauson the relationship between Mongolic and Turkic is just an expected list of loanwords in a primitive community, which in this case is the Mongolic one. --Anylai (talk) 06:48, 4 September 2016 (UTC)
Whether we select zetacism or rhotacism as primary is non-essential, there is no risk of misunderstanding either way for the person reading the entry. Personally I'm undecided on the question.
Bigger problems are their *ạ and *ia which aren't accepted in mainstream Turkic studies, ignoring Khalaj h- as secondary when it doesn't line up with Altaic comparanda, and the annoying convention of writing capitals to indicate an uncertain reconstruction; and I hate to see all that adopted here wholesale.
I disagree with Clauson's outright denial of Altaic, but I feel that a bigger fallacy is being committed by Starostin et al when denying prehistoric (pre-13th century even) word loaning among Mongolic and Turkic in the introduction to their Altaic dictionary. Virtually all language contact situations I know of have been highly asymmetrical, and I think there's no lexical sphere that is completely immune to borrowing.
Starling is indeed a handy resource, but you should probably double check with the references if they're available, I've caught a few mistakes in the Mongolian database and I've heard the same about the Uralic one (as can be expected of a project with such a big scope) Crom daba (talk) 02:21, 5 September 2016 (UTC)
Not a Turkologist, but I'd like to note that ɨ does not combine well with macrons (you get ɨ̄ instead of the expected dotless version); an i/ı distinction would have similar problems. (Also, just in case, reminder to please put whatever you decide on at Wiktionary:About Proto-Turkic.) --Tropylium (talk) 07:38, 7 September 2016 (UTC)
I hope you don't let font issues get in the way of making a correct decision. --WikiTiki89 15:59, 7 September 2016 (UTC)
ɨ̄ shows as two characters for me in the editing box, but on the screen once posted, it shows correctly, with the macron replacing the dot.--Prosfilaes (talk) 22:12, 8 September 2016 (UTC)

September 2016

{{also}} template

Hello -- I noticed that of the c. 495 thousand entries which differ from other entries only in diacritic marks or capitalisation, only c. 172 thousand have {{also}} templates. Would it be worthwhile for me to add these to the remainder? Also, some dozens of thousands which do have these templates are missing a subset of the items in their respective congruence classes. Would it also be worthwhile to complete the arguments for these templates? An example is gort and ğort. Apologies for coming here with such a fiddly question. Isomorphyc (talk) 01:56, 1 September 2016 (UTC)

Yes, if you're confident you can de-diacritize and classify them correctly. DTLHS (talk) 01:59, 1 September 2016 (UTC)
If it can be done conveniently (and correctly), yes.
Some users, especially but not only in English-speaking countries, are not facile with diacritics, e.g., me. More importantly, I don't think anonymous users have access to the means we offer to overcome keyboard limitations.
IMO the most important part of the task is to make sure that on all entries that use only the no-diacritic Roman character set {{also}} includes all the entries that use diacritics that correspond to the plain entries. English, Latin, and "Translingual" are the only languages that matter to me. A smaller subset would be only lemma entries.
I'm sure there are other points of view. DCDuring TALK 02:10, 1 September 2016 (UTC)
Yes, please! I have asked before for someone to do this. Note that there may be a limit to how many arguments {{also}} can handle, and there is in any case a limit to how many we would want to display (let's discuss what that would be: more than 15 links?). For terms that would otherwise display more than that number of alsos, it is preferable to set up and link to an appendix, the way a links to Appendix:Variations of "a" rather than listing all 100+ variants directly in a. - -sche (discuss) 02:20, 1 September 2016 (UTC)
@-sche: For what it is worth, there are 77 congruence classes having more than 15 members and 129 classes having more than ten members. The largest groups are bo (19), y (19), s (20), sa (20), n (21), i (24), u (38), e (41), o (57) and a (61). My lists currently do not include differences in punctuation; the classes will be slightly larger and more numerous when this is included. The idea of creating an appendix for classes larger than ten or fifteen sounds reasonable to me, but if I create such appendices, they will provide less information than those which already exist. I would be uncomfortable also including, for example, the same sequence of letters in other scripts or Hanzi represented by the same letters in some transliteration scheme, as is currently the practice. I do believe this can be done without errors on a very large and somewhat easy to define subset of the relevant entries, mostly deferring work on scripts with which I may be uncomfortable. Isomorphyc (talk) 02:47, 1 September 2016 (UTC)
Since we're talking about the difference between no appendix and one that's not as complete as it possibly could be, I don't see the problem: this is a wiki, and others can expand on those later. Chuck Entz (talk) 03:32, 1 September 2016 (UTC)
Yes - I think that a bot could do this even better. SemperBlotto (talk) 05:39, 1 September 2016 (UTC)
It's well suited to a bot, but a bot would not be able to create the appendix pages when there are larger numbers. To do this, the appendix pages would need to follow a standard format and not have any additional information added. —CodeCat 12:25, 1 September 2016 (UTC)
But a bot could generate a list of appendix pages that need to be created and pages that need to be added to them. --WikiTiki89 12:54, 1 September 2016 (UTC)
  • I'm all in favor of this. If there's any risk of {{also}} taking more than about 8 or 9 arguments, then an "Appendix:Variations of..." should be created anyway. —Aɴɢʀ (talk) 11:26, 1 September 2016 (UTC)
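The grouping described in this thread (entries that differ only in diacritics or capitalisation) can be sketched as follows. This is a minimal illustration in Python, not the script anyone here actually ran: the function names are hypothetical, the folding rule (lowercase, decompose, drop combining marks) is an assumption about what counts as "the same" title, and the cut-off of 15 links is the figure floated above, not a settled limit.

```python
import unicodedata
from collections import defaultdict


def fold(title):
    """Lowercase a title and strip combining diacritics.

    Letters that do not decompose in Unicode (for example ø or ł) are
    left unchanged, so they would need an extra mapping in practice.
    """
    decomposed = unicodedata.normalize("NFD", title.lower())
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def congruence_classes(titles):
    """Group titles by folded form, keeping only classes with 2+ members."""
    classes = defaultdict(set)
    for title in titles:
        classes[fold(title)].add(title)
    return {k: sorted(v) for k, v in classes.items() if len(v) > 1}


def also_templates(titles, max_links=15):
    """Map each title to its {{also}} wikitext.

    The value is None when the class is too large, meaning an
    "Appendix:Variations of ..." page would be linked instead.
    """
    result = {}
    for members in congruence_classes(titles).values():
        for title in members:
            others = [m for m in members if m != title]
            if len(others) > max_links:
                result[title] = None
            else:
                result[title] = "{{also|" + "|".join(others) + "}}"
    return result


# Hypothetical sample list of entry titles.
print(also_templates(["gort", "ğort", "Gort", "cat"]))
```

Titles whose class is too large come back as `None`, which matches the suggestion above that a bot could at least produce the list of appendix pages that need to be created by hand.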

Second LexiSession: paths, roads and ways

Dear all,

Apologies for writing in non-native English; please fix any mistakes you may encounter in these lines!

The Tremendous Wiktionary User Group, a nice and open gathering of Wiktionarians, is happy to introduce the second chapter of our collective experiment: LexiSession.

So, what is a LexiSession? The idea is to coordinate contributors from different languages to focus on a shared topic, to enhance all projects at the same time! It may remind you of the Commons monthly contests, but here everyone is a winner! First LexiSession was about cat and it was a beginning. For this second LexiSession, we offer a month - until the end of September - to pave the way! There is plenty of names for different kind of roads, streets, avenues, and ways, and wiktionaries can be very helpful to help people to pick the correct one to describe or to translate.

English Wiktionary already have a Wikisaurus:road and a Wikisaurus:way but there is still a lot of information to provide. Well, why is it almost in alphabetical order? How to distinguish between roadway and motorway (for instance)? Is it possible to help readers with pictures or something? These are not instructions, and everyone is welcome to imagine new solutions to provide information about semantic networks and variation. Also, you may be interested to know that French Wiktionary already has eight different thesaurus about streets in eight different languages, including English.

Please share your contributions here! You can also have a look at what other Wiktionarians are doing, on the LexiSession Meta page. We will discuss the processes and results in Meta, so feel free to have a look and suggest topics for the following LexiSession.

Thank you for your attention, and I hope you will be interested in this new way of contributing. I'll get back to you later this month for an update! Noé (talk) 10:53, 1 September 2016 (UTC)

Great topic (wasn't too keen on the cats). For this session I'm especially interested in local names which are only used in a specific city or region. Also interesting would be to describe (also visually?) hierarchies of paths. – Jberkel (talk) 12:23, 1 September 2016 (UTC)
@Noé I'll try to contribute more to this one, provided school doesn't get in the way! Let's hope participation is better than last time. :) (And as a tiny note about your English, don't forget that the third person singular of have is has....) Andrew Sheedy (talk) 01:38, 2 September 2016 (UTC)
@Noé another correction, also there are plenty of names.

borrowing → bor

Wiktionary:Votes/2016-07/borrowing, borrowed, loan, loanword → bor passed. Results: 14-5-3 (73.68%-26.32%) (not counted: +1 late oppose vote). Can someone please do the honors and edit the template in all entries?

FYI: See Thread:User talk:CodeCat/borrowing → bor. In the discussion, I asked CodeCat first, but she said: "I don't think it's right to do it given the strong opposition." Do we need to discuss this further before doing the change? I was hoping we could go on with it and have {{bor}} used in a way more consistent with {{inh}} and {{der}}.

As I mentioned in the conversation with CodeCat, I believe I found some important numbers concerning how the templates are used. Correct me if I make any mistake in the numbers or their interpretation. {{inh}} and {{inherited}} were created together, and it appears that almost all entries that display an etymological inheritance use the shorter form. {{borrowing}} was all we had available for 5 years -- that is, the shorter {{bor}} did not exist. Then {{bor}} was created 1 year ago and about 2/3 of entries of borrowed terms already use {{bor}} rather than {{borrowing}}. This is one reason why I see a trend towards shorter names, confirmed in the vote.

In the discussion, CodeCat suggested leaving shortcuts as shortcuts and long forms as long forms. Feel free to discuss this idea. I disagree with it: people who used the longer syntax {{borrowing|it|pizza|lang=en}} in entries from 2010 to 2015 did it because it was the only format available; once the shorter {{bor|en|it|pizza}} came to exist, people started to use it. --Daniel Carrero (talk) 16:01, 1 September 2016 (UTC)
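
For whoever ends up running the conversion, here is a minimal sketch of the rewrite, based only on the one example above ({{borrowing|it|pizza|lang=en}} becoming {{bor|en|it|pizza}}). The function name is made up, and passing any other named parameters through unchanged is an assumption that should be checked against the templates. Calls containing nested templates or links, and calls without lang=, are deliberately left untouched for manual review.

```python
import re

# Matches {{borrowing|...}} with no nested templates or links inside.
BORROWING = re.compile(r"\{\{borrowing\|([^{}\[\]]*)\}\}")


def to_bor(wikitext: str) -> str:
    """Rewrite {{borrowing|SRC|TERM|...|lang=XX}} as {{bor|XX|SRC|TERM|...}}."""

    def convert(match):
        params = match.group(1).split("|")
        lang = None
        rest = []
        for param in params:
            name, sep, value = param.partition("=")
            if sep and name.strip() == "lang":
                lang = value.strip()  # becomes the first positional argument
            else:
                rest.append(param)  # kept in the original order
        if lang is None:
            return match.group(0)  # no lang=: leave for manual review
        return "{{bor|" + "|".join([lang] + rest) + "}}"

    return BORROWING.sub(convert, wikitext)


print(to_bor("From {{borrowing|it|pizza|lang=en}}."))
# prints: From {{bor|en|it|pizza}}.
```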

Distinction between topical and context-based usage categories?

The general purpose of context labels is, as far as I can discern from what others have said, to specify the context in which a specific sense applies. Presumably, it is not understood in that sense in other contexts. However, there are a few systemic problems:

  • Context labels add categories that do not indicate this restricted context. Category:Physics, which is added when you put {{lb|xx|physics}} on a sense, has nothing to do with restricted usage. Instead, it's just a general category where all terms related to the topic of physics can go. As a consequence, some editors are led to think that context labels are just a fancy means of putting entries in topical categories.
  • Worse still, some context labels put entries into "set"-type categories, but display a topical context label. {{lb|xx|particle}} puts entries in Category:Subatomic particles while showing "physics". This is confusing when used on very widespread terms like electron, which are used far outside the "physics" context.

We already have "slang" categories, like those in Category:English slang, but we have none for jargon or restricted-context senses that are not slang. However, I think these are sorely needed. It is very valuable to distinguish senses used only in physics, from those related to physics. What can be done to remedy this situation? —CodeCat 20:25, 1 September 2016 (UTC)

I would favor using longer and more explanatory names for topical categories. I'll give a few examples. Feel free to suggest any changes.
"names of" (proper noun examples)
"names of" (place names -- subdivision(s) if they exist, country)
"names of" (common nouns) (are those acceptable?)
"relating to" (or "related to"?)
--Daniel Carrero (talk) 21:16, 1 September 2016 (UTC)
I've been "guilty" of using the context labels to categorize items, and don't agree with the current strict usage policy. The example given in WT:ELE is
{{lb|en|informal}} An [[informant]] or [[snitch]].
It says "Such labels indicate, for example, that the following definition occurs in a limited geographic region or temporal period, or is used only by specialists in a particular field and not by the general population". Informal language however is used by large parts of the general population.
Using category links to categorize is just very awkward, they're invisible and tend to be scattered around the wiki code, at the bottom of the page or somewhere else, and have maintenance problems (forgetting to remove the link when the definition is removed/changed). Conversely, labels are close to the definition, and if the label is removed then the category is removed as well. – Jberkel (talk) 11:12, 8 September 2016 (UTC)

For French Verbs: Displaying participles in the header

I'm copying what I wrote on the discussion page for {{fr-verb}}, as I forgot that Mglovesfun was no longer active:

Would it be easy enough to have {{fr-verb}} display in a way similar to {{pt-verb}} and {{es-verb}}? This would increase consistency between French and other languages on Wiktionary (including English, Spanish, Latin, and Portuguese), which would be a big plus. I would suggest including the present and past participles. I would do this myself, but I'm not very technologically inclined.... I would love to see it implemented, though! Andrew Sheedy (talk) 01:34, 2 September 2016 (UTC)

Mglovesfun is active. His username is Renard migrant.
I'm mildly in favor. It should be Luacized as we already have a module that generates most verb forms. And I can't do that, I'm afraid. Renard Migrant (talk) 17:26, 5 September 2016 (UTC)
I generally oppose copying inflection information from inflection tables. I prefer the format used by Dutch verbs (lopen) where principal parts are shown when the table is collapsed. —CodeCat 17:28, 5 September 2016 (UTC)
I suppose that's a workable option, but I would much prefer that all the Romance languages be consistent with each other, given their similar grammar, etc. Andrew Sheedy (talk) 17:59, 5 September 2016 (UTC)
What does any of this even mean? UtherPendrogn (talk) 18:31, 13 September 2016 (UTC)
Since I suspect you know what a participle is, I'd imagine your question is what would this actually look like:

faire (present participle faisant, past participle fait)

ok? Renard Migrant (talk) 18:34, 13 September 2016 (UTC)
What I don't get is why. French present participles don't get used nearly as much as they do in English, Spanish, and Portuguese. --WikiTiki89 18:37, 13 September 2016 (UTC)

Proto-Celtic verb lemmas

@CodeCat, Victar, UtherPendrogn, Nayrb Rellimer, Florian Blaschke, and anyone else who cares: Right now we have only two Proto-Celtic verbs, *ber- (which uses the stem as the lemma) and *brusū (which uses the 1st person singular present as the lemma). Does anyone object to my settling on the 3rd person singular present as the lemma form for Proto-Celtic verbs? That's what we're already using for verb lemmas for Proto-Celtic's ancestor (Proto-Indo-European) as well as for its best attested early descendant (Old Irish). This would entail moving *ber- to *bereti and *brusū to *bruseti. Is that OK with everyone? —Aɴɢʀ (talk) 17:29, 2 September 2016 (UTC)

What is used as the lemma for modern Celtic languages? --WikiTiki89 17:34, 2 September 2016 (UTC)
The imperative for the modern Goidelic languages, the verbal noun for the modern Brythonic languages. —Aɴɢʀ (talk) 18:40, 2 September 2016 (UTC)
That seems a little strange, but then what do I know. In any case, I definitely support your proposition. --WikiTiki89 18:44, 2 September 2016 (UTC)
What about the old Brythonic languages? WT:Lemmas has nothing. —CodeCat 18:45, 2 September 2016 (UTC)
I know Welsh mostly descends from 3.sg. --Victar (talk) 18:57, 2 September 2016 (UTC)
I've been using the verbal noun for Middle Welsh, too, but I've been thinking it might be good to use the 1st person singular present (which is what the Geiriadur Prifysgol Cymru does for literary Welsh) and have the verbal noun be separate (as the verbal noun is separate for the Goidelic languages). —Aɴɢʀ (talk) 19:00, 2 September 2016 (UTC)
Support —CodeCat 17:37, 2 September 2016 (UTC)
Abstain. On one hand, PCelt's descendants are mostly 3.sg, but on the other hand, it's nice to have it in line with Latin, whose descendants are also not in 1.sg. *shrug* --Victar (talk) 18:44, 2 September 2016 (UTC)
Is there any common practice in reference works (aside from the infinitive, which some dictionaries use for everything)? Chuck Entz (talk) 19:04, 2 September 2016 (UTC)
Sounds good. I've been working on some Proto-Brythonic verbs myself. My userpage has a huge amount of WIP translations. UtherPendrogn (talk) 19:18, 2 September 2016 (UTC)
I've created a rudimentary inflection table for thematic verbs, {{cel-conj-them}}. It's still lacking many forms, as I'm not super well versed on Celtic verbs. I'd like to know especially which principal parts there are and which PIE verb stems they come from. From w:Proto-Celtic language I gather that the present, future, preterite active and preterite passive stems are principal, but their PIE origin eludes me.
The template is implemented with a module, Module:cel-verbs, and new classes can be added there fairly easily. The main issue I'm faced with is the layout of the table. The table on w:Proto-Celtic language has a lot of wasted space, I'd prefer something more compact, but I'm not sure what would work best. —CodeCat 20:00, 2 September 2016 (UTC)
Support. I don't think there's an established practice (Schumacher, for one, uses only stems), but considering Old Irish uses the 3sg too, it makes sense. I'm generally a fan of using the 3sg because it is usually the most frequent and best attested form, and in certain verbs (such as meteorological or impersonal verbs), other forms will be rare at best (though not necessarily nonexistent: for example, in the Old Lithuanian corpus a verb form like "I snow" may be attested in the context of a tale with anthropomorphised clouds). --Florian Blaschke (talk) 01:08, 3 September 2016 (UTC)
I'm a little late, but I support moving them to the 3sg. For what it's worth, Matasovic also only gives stems. —JohnC5 14:50, 7 September 2016 (UTC)
  • OK, I've gone ahead and moved all the verb pages (there were only four) to the third-person singular present indicative form. —Aɴɢʀ (talk) 14:46, 5 September 2016 (UTC)

I've been working on a new verb conjugation table. Please let me know what you all think. User:Victar/Template:cel-conj-table --Victar (talk) 02:40, 7 September 2016 (UTC)

I don't think it's an improvement over the existing one. —CodeCat 12:11, 7 September 2016 (UTC)
That certainly seems to be a tainted matter of personal opinion. --Victar (talk) 15:31, 7 September 2016 (UTC)
I definitely would not use MacBain's dictionary for anything. It's hopelessly out of date now, and wasn't all that up to date even when it was published. —Aɴɢʀ (talk) 13:54, 7 September 2016 (UTC)
Did he get everything right? Obviously not, but you cite the classic along with the modern. It's still a work in progress. --Victar (talk) 15:31, 7 September 2016 (UTC)

Please vote in "Poll: Description section"

Please vote in Wiktionary:Beer parlour/2016/August#Poll: Description section.

Current winners:

  • "Description" = 3 actual support votes
  • "Shape" = 2 actual support votes (my vote is calling it second best) + 1 vote in favour of this section "if we do have it" in the Oppose section.

If enough people prefer "Shape" instead of "Description", I can change the whole vote Wiktionary:Votes/2016-08/Description before it starts: it would become a vote for having a "Shape" section.

If more people prefer "Description" instead of "Shape", it would confirm that the vote can start as-is.

The current results are basically a tie, with my "second best" comment weighing a bit in the direction of supporting "Description". If nobody else participates in the poll, I think I'll start the vote as-is. --Daniel Carrero (talk) 14:43, 5 September 2016 (UTC)


The following needs to be posted on WT:NFE:

* [[Module:IPA]] and {{temp|IPA}} now support an additional <code>qual''N''=</code> parameter, to place a qualifying note before a pronunciation.

CodeCat 20:28, 5 September 2016 (UTC)

Done. --Daniel Carrero (talk) 20:31, 5 September 2016 (UTC)


Can I have my sysopship back please? It's getting very frustrating not being able to properly patrol or edit protected pages. I also ask for Module:links, Module:th and Module:th-translit to be restored to the version that puts the transliteration code in Module:th-translit (where it ought to be) rather than Module:links, and ask that this be enforced by all editors. There are currently negotiations for a vote for Wyang's proposal, so it would be inappropriate for him to restore his version and continue the edit war before a vote on the matter has been held. —CodeCat 20:42, 5 September 2016 (UTC)

For the record, negotiations are happening at Wiktionary:Votes/2016-08/Enabling different kinds of romanization in different locations and the vote talk page.
I support giving back the tools to CodeCat, and to Wyang too. I support restoring modules and templates to the previous version. Whatever the merits of having two separate romanizations (I might even vote support!), I believe the status quo should prevail and that the new proposal should be properly discussed before implementation, especially in case of a huge disagreement like the one that we have now. --Daniel Carrero (talk) 20:48, 5 September 2016 (UTC)
Agreed. And this also may be a good reason to implement Template Editor privileges here. —Justin (koavf)TCM 21:55, 5 September 2016 (UTC)
Support. Why was CodeCat ever desysopped? --Florian Blaschke (talk) 22:20, 5 September 2016 (UTC)
There are two things that have to happen before I restore sysop rights:
  1. There has to be support from the community for it. This has been trickling in, and probably won't be a barrier.
  2. I have to be convinced that both parties will refrain from any actions that might start the edit war again.
The negotiations at Wiktionary talk:Votes/2016-08/Enabling different kinds of romanization in different locations are a start, but they mostly consist of some variant of "what about this?", followed by some variant of "you're not getting my point". We need to get beyond talking past each other and start talking about serious proposals. We also need to avoid dwelling on past behavior and start discussing what the future is going to look like. Chuck Entz (talk) 22:30, 5 September 2016 (UTC)
FWIW I am OK with restoring sysop privileges, provided both Wyang and CodeCat agree not to resume edit warring. I also think that Module:links should be restored to the status quo ante, with an appropriate vote to resolve the matter. In fact I asked Dan to create this vote in order to try to resolve what I thought was the root of the conflict between CodeCat and Wyang. As it happens, Wyang has objected to the vote for various reasons, some of which concern whether the issue of the vote is the right one to be voting on and some of which object to having a vote at all. The amount of contention here indicates we clearly need a vote but I'm open to rewording it. However, this issue is orthogonal to the issue of sysop privileges. Benwing2 (talk) 22:32, 5 September 2016 (UTC)
My only concern is the restoration of existing practice to the Thai transliteration module, and the elimination of custom code from Module:links. If that is accepted then there won't be any edit warring from me, though I do ask what course of action I should take if Wyang restores his version of the modules without a vote to support it. The reason the edit war happened in the first place was because Wyang kept reverting me and no steps were taken to stop him, and he ignored all attempts I made to convince him to stop and wait for consensus/vote. So if Wyang is sysopped again, there needs to be a contingency plan in case he does the same again; some kind of guarantee that others will also step in instead of just me. —CodeCat 22:42, 5 September 2016 (UTC)
Translation: You want us to take your side on the edit war and enforce it for you. I happen to prefer your version, but this kind of talk isn't very helpful. Chuck Entz (talk) 23:27, 5 September 2016 (UTC)
Pretty much, yes. The alternative would be endorsing Wyang's edits without a vote to show such endorsement by the wider community. That doesn't seem like a proper option given how contentious the issue is. Major changes that are contentious should be voted on, yes? —CodeCat 23:55, 5 September 2016 (UTC)
(edit conflict) One part of the problem is figuring out exactly what the status quo ante would be: this started when Wyang added his code to Module:links to implement a very useful change for Thai transliterations/romanizations. CodeCat later extensively reworked the module, in the process removing the code (I'm not sure whether she noticed the code or recognized what it was at the time). This broke a number of Thai entries and several Thai editors asked what was going on, so Wyang added the code back. It's possible that CodeCat, if she was unaware of the earlier code, thought this was something entirely new; she certainly acted as if it were. She reverted his edit, and didn't handle the dispute very well. Wyang got upset and the edit war started. Wikitiki89 came up with a compromise that moved the code out of Module:links, which CodeCat adopted, but Wyang didn't.
Do we revert it to:
  1. The state before Wyang's first edit? That would wipe out CodeCat's reworking of the module.
  2. The state before Wyang's second edit? (Dan Polanski's choice, if I understand correctly). That would break a number of Thai entries.
  3. The state after Wyang's second edit? (Wyang's choice)
  4. The state after Wikitiki89's edit? (CodeCat's choice)
The last two are the only ones that don't break anything, and either could be considered the status quo ante, depending on how you interpret Wyang's first edit. Chuck Entz (talk) 23:15, 5 September 2016 (UTC)
I don't see any point in restoring anyone's admin rights until the substance of the disagreement is resolved. As I see it, the destructive turn the conflict took is a serious matter, affecting important core software. If the talent involved in the matter cannot resolve it, perhaps someone else should. DCDuring TALK 23:44, 5 September 2016 (UTC)
There's already a vote that attempts to propose Wyang's changes so that a formal consensus can be made. But Wyang doesn't seem very cooperative in formulating the proposal, so it's mostly stuck. Since Wyang thus has no consensus for his proposed reinterpretation of transliteration modules, the status quo remains, which is that transliteration modules provide any kind of romanisation deemed desirable. This is what my and Wikitiki's edits attempted to do. If Wyang does not agree to a vote but forces his own interpretation through edit warring, what can be done? —CodeCat 23:59, 5 September 2016 (UTC)
@Chuck Entz: Hmm, when I wrote my comment I didn't check out the whole history carefully. Since the argument is about the presence or absence of a particular piece of Thai-specific code in Module:links, and if I'm not mistaken this didn't exist before the whole edit war started, then logically the status quo ante shouldn't include it. However, I don't completely understand the ramifications of this. Wyang obviously put the code there for a reason; but CodeCat and Wikitiki seem to believe that the same functionality can be achieved with this code in Module:th-translit. If this is true, then it should be taken out pending a vote to decide the underlying issues. Benwing2 (talk) 00:19, 6 September 2016 (UTC)
The reason the code was placed there by Wyang is because he believes that transliteration modules should only transliterate strictly: character by character. He therefore objects to the modification Wikitiki made, but at the same time, his reinterpretation of transliteration modules is not the agreed status quo. I argue that under the consensus interpretation, a vote is necessary for Wyang's proposal to restrict transliteration modules to just strict transliteration, and have an alternative module system/infrastructure for non-transliterative romanizations. I also believe that under this interpretation, the Thai transliteration code should be placed in Module:th-translit until a vote shows consensus to the contrary. And additionally, even if a vote passes to have separate infrastructures in our modules for transliteration and other types of romanization, the specific code for Thai does not belong in Module:links, but should be handled by said proposed infrastructure in a more general manner. —CodeCat 00:34, 6 September 2016 (UTC)
There was no consensus. What is being repetitively cited as "consensus" is how people perceive romanisations from the angle of languages not making such a distinction. Truth is, appropriate and purpose-oriented romanisation has been the norm in languages with a script-pronunciation discordance, and it has been the consensus for these languages. See for example the differential use of transcriptions and transliterations ({{ko-etym-native}}) in 미끄럽다 (mikkeureopda), by User:Visviva who created the bulk of our Korean entries. The core issue is “why do the harms outweigh the benefits if we keep the transliteration and transcription modules separate for these languages”, and the conclusion from the previous discussion is: "the envisageable harm is minimal and benefits are extensive". There is a demonstrable need to keep the systems separate - our language editors routinely apply different romanisations when editing these languages, and printed dictionaries of these languages show that authors regard the different modes of romanisation as suited to different purposes. The issue is not whether we should use romanisation X in translations right now; the issue is whether the system should be maintained to take this need into consideration and not deliberately confuse the concepts "transliteration" and "transcription" (where they truly make a difference), so that future edits in these languages are not discouraged. Wyang (talk) 03:20, 6 September 2016 (UTC)
What happens now? —CodeCat 19:57, 9 September 2016 (UTC)
This is up to Chuck. I'm not sure where things stand currently. Benwing2 (talk) 16:17, 11 September 2016 (UTC)

Proposal: Redirect all halfwidth and fullwidth forms to their "normal" counterparts

When there are fewer active votes in the list, I'm thinking of creating a new vote for this proposal:

Redirect all halfwidth and fullwidth forms to their "normal" counterparts.

I feel this should be pretty uncontroversial, but let me know if someone has a reason to keep the halfwidth and fullwidth forms.

Previous discussions:

--Daniel Carrero (talk) 00:59, 8 September 2016 (UTC)

I have a minor objection: Why are single-character half-/full-width forms more important than words spelled with them? We obviously shouldn't duplicate all our entries in half-/full-width forms, so if we can get away without those, why can't we get away without the single-character ones? --WikiTiki89 14:01, 8 September 2016 (UTC)
Actually, ＣＤ had been a redirect since 2013; I have deleted it now. I agree with you about fullwidth words. I believe we don't want entries like ＣＤ, ＬＣＤ or ｂｙｅ ｂｙｅ, or even redirects like ＣＤ → CD, ＬＣＤ → LCD, ｂｙｅ ｂｙｅ → bye bye. But I feel that readers are more likely to search for single fullwidth characters than for fullwidth words. If a person searches for "ＣＤ" and finds out that we don't have that entry, they might try searching for "Ｃ" afterwards.
According to the pageview tool (link) the fullwidth entry got 197 views in the last 6 months. Halfwidth got 12 views. It's not a terribly huge number, but I feel a redirect to the normal forms wouldn't hurt.
In general, for any redundant Unicode characters, I feel it's good to have redirects from the alt form to the "normal" form. Based on that sentiment, I created Wiktionary:Votes/2011-06/Redirecting combining characters and Wiktionary:Votes/2011-07/Redirecting single-character digraphs. Both passed, in 2011.
For better communication, I should probably create a vote with the whole idea that I have in mind. "Voting on: Allowing all single-characters full- and halfwidth forms as redirects. Forbidding full- and halfwidth words, they should not exist even as redirects." --Daniel Carrero (talk) 17:06, 8 September 2016 (UTC)
Actually, I think the problem with many of your proposals is that you create a vote too soon. We should have a long discussion first and only after the discussion has died down and some time has passed should you create a vote (if there had been enough support). --WikiTiki89 17:23, 8 September 2016 (UTC)
Good point. But you can't always have a long discussion: sometimes, nobody, or just a few people, respond to my topic on the BP. If nobody else decides to weigh in on this topic about fullwidth characters, I believe I should create the vote anyway (eventually).
Concerning minor proposals that don't affect a lot of entries (I consider "redirect fullwidth characters, disallow fullwidth words" one of these) and minor policy edits that don't change actual regulations, I think it's okay to start a vote earlier than most other votes. But if creating votes too soon is a problem, I guess I could create a vote after the discussion disappears from the main Beer parlour page. Other proposals were discussed a lot (sometimes in multiple places) before the vote started. If you want, we can talk about specific past votes that I created, to see if I could have done any of them differently.
Then again, there are some proposals that were discussed already but I didn't create a vote for them. I see nothing wrong with creating a vote immediately for some of these, and pointing to the previous discussions. I may even create a new BP discussion just to point out that a new vote was created, and to see if everyone agrees with the wording of the vote. This is not the same as creating a new vote without discussion. --Daniel Carrero (talk) 18:13, 8 September 2016 (UTC)
I'll give you two rules of thumb: If the discussion is still going, it's too early to create a vote (unless it's an urgent matter). If the discussion has not had much input, try to attract more attention to it, or perhaps it is not important enough to be voted on. --WikiTiki89 18:20, 8 September 2016 (UTC)
All right, I'll keep this in mind: "If the discussion is still going, it's too early to create a vote (unless it's an urgent matter)."
I partially agree with this: "If the discussion has not had much input, try to attract more attention to it, or perhaps it is not important enough to be voted on." In my opinion, the proposal "redirect fullwidth characters, disallow fullwidth words" is important enough to be voted on and appear on the WT:CFI as actual criteria for inclusion/exclusion of entries, but among the things that need to be voted on, this is not very important, because it affects few entries. --Daniel Carrero (talk) 19:04, 8 September 2016 (UTC)
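
To make the draft rule ("redirect fullwidth characters, disallow fullwidth words") concrete, here is a small sketch of how a bot or an abuse filter might classify page titles. It relies on the `<wide>` and `<narrow>` compatibility decompositions in the Unicode Character Database, as exposed by Python's `unicodedata` module; the function names and the three labels are invented for the example, and the wording of the eventual vote may differ.

```python
import unicodedata


def normal_form(char: str):
    """Return the ordinary counterpart of a full-/halfwidth character, else None."""
    parts = unicodedata.decomposition(char).split()
    if len(parts) == 2 and parts[0] in ("<wide>", "<narrow>"):
        return chr(int(parts[1], 16))
    return None


def classify(title: str):
    """Apply the draft rule: redirect single characters, disallow words."""
    if len(title) == 1:
        target = normal_form(title)
        return ("redirect", target) if target else ("ordinary", None)
    if any(normal_form(ch) for ch in title):
        return ("disallowed", None)  # a word containing full-/halfwidth forms
    return ("ordinary", None)
```

Under these assumptions a single fullwidth Ｃ is classified as a redirect to C, the two-character title ＣＤ is disallowed, and plain CD is left alone.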


FYI, September 19 is International Talk Like a Pirate Day. I would suggest doing the word-of-the-day as something pirate-related if possible. I think it would be great too if we can create an Appendix or Category of terms traditionally associated with pirate lore, such as "walk the plank." I know in my area (Maryland, USA) there are local businesses offering promotional discounts for customers who come in talking like pirates on September 19. I think a pirate vocabulary guide would be helpful not just for them, but for authors and storytellers as well. Nicole Sharp (talk) 05:24, 8 September 2016 (UTC)


@CodeCat, Victar, UtherPendrogn, Nayrb Rellimer, Florian Blaschke, Anglom, Angr, Chuck Entz, and anyone else who cares:

Several books on Brittonic and Neo-Brittonic suggest that the name Gwydion was "Uidgen" or "Widgen" at this point in time, not Gwidyen, as here https://en.wiktionary.org/wiki/Reconstruction:Proto-Brythonic/Gw%C9%A8d%C9%A3en . Indeed, the "gw" shift seems to have happened from NB to Old Welsh, where it became Guidgen, then in Middle Welsh Gwydyen/Gwydyon and modern Gwydion. UtherPendrogn (talk) 19:01, 9 September 2016 (UTC)

Attestations at *gwir show that the change happened in all languages and is thus of Proto-Brythonic date. —CodeCat 21:24, 9 September 2016 (UTC)
Good. As to the name https://en.wiktionary.org/wiki/Reconstruction:Proto-Brythonic/Kadwall%E1%BB%8Dn , have I reconstructed it correctly? Some of the descendants are messy, I'm sorting them out right now. UtherPendrogn (talk) 22:02, 9 September 2016 (UTC)
The Irish descendants don't match up. Where did they get their -m-? It looks much more likely that they descend straight from the Proto-Celtic form, which was *Kat(u)wellamnos or similar. Gaulish is not a Brythonic language, it got its form of the name straight from Proto-Celtic. —CodeCat 22:11, 9 September 2016 (UTC)
I can also find nothing whatsoever of the Gaulish or Irish names, Google gives zero results. @Angr can you check this? —CodeCat 22:15, 9 September 2016 (UTC)
I got some Google Hits for the Gaulish name and variants of it, e.g. this, but it seems to be a place name rather than a personal name. I can find no trace of an Irish form "Cathfollomon". Cadwallon reconstructs Proto-Brythonic *Katuwellaunos, which in our notation would be *Kaduwellọn, from Proto-Celtic *Katuwelnāmnos. The Brythonic Catuvellauni have the same name. —Aɴɢʀ (talk) 06:28, 10 September 2016 (UTC)
Using Au or the dotted O is a matter of notation, but surely apocope is not? And why did you mention the forms? They are the ones I put.

EDIT: Oh I see now, sorry, I put the Early Brythonic form rather than the Proto-Celtic. Will rectify that and add the PC form. UtherPendrogn (talk) 12:18, 10 September 2016 (UTC)

It does not matter what form the descendant takes, surely? And I reconstructed the Irish ones thanks to the Dictionnaire de la Langue Gauloise by Xavier Delamarre. UtherPendrogn (talk) 12:22, 10 September 2016 (UTC)
Sorry, why are there Goidelic descendants under *Kadwallọn? —JohnC5 17:11, 10 September 2016 (UTC)
That shouldn't be there. Probably a mistake from copy/pasting the Celtic form. Removed now. UtherPendrogn (talk) 19:30, 10 September 2016 (UTC)

Minor edit in WT:EL § interwiki links

If no one objects, I'll remove "and are listed in the left hand side of the entry" from WT:EL#Interwiki links. Some people complained about it in Wiktionary:Votes/pl-2016-02/Interwiki links, which passed in March 2016. I'd like to do this without a new vote.

Current text: "Interwiki links are used to point to the same word in foreign language Wiktionaries, and are listed in the left hand side of the entry. To point to the page palabra in the Spanish Wiktionary, use:"

Proposed text: "Interwiki links are used to point to the same word in foreign language Wiktionaries. To point to the page palabra in the Spanish Wiktionary, use:"

--Daniel Carrero (talk) 11:47, 10 September 2016 (UTC)

Done. Let me know if you want the mention of the "left hand side of the entry" back. In the vote, a few people were not very happy with that wording. --Daniel Carrero (talk) 02:14, 14 September 2016 (UTC)

Centralization of also-information

For some time, I have thought it would be good to centralize the {{also}} lists in a canonical entry, which would be the diacritic-free lowercase entry if available. The canonical form entry would have the full list while each other form would only link to the canonical entry using {{also}}. For instance, kaca would have a full list while káča would only link to kaca. This would remove a maintenance overhead while bringing only a minor inconvenience to the reader; it would also make the tops of many pages less busy.

Does anyone like that idea? --Dan Polansky (talk) 09:03, 11 September 2016 (UTC)
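
As a rough illustration of the "diacritic-free lowercase" canonical form described above, here is a minimal Python sketch. It assumes that stripping Unicode combining marks after NFD decomposition is an acceptable approximation of "diacritic-free"; letters that do not decompose (for example ø or ł) would stay unchanged under this approach. The function names canonical_key and group_titles are hypothetical and are not part of any existing bot or module.

```python
import unicodedata
from collections import defaultdict


def canonical_key(title: str) -> str:
    """Return a lowercase form of the title with combining marks removed.

    Assumption: NFD decomposition plus dropping combining characters is a
    good enough stand-in for "diacritic-free".
    """
    decomposed = unicodedata.normalize("NFD", title)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.lower()


def group_titles(titles):
    """Group entry titles by canonical key (hypothetical helper)."""
    groups = defaultdict(set)
    for title in titles:
        groups[canonical_key(title)].add(title)
    return dict(groups)


titles = ["caca", "Caca", "caça", "caçà", "cáca", "căca", "ćaća", "kaca", "káča"]
groups = group_titles(titles)
```

Each key in groups is a candidate canonical entry, and its value is the set of titles that would either carry or point to the full list. If the canonical title itself has no entry, some other member of the group would have to hold the list, which the sketch does not decide.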

I wouldn't object; I rarely need to see pages with accented titles, however. The obvious alternatives are (i) to have a bot regularly update the alsos (or even a template generate them on the fly?) based on a list of entry titles, or (ii) to use the Variations of __ pages like Appendix:Variations of "be" (but that's an extra click, and a waste of a page when there are very few variations). Equinox 10:30, 11 September 2016 (UTC)
See above discussion of updating contents of uses of Template:also.
If we "centralize", I would prefer that only one (or more) page(s) whose headword(s) had diacritics bore the complete list of headwords in the equivalence class. DCDuring TALK 12:05, 11 September 2016 (UTC)
The problem is that the average reader isn't going to click on the undiacriticed form if they don't see their diacriticed form there. Of course, most people are going to search the undiacriticed form to start with, but their system may have easy ways to type accents, but not macrons, háčeks, etc., so you can't rule the possibility out. Chuck Entz (talk) 14:39, 11 September 2016 (UTC)
This is true. On my German keyboard, it's easy to type â ê î ô û but not ŵ ŷ, so if I'm searching for a Welsh word with a circumflexed vowel, I'll search for the diacriticked form of the first five but the undiacriticked form of the last two. All that said, however, I'd prefer to keep the full list on each page, because you just never know where you're going to end up. —Aɴɢʀ (talk) 15:43, 11 September 2016 (UTC)
  • The current system is better than this proposal, and its main weakness can be solved by having an also-bot continuously updating. —Μετάknowledgediscuss/deeds 17:47, 11 September 2016 (UTC)
I don't think this is a bad idea, but it seems like it would be necessary to have a bot keep things updated no matter if we keep things updated on all pages (checking for new entries that have been created and need to be added to all the {{also}}s) or on one page (still checking for new entries to add to the centralized list, and for any additions of also to peripheral pages, which the bot would presumably remove). Given that, I do think the idea of having a bot update all the {{also}}s is better. Someone just needs to design and run that bot...! - -sche (discuss) 19:01, 11 September 2016 (UTC)
I thought @Isomorphyc had previously volunteered in the discussion above. I don't know whether he has all the skills, but he does run Orphicbot. DCDuring TALK 19:35, 11 September 2016 (UTC)
I think this would need 2 separate templates:
  • caca would have: "See also: Caca, caça, caçà, cáca, căca and ćaća" ({{also}} as usual)
  • Caca would have: "For more entries, see caca" ({{also-more|caca}} or something)
  • caça would have: "For more entries, see caca"
  • caçà would have: "For more entries, see caca"
  • etc.
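
A minimal sketch of how the two-template scheme in the list above might be generated, in Python. It assumes that {{also}} takes the linked titles as positional parameters, and it uses {{also-more}} purely as the placeholder name suggested above ("or something"); no such template is known to exist. The function name also_wikitext is hypothetical.

```python
def also_wikitext(canonical, variants):
    """Map each page title to the hatnote wikitext it would carry.

    The canonical page lists every other variant with {{also}}; every other
    page only points back to the canonical page with the hypothetical
    {{also-more}} template.
    """
    others = [v for v in variants if v != canonical]
    pages = {canonical: "{{also|" + "|".join(others) + "}}"}
    for variant in others:
        pages[variant] = "{{also-more|" + canonical + "}}"
    return pages


pages = also_wikitext("caca", ["caca", "Caca", "caça", "caçà", "cáca", "căca", "ćaća"])
for title, wikitext in pages.items():
    print(title, "->", wikitext)
```

Under this scheme only the canonical page needs updating when a new variant is created, which is the maintenance saving the proposal is after.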
@Dan Polansky, DCDuring, Metaknowledge: Thanks for pinging me. Actually, I have the code already to do most of this, including realtime updating. The only thing I haven't totally worked out is how I will handle the appendices. It turns out there are a variety of corner cases where users have entered more information into an {{also}} template than one would want, by default, to add, for example, transliterations into other scripts. My current policy has been to retain these where they have been entered, but not to propagate them to other entries. Because of this, centralising the lists will remove the potential for this type of user-generated information. To retain flexibility, my suggestion would be not to centralise the data. I would add that I believe every method for storing this data in modules has significant drawbacks.
For the issues about typing ease raised by User:Chuck Entz and User:Angr, I think users would learn to seek out the {{also}} templates if they were consistently available. I'll test this with the pageview data three months after I have updated the templates to see if an increase in newly linked words with diacritics is seen in aggregate. But I would point out this only partly solves the typing problem because if a word with diacritics has no corresponding entry in pure ASCII, there will be no also template in the easy-to-type location. I have looked at a few newer methods of improving ASCII searchability than those I have tried so far, but that is a different topic, and everything I have looked at has drawbacks. Isomorphyc (talk) 20:19, 11 September 2016 (UTC)

──────────────────────────────────────────────────────────────────────────────────────────────────── The horse may already have bolted from the stable, but is it really necessary for OrphicBot to add alternative forms of an entry in an {{also}} template when they are already listed under the "Alternative forms" section? I think that isn't very useful; {{also}} should perhaps be confined to accented forms (usually in other languages) or differently capitalized forms. — SMUconlaw (talk) 11:22, 12 September 2016 (UTC)

@Smuconlaw: Sorry for giving that impression. It did a little, but these are orthogonal changes because centralisation can be accomplished in the modules and templates without editing the pages. If the data are centralised, the important thing is only to have the actual template on each page; the arguments do not matter. I'll be glad to make the changes if necessary; they're not major. For your question: Wiktionary's normal principle is redundancy over normalisation, largely for reasons stemming out of the fact that we're not a database. Creating inter-template dependencies is not a good idea if it can be avoided. In this case, the exception you propose would also be confusing to users because each lemma in each language can have its own "Alternative forms" section, and the user would need to find the correct one out of potentially many. Moreover, the 'correct' one may not even be in the language the user is expecting, defeating the purpose of a purely orthographical index. That said, if this or other exceptions are generally preferred I will implement them. For example, User:YURi has suggested omitting {{also}} links from misspellings to correct spellings, since this is also redundant. Isomorphyc (talk) 14:57, 12 September 2016 (UTC)
Thanks for explaining. It's not really a big deal for me, but I was wondering whether it made sense in some cases to have both an "Alternative forms" section and the very same information in a "See also" statement at the top of the entry. — SMUconlaw (talk) 17:44, 12 September 2016 (UTC)

Matched-pairs — policy page

I created Wiktionary:Votes/pl-2016-09/Matched-pair entries — policy page, to implement what was discussed in Wiktionary:Beer parlour/2016/June#Redirects to matched pairs. Feel free to discuss and propose any changes. --Daniel Carrero (talk) 14:04, 11 September 2016 (UTC)

Allow for easier input from the laity.

I recently saw on TV an "educational" program that referred to an 'oyster knife' as a 'paring knife'. This inspired me to look up the term 'Shucking Knife' because this is what I have always called an 'oyster knife'. When I discovered that Wiki did not have a page or a link for 'shucking knife' I was confronted with the overly convoluted requirements that Wiki has in order to let you know that I am aware of a synonym for one of your terms. I had to 'think' much too hard.

Yes, you're right, it is a difficulty, and yet we need to have some sort of minimum standard as well. Shucking knife definitely exists, but if you look at shucking and shuck, shuck says '[t]o remove the shuck from (walnuts, oysters, etc.).', which makes me think it's possibly just a knife for shucking, in the same way that a whittling knife is just a knife for whittling, and therefore does not need an entry. But in general, being accessible for new editors while trying to maintain consistency throughout our format is a challenge, there's no two ways about it. Renard Migrant (talk) 22:42, 11 September 2016 (UTC)

Restoration of Sysop Privileges

Given the amount of time with no action on the disputed issue, I'm prepared to restore sysop privileges to @CodeCat and to @Wyang if they will commit to not editing Module:links except for changes both agree to beforehand, at least until both agree that the conflict is resolved.

Please state here whether you agree to this. Thanks! Chuck Entz (talk) 23:58, 11 September 2016 (UTC)

Can someone else make the changes, then? If neither of us is allowed to edit it, that implies that there is a consensus for Wyang's preferred version. The reason I continue to press this is because I fear that if I don't, nothing will be done about it yet again. —CodeCat 01:39, 12 September 2016 (UTC)
@CodeCat, maybe you could provide a link to the exact revision of the module which you would say is the correct status quo? --Daniel Carrero (talk) 01:45, 12 September 2016 (UTC)
[15], [16], [17]. These three revisions ensure that the Thai transliteration code is placed in the Thai transliteration module where it belongs (according to the current consensus on treatment of transliteration modules), rather than in Module:links where it does not belong. —CodeCat 01:49, 12 September 2016 (UTC)
Do other people agree with reverting the modules to these exact versions?
I'll repeat what I said in another discussion:
  • I support restoring sysop privileges to both CodeCat and Wyang.
  • I support reverting the modules to the status quo, and in the face of this huge disagreement, I urge @Wyang to help in the creation of the vote before implementing any new proposal.
Correct me if I'm wrong: I seem to remember that some entries were already edited based on Wyang's system and reverting the modules to the status quo would break the entries. Still, IMO the status quo should prevail and the entries should be fixed. --Daniel Carrero (talk) 02:03, 12 September 2016 (UTC)
I also support restoring sysop privileges to both CodeCat and Wyang. In addition, I support restoring the modules to the status quo. Unfortunately, as Chuck pointed out, it's not totally obvious what this is, but in my mind, since the edit war specifically concerned references to Module:th in Module:links (+ supporting code), and since the references to Module:th weren't present in the module beforehand, the status quo should not include them: Specifically, it shouldn't include Module:th, 'phonetic_extraction' or the code that references 'phonetic_extraction'. Benwing2 (talk) 02:50, 12 September 2016 (UTC)
Back then there wasn't even any automated romanisation for Thai; restoring the previous version would simply wipe out the romanisations in thousands of Thai entries. I'm really confused. There was no consensus for CodeCat's edit, despite her claiming there is. I was only adding in transcription support at Module:links (which was lacking transcription support) per the consensus of the Thai editors, in a manner that is most appropriate for further editing in Thai and other similar languages. If you do not agree, voice your arguments other than voicing “I don't like it”! I spent so much effort arguing for why storing transcription and transliteration modules separate is beneficial in the long run, and what I got was non-participation and the indifferent “so what happens now?” (1, 2). Decision-making should not be like this - having people voice their opinions without having a critical appraisal of the arguments for and against makes the decisions arrived at highly prone to unintelligence. It shouldn't be the case that you can say your preference and expect it to be enacted without giving a reason. Why do the harms outweigh the benefits if we keep the transliteration and transcription modules separate for these languages, when our language editors routinely apply different romanisations when editing these languages, and printed dictionaries of these languages show that authors regard the different modes of romanisation as suited to different purposes? If it cannot be demonstrated that the harms do outweigh the benefits for these languages and there is no willingness to demonstrate, there is no justification for enacting this opinion or restoring the “previous version” which abolishes the functionality altogether. Wyang (talk) 03:54, 12 September 2016 (UTC)
(edit conflict) We're trying to achieve a compromise here. In my book, adopting a version more heavily weighed against one side than the other side even asked for isn't a compromise. What you're asking for basically breaks a large number of Thai entries that were modified in good faith by the Thai community after Wyang provided the capability for it with his first edit. Regardless of how things are going to end up eventually, that's too much collateral damage to make it a reasonable first step toward a compromise. Remember the story of how Solomon pretended he was going to cut a baby in half in order to see from the reaction of the two claimants which was the real mother? This is like cutting the baby in half first. Chuck Entz (talk) 12:41, 12 September 2016 (UTC)
So, over at the Grease pit, @Vahagn Petrosyan had mentioned that many languages require both transliteration and transcription. Do we think that the inclusion of both, if the transcription differs, could kill two birds with one stone? —JohnC5 17:05, 12 September 2016 (UTC)
That's what Wiktionary:Votes/2016-08/Enabling different kinds of romanization in different locations is supposed to address. But it's not going anywhere. —CodeCat 17:13, 12 September 2016 (UTC)
@Wyang: I have a question for you, and I'm sorry if you already explained it somewhere. I'm going to ask anyway: Given the benefits about your proposal that you explained, don't you think that Wiktionary:Votes/2016-08/Enabling different kinds of romanization in different locations has a good chance to pass? More importantly, is the linked vote satisfactory for you, or would you change something in the proposal? --Daniel Carrero (talk) 04:03, 12 September 2016 (UTC)
@Daniel Carrero: I believe the answer to your question is on the vote's talk page. —suzukaze (tc) 04:05, 12 September 2016 (UTC)
OK, but Wyang may still choose to help building the vote. If the vote explains the proposal correctly and passes, it will mean we are all on the same page and understand the implemented proposal.
In the previous discussion, Chuck Entz presented a few possible versions of the status quo to choose from. Is anyone interested in discussing what exactly is the right one? If no one objects, I'll just trust CodeCat and revert the three modules to the revisions that she mentioned. --Daniel Carrero (talk) 10:04, 12 September 2016 (UTC)
Why? I have explained the reasons of my objection well enough above, and in the previous discussions. Why do the harms outweigh the benefits if we keep the transliteration and transcription modules separate for these languages, when there is ample evidence suggesting the contrary? Nobody was interested in engaging in discussion to argue for the version that you are trying to restore. Why is reverting to a version which cannot be justified even being considered? Wyang (talk) 11:35, 12 September 2016 (UTC)
Please understand: It's not about whether the proposal is good, it's about whether other people agree with it, and are on the same page. That's why some of us are interested in having a vote, which would explain and record the proposal, and let others judge its merits. To put it another way: if the proposal is really good, the vote is probably going to pass and we'll do exactly as you proposed. --Daniel Carrero (talk) 11:57, 12 September 2016 (UTC)
We haven't had votes on the architecture of the modules, so I don't see what makes the "status quo ante" Wyang so sacred. If Wyang took the initiative to overcome a language(s)-relevant limitation of the module architecture, it seems to me that it merits our respect. If our architecture doesn't provide the required flexibility without some kind of kludges, so much the worse for the existing architecture. In this and on many other matters I favor accommodating decentralized decision-making. DCDuring TALK 12:33, 12 September 2016 (UTC)
Wyang's changes don't do anything that could not be achieved within our existing module framework. The three edits Wikitiki made to the modules, and which I proposed they be restored to, show that. The only reason he did it is because he doesn't like the framework (specifically, that transliteration modules do other kinds of romanization too). Therefore, I proposed that if he doesn't like our current consensus on what transliteration modules do and how they are used in other modules/templates, he should make a vote to change it. So far he hasn't shown any interest. Most of what has happened since then is several editors trying to get Wyang to cooperate on formulating a vote, while Wyang himself is skirting around the issue and avoiding a vote. Is this appropriate behaviour when someone's changes have been challenged? And would it be appropriate to allow said changes to remain in place when they have been challenged so heavily and the user is not prepared to let the community decide per vote on the issue? —CodeCat 13:56, 12 September 2016 (UTC)
As I said above, the only point revolved around in the “no”-camp is “I don't like it”, without any explanation given. Why do the harms outweigh the benefits if we keep them separate in these languages, when there is ample evidence suggesting the contrary? You keep citing your version as consensus, but where is the vote showing that? Using purpose-suited romanisation is the consensus for languages with a transcription-transliteration distinction ({{ko-etym-native}}, etc.). If you do not like this practice, you should bring this up in a discussion and explain your reasoning, aside from saying “I don't like it”. There is no point blaming the implementer for implementing what was already a custom in languages you are not involved in, and barring the improvement in the module infrastructure for these languages. Wyang (talk) 22:55, 12 September 2016 (UTC)
As I said before, there is no "no"-camp, just people that you need to convince. The burden of proof is on you. Once that's done, the vote should be able to pass. We are repeating the same arguments over and over. This discussion is going nowhere. I reverted the three modules to the revisions chosen by CodeCat. Feel free to discuss if I should have done something different. --Daniel Carrero (talk) 23:07, 12 September 2016 (UTC)
I reverted the edits I could revert. Discussion is still ongoing; you cannot voice your opinion and expect it to be enacted without justifying it. Any unilateral measure taken constitutes disrespect to the participants of discussion. Wyang (talk) 23:14, 12 September 2016 (UTC)
"you cannot voice your opinion and expect it to be enacted without justifying it" ... ha! I see some irony there, and it's amusing. But it may be just me. Seriously, if I did something wrong please someone step up and say what to do. I restored the modules again. --Daniel Carrero (talk) 23:21, 12 September 2016 (UTC)
You are insane. You did not even know what the contention was, and yet you feel empowered to trample on whatever modules you can get your hands on simply because you can. Wyang (talk) 23:28, 12 September 2016 (UTC)
Good grief. The diff you linked to does not indicate that I'm completely clueless about the contention. It does indicate that I was politely asking you for your opinion on the best way to word a vote. --Daniel Carrero (talk) 23:34, 12 September 2016 (UTC)
Asking for my opinion on the best way to word a vote... when it should not be relayed to a vote at all, because there is no argument input from people arguing we should confuse transliteration and transcription. There are numerous arguments for keeping the modules separate being put forth in the discussion, such as (1) our editors in these languages already implement the practice of using purpose-suited romanisation; (2) printed dictionaries in these languages use differential romanisation and deem the different modes of romanisation as suited to different purposes; (3) it conforms to existing language-specific module infrastructure developed for these languages; (4) it is prospectively designed, and does not discourage further improvements in these languages. But the arguments against? One: "I don't like it". It is unfair to use a vote to end a discussion, when one side is only interested in expressing their opinion and not giving any rationales for it. It is facilitating mindless decision-making. Wyang (talk) 23:46, 12 September 2016 (UTC)

──────────────────────────────────────────────────────────────────────────────────────────────────── You have failed to provide an accurate view of the opinions of other people. "But the arguments against? One: 'I don't like it'." is a straw man.

Could you please change your mind and be willing to cooperate in the vote? We could add your 4 points in the rationale. --Daniel Carrero (talk) 23:53, 12 September 2016 (UTC)

If I have failed to provide an accurate view of the opinions of other people, then could you please list the arguments against? We are still at a stage in the discussion where we are struggling to list any arguments from one side. This is way too immature to call on votes. Votes are evil. It allows such disproportionate argumentation to be easily distorted to produce an unintelligent consensus for the reason of sheer numbers only. Wyang (talk) 00:57, 13 September 2016 (UTC)


User:Daniel Carrero completely ignored the discussion and proceeded to revert the modules to a version he prefers and locked the modules. This is unacceptable bullying behaviour and shows no consideration for the rules of discussion.

(cur | prev) 23:16, 12 September 2016 Daniel Carrero (talk | contribs) . . (138 bytes) (-46)
(cur | prev) 23:15, 12 September 2016 Daniel Carrero (talk | contribs) m . . (184 bytes) (0) . . (Protected "Module:th-translit" ([Edit=Allow only administrators] (indefinite) [Move=Allow only administrators] (indefinite)))

I urge other admins to please look into this abuse of power and take actions. Wyang (talk) 23:19, 12 September 2016 (UTC)

I don't see bullying going on. I see you refusing to coöperate with him when he seeks to help you resolve this dispute, however. It's easier to decry alleged abuses of power, but the right thing to do is work on moving forward. —Μετάknowledgediscuss/deeds 23:31, 12 September 2016 (UTC)
You might deliver that message to Dan. It certainly seems high-handed to me. DCDuring TALK 23:33, 12 September 2016 (UTC)
It's not bullying, but he shouldn't have done it. Fortunately Anatoli reverted the edits so I didn't have to.
When emotions are high is exactly the wrong time to take such actions; it's just throwing gasoline on the fire. Besides, they were completely out of process and I just don't see the consensus to act now. Chuck Entz (talk) 01:41, 13 September 2016 (UTC)
All right. --Daniel Carrero (talk) 01:46, 13 September 2016 (UTC)

No Middle Danish?

It seems we do not have categories, language codes or anything for Middle Danish. Does Wiktionary subsume Middle Danish under Danish, and if so, why? Has this been discussed before?__Gamren (talk) 14:27, 12 September 2016 (UTC)

I bet it hasn't been discussed before. We can certainly create a language code for Middle Danish if no one objects; I'd suggest gmq-mda. —Aɴɢʀ (talk) 15:13, 12 September 2016 (UTC)
How big are the differences? —CodeCat 15:38, 12 September 2016 (UTC)
Oh, we certainly can; the question is whether we should. Renard Migrant (talk) 16:07, 12 September 2016 (UTC)
You can see some samples at w:History of Danish#Medieval Danish. Maybe someone who knows Danish can tell us if that is as different from modern Danish as Chaucer is from modern English. —Aɴɢʀ (talk) 16:37, 12 September 2016 (UTC)
I can tell right away that the spelling is very distinct from what is used today, but I think a modern Danish speaker could figure that out, at least. However, what is described there is what I'd call Old Danish. The definitions on that page don't really sit well with me. What it calls Old Danish is what we'd just call Old Norse, and it was written in the same time as the Old Icelandic that many more are familiar with. w:Old Norse says: "The 12th-century Icelandic Gray Goose Laws state that Swedes, Norwegians, Icelanders and Danes spoke the same language, dǫnsk tunga ("Danish tongue"; speakers of Old East Norse would have said dansk tunga). Another term used, used especially commonly with reference to West Norse, was norrœnt mál ("Nordic speech")." So even the Icelanders said they spoke Danish, at the time. —CodeCat 16:45, 12 September 2016 (UTC)
Also consider the different definition given for w:Old Swedish. Those years are closer to what I would expect for "Old Danish" as well. —CodeCat 16:46, 12 September 2016 (UTC)
I guess the next question is, how late are the words we're already calling Old Danish attested? If our Old Danish words are words/spellings attested up through the 15th century, then the reason we don't have Middle Danish is that what we're calling Old Danish developed directly into (early) Modern Danish. —Aɴɢʀ (talk) 16:57, 12 September 2016 (UTC)
There is a Middle Norwegian stage conventionally dated from 1350–1550, thus contemporary with Late Old Swedish. I think Late Old Swedish is sometimes called Middle Swedish (and Early Old Swedish consequently plain Old Swedish), but rarely, and same for Danish. However, Middle Icelandic is used for the same period. (In Faroese, it's the Old Faroese period.) --Florian Blaschke (talk) 03:14, 13 September 2016 (UTC)
Regarding chronology: Nudansk Ordbog and Den Danske Ordbog agree that (in approximate years, obviously): Old Danish lasted from 800-1100, Middle Danish 1100-1525 and Modern Danish 1525-present (DDO says 1500-present, but that's probably just a matter of precision). Regarding intelligibility: As a non-linguist speaker of Modern Danish, I cannot easily read Middle Danish, even if I can recognize cognates once I know the translation. Compare: takær bondæ annær man mæth sin kunæ oc kumar swa at han dræpær anti mannen... with Tager en bonde en anden mand med sin kone, og sker det, at han ikke dræber manden... (see also Gammeldansk Ordbog, which places Middle Danish (gammeldansk) at 1100-1515, and furthermore separates it into older and younger periods, the division being at 1350). Regarding classification: I see that we have lots of references to Middle Danish, but they usually link to Danish entries (see e.g. Storm, gilding, nettle). There is also at least one Danish lemma, tryde, which I have no reason to believe exists in Modern Danish.__Gamren (talk) 16:58, 13 September 2016 (UTC)
800-1100 would conflict with the generally agreed definition of Old Norse, which was also spoken throughout that period. Essentially, if we adopt that definition, we'd have to say Proto-Norse split into Old Norse and Old Danish in the year 800, which is complete nonsense. —CodeCat 17:01, 13 September 2016 (UTC)
This is probably just a terminological question. Olddansk/runsvenska/Old East Norse was, as I understand it, one of two varieties of Old Norse, which we merge with Old West Norse (which is probably quite justified), and our Old Danish corresponds to gammeldansk, no? So the only question is whether Old Danish is the right word. The definitions I gave above correspond with our definitions (given by @Daniel Carrero, who may wish to say something) and the ones given in the WP article given above, but it is entirely possible this doesn't correspond to usage in Anglophone literature - I really wouldn't know! and I'm sorry if I made this a muddle.__Gamren (talk) 19:34, 13 September 2016 (UTC)


Birgit Müller (WMDE) 14:56, 12 September 2016 (UTC)

Transliteration nomenclature vote

I created this vote: Wiktionary:Votes/2016-09/Renaming transliteration. Please provide feedback on the talk page to help improve the vote as necessary. —CodeCat 16:08, 12 September 2016 (UTC)

If this vote passes, I assume we'll rename all pages in Category:Transliteration policies. I think this should be stated in the vote. --Daniel Carrero (talk) 16:16, 12 September 2016 (UTC)
Yes, it should. And the category itself will be renamed too of course. —CodeCat 16:18, 12 September 2016 (UTC)

WT:CFI should explicitly be for the main namespace

WT:CFI (under a heading 'scope' perhaps) should explicitly state that it refers only to the main namespace. In other words (as a specific example) *montania is not subject to the rules here. Renard Migrant (talk) 20:39, 12 September 2016 (UTC)

CFI currently states that some things go in appendices, and reconstructions go in the Reconstruction namespace. I think it's better this way. Logically, there are some criteria for inclusion in the Reconstruction namespace; if there were no criteria for inclusion, you could include anything there.
The policy says: "Terms in reconstructed languages such as Proto-Indo-European do not meet the criteria for inclusion. They may be entered in the Reconstruction namespace, and are referred to from etymology sections." I disagree with that wording. It's true that we often say "Proto-Indo-European doesn't meet CFI", but I think this is a problematic statement. Proto-Indo-European does meet CFI, and the correct course of action is to place it in the Reconstruction namespace.
Relatedly, Reconstruction pages and some appendices follow closely the entry format so, in my opinion, both WT:EL and WT:NORM should explicitly mention exactly to what extent they apply to these pages. Related discussion: Wiktionary talk:Normalization of entries#Proposal: encompassing reconstruction pages. --Daniel Carrero (talk) 21:21, 12 September 2016 (UTC)
Reconstructions should be subject to some criteria for inclusion, but not these ones. I think any reconstruction from a reliable source should be considered a valid entry title. 'Reliable source' of course can be subject to criteria that we can all discuss before implementing. Renard Migrant (talk) 21:27, 12 September 2016 (UTC)
Suppose we use your idea as an actual, formal rule: "any reconstruction from a reliable source should be considered a valid entry title." Where can we place the rule? WT:PROTO is a good candidate, but I don't like how it has a long encyclopedic explanation of what a reconstruction is, instead of a simple link to Wikipedia or to a help page. I prefer policy pages to contain only regulations when possible. If we can delete all this stuff, I would be glad to place (voted and approved) criteria for inclusion of reconstructions in WT:PROTO. I would also like WT:CFI to link to WT:PROTO if we do that. What do you think? --Daniel Carrero (talk) 21:41, 12 September 2016 (UTC)
I think WT:PROTO if anything isn't really a policy page at the moment. It feels more like a Wikipedia entry. It's well-written but we just don't need that much. It also doesn't really contain much actual policy. Renard Migrant (talk) 21:49, 12 September 2016 (UTC)
WT:PROTO said: "It must not be modified without a VOTE." But I did not find a vote that confirms this in the first place, so I demoted it to Think Tank. --Daniel Carrero (talk) 21:56, 12 September 2016 (UTC)
"Any reconstruction from a reliable source", without further caveats, sounds like a bad guideline for reconstruction inclusion. This would allow the inclusion of all sorts of transcription variants of the same reconstruction (which we currently generally standardize away, though allowing them as redirects). More controversially, this would also allow the inclusion of reconstruction variants: cases where all researchers agree that a proto-form is to be reconstructed as the source of data Y, but disagree on what its shape was. I would propose that such disagreements should be covered as discussion within a single entry. --Tropylium (talk) 22:13, 17 September 2016 (UTC)
If there weren't a hundred million votes already taking place, there's a couple I'd like to propose. Renard Migrant (talk) 21:00, 12 September 2016 (UTC)
What would you like to propose? --Daniel Carrero (talk) 21:21, 12 September 2016 (UTC)
On my talk page, Dan Polansky and I discussed having single words de jure meet CFI. Something like doglike doesn't actually meet CFI as it's written now. Of course nobody would actually delete it, but it would be nice to have the rules cover what actually happens. Renard Migrant (talk) 21:27, 12 September 2016 (UTC)
Good idea. I'd probably support that. (as discussed in: User talk:Renard Migrant#CFI and idiomaticity clarification) --Daniel Carrero (talk) 21:44, 12 September 2016 (UTC)
@Renard Migrant, Dan Polansky: How many active votes do you think we should have on {{votes}}, before you feel it's OK to create the new vote for single words meeting WT:CFI? --Daniel Carrero (talk) 00:13, 13 September 2016 (UTC)
The way I see it: CFI was designed to apply only to the main namespace. Thus, it should be clear that the rules currently at WT:CFI only apply to the main namespace. Of course we need inclusion criteria for other namespaces, and these criteria may also be added to the page WT:CFI, but in a separate section from the current rules that only apply to the main namespace, or may be on its own page. --WikiTiki89 21:49, 12 September 2016 (UTC)

Stress marks and syllable marks[edit]

I've been working on putting syllable marks in lately, and I've noticed that the stress marks are interpreted as syllable marks when categorizing words by the number of syllables. When there are stress marks, do we need to put a syllable mark in front of the stress mark, e.g. should university be /ju.nɪ.ˈvɝ.sə.ti/ or /ju.nɪˈvɝ.sə.ti/? — justin(r)leung (t...) | c=› } 23:04, 12 September 2016 (UTC)

I was wondering the same thing. I asked about it to Metaknowledge in User talk:Metaknowledge#Dot together with the stress marker, and he replied there. --Daniel Carrero (talk) 23:09, 12 September 2016 (UTC)
@Daniel Carrero Thanks! Perhaps there should be something in the modules to prevent stress marks and syllable marks from being together. On a related note, should we be following the Maximal Onset Principle? — justin(r)leung (t...) | c=› } 23:48, 12 September 2016 (UTC)
I created Category:IPA for English using .ˈ or .ˌ and started populating it with any entries that seem to violate the rule that Metaknowledge described. If we really don't want a dot followed by a stress marker, then I believe the correct course of action would be fixing all entries in the category.
Concerning your question about the Maximal Onset Principle, if you directed it to me, I prefer if someone else more knowledgeable than me answered that instead. --Daniel Carrero (talk) 03:36, 13 September 2016 (UTC)
For English, I would follow the Maximal Onset Principle for stressed syllables first, and also make sure any stressed syllable with a lax vowel has at least one coda consonant. Once the stressed syllables are maximized, the unstressed ones will take care of themselves. In other words, happy should be syllabified /ˈhæp.i/, not /ˈhæ.pi/. That said, however, I do want to reiterate something I've said many times before: syllabification in English is far from obvious, and syllable boundaries are very often perceived to be located within consonants. Evidence suggests that the /p/ of happy is not exclusively in either syllable; rather it's simultaneously the coda of the first syllable and the onset of the second. But there's no convenient way to show that in IPA. For this reason, I personally am often very reluctant to mark syllable boundaries except in cases of vowel hiatus, where it's a convenient way of showing that a sequence of two vowels isn't a diphthong (e.g. Joey vs. Joy). —Aɴɢʀ (talk) 09:33, 13 September 2016 (UTC)
I don't have very strong feelings about putting the syllable boundary marker and the stress marker next to each other. Putting them both isn't wrong, but it certainly isn't necessary. —Aɴɢʀ (talk) 10:19, 13 September 2016 (UTC)
IPA is simply wanting a way to mark ambisyllabic consonants as found in West Germanic. We could add one as a house rule. /hæ‿p‿ɪ/ or something less ugly. Korn [kʰũːɘ̃n] (talk) 11:09, 13 September 2016 (UTC)
Another possibility would be listing both: "/ˈhæp.i/ or /ˈhæ.pi/". --Daniel Carrero (talk) 11:18, 13 September 2016 (UTC)
Definitely not that. That implies there are two possible syllabifications, and worse yet, that there's a way of distinguishing them. As for how to mark it, I think if we must mark it, then /ˈhæp.i/ is the least bad option. If we do invent a house notation, I'd rather use something that takes up less space, like /ˈhæpˇi/; we could define ˇ as meaning "the previous consonant is ambisyllabic". But if I'm honest, I'd really rather just stick to /ˈhæpi/, which is unambiguous, easy to read, and makes no theoretical claims as to syllabification. —Aɴɢʀ (talk) 12:22, 13 September 2016 (UTC)
Personally, I think /ˈhæ.pi/ is better than /ˈhæp.i/, because the latter looks to me like there is meant to be an audible break between the /p/ and the /i/. I agree that because of these problems, it's better to just have /ˈhæpi/. As for putting . before a stress mark, I think it's entirely unnecessary and thus oppose it. --WikiTiki89 13:46, 13 September 2016 (UTC)
I agree with Wikitiki. I think it would be better to omit the syllable marks entirely for English. Benwing2 (talk) 14:27, 13 September 2016 (UTC)
Purely from a user perspective, I'd prefer if a dictionary would have a house notation like /ˈhæṗɪ/, rather than omit information because of minor issues. Korn [kʰũːɘ̃n] (talk) 14:42, 13 September 2016 (UTC)
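
For illustration, here is a minimal sketch of the counting issue raised at the top of this thread. It is not the code used by the IPA modules; the function names are hypothetical, and it assumes every syllable boundary in the transcription is marked by a dot, a stress mark, or both. Under that assumption, a dot immediately followed by a stress mark is counted as one boundary, so /ju.nɪ.ˈvɝ.sə.ti/ and /ju.nɪˈvɝ.sə.ti/ give the same result.

```python
# Hypothetical helpers for illustration; not the actual module code.
SYLLABLE_DOT = "."
STRESS_MARKS = ("ˈ", "ˌ")  # primary and secondary stress


def count_syllables(ipa: str) -> int:
    """Count syllables by counting boundary markers.

    Assumes every boundary is marked. A dot, a stress mark, or a dot
    immediately followed by a stress mark each count as ONE boundary.
    """
    body = ipa.strip("/[]")
    if not body:
        return 0
    boundaries = 0
    previous_was_dot = False
    for index, char in enumerate(body):
        if char == SYLLABLE_DOT:
            boundaries += 1
            previous_was_dot = True
            continue
        if char in STRESS_MARKS:
            # A word-initial stress mark opens the first syllable, and a
            # stress mark right after a dot repeats the same boundary.
            if index > 0 and not previous_was_dot:
                boundaries += 1
        previous_was_dot = False
    return boundaries + 1


def has_dot_before_stress(ipa: str) -> bool:
    """Detect the '.ˈ' or '.ˌ' sequences that the tracking category collects."""
    return any(SYLLABLE_DOT + mark in ipa for mark in STRESS_MARKS)
```

A transcription with no boundary markers at all, such as /ˈhæpi/, would be counted as one syllable by this sketch, so it says nothing about the unmarked style that several editors prefer above.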

Not working for two-syllable words?[edit]

I noticed that using syllable markers in IPA transcriptions now adds words to categories indicating the number of syllables the words have, but only if the words have three or more syllables. Thus, /əˈfɹʌnt/ or even /ə.ˈfɹʌnt/ does not add affront to "Category:English 2-syllable words". Why? — SMUconlaw (talk) 12:20, 26 September 2016 (UTC)

Please read the description of Category:English 2-syllable words. --Daniel Carrero (talk) 19:26, 26 September 2016 (UTC)

What Needs to Happen[edit]

The main obstacle to resolving this dispute is that neither CodeCat nor Wyang trusts the process, and for good reason. In past disputes, we've had an unfortunate tendency to put out the immediate fires and then sweep the issue under the rug. Faced with this possibility, both have tried to get things the way they want them so that they don't lose out when everyone gets tired of the issue and moves on. The one thing we don't want to do is to jump in and take unilateral action: that will just confirm the worst fears of the one who loses out.

We need to resolve this now, before it becomes out of sight, out of mind. The way to do this is to get down to discussing what the new configuration should look like, in concrete terms.

Notice I said "discussing". We simply haven't gotten to the point of drafting votes, because we're still all talking past each other- any vote will most likely not address the issues needed to resolve the dispute and will just complicate things. The correct sequence is to come to a consensus, and then draft a vote, if necessary.

I can't do anymore at the moment because I'm still at work and it's really late. I'll spend some time on my way home trying to come up with a way to get the discussion started. Please don't blow things up in the meanwhile... Chuck Entz (talk) 02:25, 13 September 2016 (UTC)

I would support passing additional information (such as the name of the calling template and perhaps more) to the romanization module. This would make the Thai-specific code in Module:links that started this whole dispute unnecessary. I still think that there should only need to be one romanization module even if it provides both transliterations and transcriptions. --WikiTiki89 13:50, 13 September 2016 (UTC)
Another detail that hasn't been mentioned much is that Wyang wants to pass link target to the Thai module in order to find the transcription on the linked page. There are numerous reasons why this is a bad idea. Wyang has mentioned that the performance impact of reading the text of a page in a module is not as bad as people might assume at first, but that is not even the only issue. The romanization module must be able to romanize full unlinked sentences (such as in usage examples) and even redlinks. This cannot happen if the module depends on the existence of the link target. Not only that, but it would produce incorrect results for links with alt text, since it would transcribe the linked form and not the displayed form. --WikiTiki89 13:55, 13 September 2016 (UTC)
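
A minimal sketch of the two approaches contrasted above, under loud assumptions: the function names are hypothetical, the lookup table stands in for reading a transcription from an existing entry, and a toy five-letter Cyrillic mapping stands in for the Thai rules (which are far more involved). It only illustrates why a lookup keyed on the link target fails for redlinks and for links with alt text, while romanizing the displayed text does not.

```python
from typing import Dict, Optional

# Toy rule table, for illustration only.
TOY_RULES: Dict[str, str] = {"д": "d", "а": "a", "т": "t", "ы": "y", "н": "n", "е": "e"}

# Stand-in for "read the romanization from the linked page".
# Pretend that only this one entry exists.
ENTRY_ROMANIZATIONS: Dict[str, str] = {"дата": "data"}


def romanize_text(text: str) -> str:
    """Romanize whatever text is displayed; needs no page to exist."""
    return "".join(TOY_RULES.get(char, char) for char in text)


def romanize_via_target(target: str) -> Optional[str]:
    """Look the romanization up on the linked page; None for a redlink."""
    return ENTRY_ROMANIZATIONS.get(target)


def romanize_link(target: str, alt: Optional[str] = None,
                  use_page_lookup: bool = False) -> Optional[str]:
    """Romanize a link, either from the target page or from the displayed form."""
    displayed = alt if alt is not None else target
    if use_page_lookup:
        return romanize_via_target(target)
    return romanize_text(displayed)
```

With the lookup, a link to "дата" displayed as "даты" comes back as "data" (the linked form rather than the displayed one), and a link to a missing entry comes back empty; romanizing the displayed text avoids both problems, and also covers unlinked sentences.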
Is the reason for passing additional information such as the name of the calling template so that the Thai module can show a transliteration in etymologies and a transcription in translation sections? I'm opposed to doing that; I think it would be extremely confusing. Better to show both types of romanization in all places, as I've mentioned before. Allowing this would be a major user-facing change and needs a vote (that's why I had Dan create the vote). If this vote passes, then I think we should still require that transcriptions are always shown, and transliterations are also shown in the places where it's desired (e.g. etymology sections). Benwing2 (talk) 14:23, 13 September 2016 (UTC)
According to Wyang, some entries already do this. It should probably be reversed if there is no consensus for it. Though with how Wyang is, he'll put up a fuss and start another edit war. —CodeCat 14:27, 13 September 2016 (UTC)
I think that transcription and transliteration need to be separated on some level. First of all, one is conceptually an attribute of the script, and the other of the language. Thus changes to a transcription of a script will have to be applied to all trans* modules separately, making human errors likely. Second, transcription should be available for overriding, while transliteration should always be automatically generated. Also, in historical languages using abjads, it should be noted that having both of these would be useful, as one is the factual shape of the word as found in the text and the other an educated guess, and both are necessary to explain some etymologies.
Regarding the question of whether both or one romanization should be displayed, I suggest that, no matter what is decided to be the default option, appropriate html tags be placed around the transliteration so that a custom .css file can hide these for users that understand the script in question (seeing anything written in Cyrillic repeated in Latin can be slightly annoying when you already are native in the script).
Yet I do not understand the details of our current implementation and why Wyang's changes are creating problems. If his way of doing this is indeed too harmful I support reverting it, but then please draft an alternative solution to this. Crom daba (talk) 17:36, 13 September 2016 (UTC)
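
The suggestion about hideable romanizations could look roughly like the sketch below. The function name and the class name "translit-hideable" are assumptions made up for this example, not existing Wiktionary markup; the point is only that a dedicated class lets a user stylesheet hide the romanization.

```python
from html import escape
from typing import Optional

# Hypothetical class name; a user stylesheet could then contain:
#   .translit-hideable { display: none; }
TRANSLIT_CLASS = "translit-hideable"


def format_term(term: str, translit: Optional[str] = None) -> str:
    """Return the term, followed by its romanization in a classed span."""
    if translit is None:
        return term
    return f'{term} <span class="{TRANSLIT_CLASS}">({escape(translit)})</span>'
```

For example, format_term("дата", "data") yields the term followed by a span containing "(data)", which a reader who is native in the script could switch off in their own CSS.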
The alternative solution was Wikitiki's changes, which Wyang reverted over and over again and I reinstated over and over again. Contrary to what you might think, Wyang's changes actually did not establish separate transliteration and transcription. It merely bypassed the fact that the Thai transliteration module was called "translit" by putting the code that would have gone in there in Module:links instead. I argued that such code did not belong there, but it still remains there after months of bickering over it. —CodeCat 17:43, 13 September 2016 (UTC)
So what was the issue that Thai editors were complaining about? Crom daba (talk) 17:58, 13 September 2016 (UTC)
Wyang? He was complaining that transcription code should not go in a "transliteration" module, even though it's the normal practice on Wiktionary to do so. Because he didn't want to put the code where it belonged, he started messing with Module:links instead, and that's where I stepped in, and now we have this situation. —CodeCat 18:39, 13 September 2016 (UTC)
The whole point is: transcription and transliteration utilities should be separately maintained in the module system, whenever there is a foreseeable possibility that purpose-suited romanisation may be useful for the language. The argument is how to design a module structure, specifically a romanisation infrastructure, that best supports the features of these languages and therefore the wishes of the language-editing community. We are not proposing that language A should use X format of romanisation, or that Akkadian/Tibetan romanisations should be written as such, or that different modes of romanisation should be used in different locations (cf. link); these are all highly language-specific questions that need to be addressed separately and individually in discussions among knowledgeable editors. Our role here is to envisage the language-specific romanisation requirements that may be proposed, and partition our stored romanisation utilities in a way that is most regular and easiest to invoke, and in a way that does not deter editors in these languages from contributing in a way they consider most appropriate for the language.
The crux is “foreseeable possibility” of purpose-suited romanisation for a language. The reason purpose-suited romanisation is relevant is due to the different natures of the two modes of romanisation: transliteration is spelling-based, thus more etymology-oriented, and transcription pronunciation-based. The case of abjads is slightly different, but the benefit of storing utilities still applies. Why is purpose-suited romanisation and hence transliteration-transcription utility separation relevant on Wiktionary? Because:
  1. It is already being implemented in these languages ({{ko-etym-native}}). It is the consensus of the language community on how romanisations should be differentially applied. It is unreasonable to demand that the practice of using purpose-suited romanisation, which has been adopted universally in a language (you do not edit) for nearly ten years, be “reversed” without supplying any reason.
  2. Printed dictionaries do the same. The following are all the previewable Tibetan-English or English-Tibetan dictionaries on Google Books:
    Tibetan-English: 1, 2, 3
    English-Tibetan: 1, 2, 3.
All the Tibetan-English dictionaries use transliterations to romanise, and all the English-Tibetan ones use transcriptions to romanise. Why? Because different modes of romanisation are suited to different purposes – transliteration for etymology and transcription for translation from English.
  3. It conforms to the existing module infrastructure for these languages. In languages observing a transliterative-transcriptive contrast or languages where transliteration is intrinsically impossible, the transliteration-transcription distinction was strictly adhered to when the language-specific modules were designed. Where transliteration is impossible, the term “transliteration” is not ambiguated to mean “transcription”; we do not have Module:zh-translit and Module:ja-translit, instead we use Module:zh/Module:zh-pron and Module:ja/Module:ja-pron to handle transcriptions. Where the transliteration-transcription distinction makes a difference on a romanisation level, modules are named and maintained unambiguously; there are Module:bo-translit and Module:th-translit for transliteration, and Module:bo/Module:bo-pron and Module:th/Module:th-pron for transcription. It is the consensus of how romanisation utilities are maintained in these highly script-pronunciation discordant languages.
  4. It makes maintenance easier. Maintaining the transliteration and transcription modules separately makes whatever preference there is for the romanisation output less difficult to achieve. Seeing that abjads were raised before, if we decide to apply juxtaposed transliteration-transcription for all abjads or languages X, Y, Z, we can just add in some brief code in the links module to concatenate the outputs of transcription and transliteration modules of these languages (one can also be manually supplied), as these modules have already been recorded appropriately in language_data. If one day we would like to remove transcriptions in romanisations for languages X, Y, Z, we could simply remove the brief code added in earlier, without having to go through all the *-translit modules and delete the transcription passages, wondering whether they should be kept somewhere before they vanish.
  5. Using page parsing to achieve romanisation has no demonstrable harm. Transcription is inherently more difficult than transliteration; it is nearly perfectly automatable for certain languages (e.g. Korean) but most of the time it needs to be achieved using additional tricks, and page parsing is one of the tricks. I cited w:Wikipedia:Don't worry about performance before and I still think it is also very relevant for the technical structure on Wiktionary. The possibility of using page parsing has made us realise that it is perfectly possible to obtain both the transliteration and transcription for a word when they differ greatly, and this is very exciting. I think all the Thai editors would agree that the implementation of parsing since early this year has made their work much easier (Wiktionary:Statistics, sorted by change in #gloss definitions), and I doubt anyone would be in favour of removing this functionality and having to supply romanisations manually. Likewise for Chinese templates.
  6. Having an additional functionality module which does something useful is always beneficial, as long as it is maintained adequately. This could be said of transcription modules using parsing to obtain the romanisations. Even though it will not be able to grab a transcription from uncreated entries, or entries which have no pronunciation information, this is an indication that those entries need to be improved. In the case of Thai, having some automatic romanisation is better than having none and having to supply one manually. In the end, we aim to encompass all words in all languages and utilities have to be adapted to ensure we are at our highest efficiencies while progressing towards that goal. I'm sure the functionalities of this site won't be limited to what is present at the moment. If we want to build a Thai transliterator and a Thai transcriber to romanise a Thai passage (similar to what Google Translate is doing simultaneously to the translation), or if we want to develop a tool to romanise a Tibetan text in different ways, having an infrastructure in place which does not confuse the utilities will be essential.
Very few things are improved all of a sudden. While there is no transcription consideration in the central modules and the transcription modules are not recorded, it is most appropriate to name and maintain the romanisation utilities accurately. When the transcription modules can be recorded in language_data like the transliteration modules, the code should be migrated and rewritten. Above are my rationales for keeping the transcription and transliteration utilities separate for these languages where the different modes of romanisation are contrastive. Wyang (talk) 07:02, 14 September 2016 (UTC)
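
To make the maintenance argument above concrete, here is a rough sketch of what "recording both utilities per language and concatenating their outputs" might look like. Everything in it is hypothetical: the LanguageData record, the romanize function, the mode names and the toy romanizers are invented for illustration and do not describe the real language_data or Module:links.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

Romanizer = Callable[[str], str]


@dataclass
class LanguageData:
    """Hypothetical per-language record with two separately maintained utilities."""
    code: str
    translit: Optional[Romanizer] = None       # spelling-based
    transcription: Optional[Romanizer] = None  # pronunciation-based


def romanize(lang: LanguageData, text: str, mode: str = "both",
             manual_transcription: Optional[str] = None) -> str:
    """Combine the requested romanizations; mode is 'translit', 'transcription' or 'both'."""
    parts: List[str] = []
    if mode in ("translit", "both") and lang.translit is not None:
        parts.append(lang.translit(text))
    if mode in ("transcription", "both"):
        if manual_transcription is not None:
            # A manually supplied transcription overrides the automatic one.
            parts.append(manual_transcription)
        elif lang.transcription is not None:
            parts.append(lang.transcription(text))
    return " / ".join(parts)


# Toy romanizers: the transliteration keeps every letter,
# the transcription pretends that "h" is silent.
def toy_translit(text: str) -> str:
    return "-".join(text)


def toy_transcribe(text: str) -> str:
    return text.replace("h", "")


TOY_LANGUAGE = LanguageData("toy", toy_translit, toy_transcribe)
```

In this arrangement, switching a language from juxtaposed output to transcription only is a change of the mode argument at the call site; neither romanizer has to be edited.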

News from French Wiktionary[edit]

Hi all,

French Wiktionary is quite proud to publish every month a page with some fresh news about the project, Actualités. It is aimed not at contributors but at visitors and people interested in words. After 17 editions, we decided to translate our latest edition, for August, into English, to make this publication available to you. It was quite a long job, so we would welcome your comments to know whether it is worth it, and whether we should continue by translating our next editions or our previous editions too. Feel free to comment on any aspect of this publication; we are very open to improving it and our translation, as English is not my mother tongue. Thanks a lot to Andrew Sheedy (talk • contribs) and Pamputt (talk • contribs) for this translation! Noé (talk) 09:26, 13 September 2016 (UTC)

@Noé: Merci, mis amis (je sui americain, et no parle franc,ais...) Mis petites contributions. —Justin (koavf)TCM 13:54, 13 September 2016 (UTC)
@Koavf: In case you care, some corrections: mes amis, je suis, ne parle pas. --WikiTiki89 13:59, 13 September 2016 (UTC)
Je (ne) parle pas. UtherPendrogn (talk) 17:31, 13 September 2016 (UTC)
@Wikitiki89:, @UtherPendrogn: Merci! —Justin (koavf)TCM 22:49, 13 September 2016 (UTC)

Wikidata for Wiktionary: let’s get ready for lexicographical data![edit]

Hello all,

The Wikidata development team will start working on integrating lexicographical data in the knowledge base soon and we want to make sure we do this together with you.

Wikidata is a constantly evolving project and after four years of existence, we start with implementing support for Wiktionary editors and content, by allowing you to store and improve lexicographical data, in addition to the concepts already maintained by thousands of editors on Wikidata.

We have been working on this idea for almost three years and improving it with a lot of inputs from community members to understand Wiktionary processes.

Starting this project, we hope that the editors will be able to collaborate across Wiktionaries more easily. We expect to increase the number of editors and visibility of languages, and we want to provide the groundwork for new tools for editors.

Our development plan contains several phases in order to build the structure to include lexicographical data:

  • creating automatic interwiki links on Wiktionary,
  • creating new entity types for lexemes, senses, and forms on Wikidata,
  • providing data access to Wikidata from Wiktionary
  • improving the display of lexicographical information on Wikidata.

During the next months, we will do our best to provide you the technical structure to store lexicographical data on Wikidata and use it on Wiktionary. Don’t hesitate to discuss this within your local community, and give us feedback about your needs and the particularities of your languages.

Information about supporting lexicographical entities on Wikidata is available on this page. You can find an overview of the project, the detail of the development plan, answers to frequently asked questions, and a list of people ready to help us. If you want to have general discussions and questions about the project, please use the general talk page, as we won’t be able to follow all the talk pages on Wiktionaries.

Best regards, Lea Lacroix (WMDE) (talk)

@Lea Lacroix (WMDE): Thanks to you and everyone at d: for working hard to try to integrate this project into Wikidata. —Justin (koavf)TCM 13:46, 13 September 2016 (UTC)

Open call for Project Grants[edit]

IEG barnstar 2.png

Greetings! The Project Grants program is accepting proposals from September 12 to October 11 to fund new tools, research, offline outreach (including editathon series, workshops, etc), online organizing (including contests), and other experiments that enhance the work of Wikimedia volunteers. Project Grants can support you and your team’s project development time in addition to project expenses such as materials, travel, and rental space.

Also accepting candidates to join the Project Grants Committee through October 1.

With thanks, I JethroBT (WMF) (talk) 14:49, 13 September 2016 (UTC)

Quotation questions (redux)[edit]

Last month, we had a discussion about quotations and what should be included and where. Several contradictory opinions were expressed. I'm willing to go make the changes to the quotations I added, but I don't think we quite reached consensus there on what to do. If this is not the right place to find consensus, please advise where I should take the questions at hand. Thanks! --Flex (talk) 17:04, 13 September 2016 (UTC)

I think yet another vote might be in order, alas. It's complicated for me at least because I don't necessarily dislike citations being shown together for different forms (perhaps per user setting), but I don't think they should be stored that way. See my comments in the discussion linked above. Equinox 20:14, 15 September 2016 (UTC)

RFDO discussion for Template:character info[edit]

I created an RFDO discussion for a high-use template. See: WT:RFDO#Template:character info. --Daniel Carrero (talk) 05:17, 14 September 2016 (UTC)

Deceased long-term user[edit]

Eclecticology, one of Wiktionary's first editors, has died; see [18]. This was announced over at w:en:WP:AN, the en:wp administrators' noticeboard. As a very infrequent visitor here, I don't know your procedures for the accounts of deceased editors, but someone should remove his account's bureaucrat rights, since Wiktionary:Votes/2015-11/Eclecticology for de-admin and de-bureaucratting concluded in favor of removing both those user rights, but somehow only the administrator right was removed. Nyttend (talk) 12:02, 14 September 2016 (UTC)

Thanks for notifying us. It appears that the account does not have any user rights at the moment. —Μετάknowledgediscuss/deeds 19:53, 14 September 2016 (UTC)
Should the accounts of deceased users be permanently blocked in order to prevent hacking? —Aɴɢʀ (talk) 21:05, 14 September 2016 (UTC)
It's been done, but I see no reason to once rights are removed (in fact, it can be quite an annoyance if the block notification turns up on all their userpages when those userpages are still useful to other editors). —Μετάknowledgediscuss/deeds 21:11, 14 September 2016 (UTC)
If they get a cross-wiki block, as far as I know it doesn't show up on their userpages. --WikiTiki89 21:17, 14 September 2016 (UTC)
Some projects block accounts of deceased editors, and others don't, while some projects do other stuff (en:wp protects their userpages and adds a deceased-user template), so I figured I'd just announce it and let you regular editors follow your procedures. Nyttend (talk) 21:57, 14 September 2016 (UTC)
Eclecticology was involved in the establishment of Wiktionary, and was Wiktionary's first bureaucrat. He also created this very forum, the Beer parlour. RIP. --Yair rand (talk) 22:38, 14 September 2016 (UTC)
In case anyone wants to see this. Here's also the first ever BP discussion. --WikiTiki89 22:55, 14 September 2016 (UTC)
Sorry to hear it. But long may he live on in the edit histories! My opinion about blocking is that yes, we should do it where it is confirmed that somebody has died, just for the sake of security. A disused account might somehow be exploited or hacked; a blocked one generally can't be. Equinox 18:21, 15 September 2016 (UTC)


I just noticed that WT:ATTEST doesn't say anywhere that a word has to be attested in the language of the entry. Oversight? --WikiTiki89 22:27, 14 September 2016 (UTC)

We must have read each other's mind because I was thinking the exact same thing. I think that should be added in. It's an assumption that none of us really sought to codify before, but you know what they say about making things idiot proof. —CodeCat 22:29, 14 September 2016 (UTC)
Probably not an oversight. Plenty of words can be attested in words other than the language of the entry. Also, thanks for calling me an idiot. UtherPendrogn (talk) 22:33, 14 September 2016 (UTC)
If the shoe fits, UtherPendrogn. —CodeCat 22:36, 14 September 2016 (UTC)
Sometimes reports written in other languages are the only evidence about the existence and meaning of words in languages that were not reduced to writing until close to or after the time of their extinction or at least the loss of some of their vocabulary. This happens fairly often for names of organisms. Sometimes early explorers', missionaries', et al reports of the organism and a genus name or specific epithet are all that remains. I would think that some words in those languages could be reconstructed from multiple reports written in the language(s) of the explorers, et al. DCDuring TALK 23:10, 14 September 2016 (UTC)
Well here I guess we're talking about uses, not mentions or reconstructions. What you describe would pretty much be a reconstruction or maybe a mention. --WikiTiki89 23:14, 14 September 2016 (UTC)
Nothing is fool-proof, as the saying goes. And I worry that attempting to close a (debatably-existent) loophole that there's been no serious effort to game (a single user misunderstanding the rules does not strike me as a serious i.e. potentially-successful effort to re-interpret them) could cause more harm than good. What would be the effect on words in various extinct languages that are attested only embedded in works in other languages (e.g. an Ancient Greek text includes the only known few Paeonian words, a Spanish-language book gives the only known Ciguayo word)? I hope we can just rely on the majority to be as intelligent as we've been being, in discerning when a text is saying "and que is a word in French" versus when it's saying "and these are some words" and one user is just erroneously arguing "some" is French in that snippet. - -sche (discuss) 05:14, 15 September 2016 (UTC)
I'd appreciate not being called unintelligent if possible. UtherPendrogn (talk) 05:16, 15 September 2016 (UTC)
The alleged repercussions seem like a feature to me, not a bug. This might be a bigger can of worms, but I suspect that languages attested entirely by mentions perhaps shouldn't qualify for regular mainspace inclusion — not necessarily in terms of being moved to an appendix altogether, but they perhaps should be given substantially different treatment (e.g. in terms of entry layout) from better-attestable ones. --Tropylium (talk)
I'm not sure what those differences would be. Our current approach seems to handle them fairly well, actually. —Μετάknowledgediscuss/deeds 22:49, 17 September 2016 (UTC)
What I originally meant was that uses must be in the language of the entry. For mentions, I don't think it matters what language mentions them, as long as it can be deduced what language is being mentioned. --WikiTiki89 22:55, 17 September 2016 (UTC)
I think the passage on use-mention distinction covers this, and it's not a loophole. Something like "Venezia isn't a word in French" wouldn't count towards an attestation of Venezia in any language because it's not being used. Renard Migrant (talk) 23:29, 17 September 2016 (UTC)
Except that we allow mentions for some poorly attested dead languages as mentioned above. What I'm trying to say is that "I went to Venezia" cannot count as an attestation of the Italian word "Venezia", because the sentence is in English, even though this is a use not a mention (it can, however, count as an attestation of "Venezia" for English). --WikiTiki89 23:36, 17 September 2016 (UTC)

2nd Definitions vote[edit]

I created Wiktionary:Votes/2016-09/Definitions — non-lemma to edit the next piece of WT:EL#Definitions.

This is basically a minor edit that converts two simple vote links into a single line of text. For this reason, I'm just creating the vote without prior discussion.

Let me know if this should be discussed further. If needed, we may postpone the vote. (which I find unlikely, but who knows) Feel free to edit the vote and change the wording. --Daniel Carrero (talk) 12:35, 15 September 2016 (UTC)

Actually, I expanded the voted text with a few bullet points. I believe these are already established rules to be documented. Hopefully, they shouldn't be controversial. --Daniel Carrero (talk) 14:12, 15 September 2016 (UTC)

bor vs. loan[edit]

I'm thinking of creating a bot myself to implement the results of Wiktionary:Votes/2016-07/borrowing, borrowed, loan, loanword → bor. The vote passed with 14-5-3 (73.68%-26.32%) +1 late oppose.

But, as usual with template naming votes, even though apparently the tendency is the short name winning, (I voted support and I have my own arguments to back it up) there are people who voted oppose, defending the readability of the longer names. {{bor}} is a 3-letter name, like {{inh}} and {{der}} -- but "bor" does not really mean anything. Would people prefer using {{loan}} on all pages instead? --Daniel Carrero (talk) 17:55, 16 September 2016 (UTC)

We've already voted on this. It's a done deal. --WikiTiki89 17:58, 16 September 2016 (UTC)
14-4-4. Donnanz struck out his vote but it's still counted in the numbering. But whatever, a pass is a pass. Wikitiki89's right; let's not open up the issue again a minute after it's been voted on. Renard Migrant (talk) 23:49, 16 September 2016 (UTC)
Donnanz did not strike out their vote, just a statement which was part of the vote.
I'm happy with that response. I also prefer {{bor}}. I was just checking to make sure. --Daniel Carrero (talk) 00:08, 17 September 2016 (UTC)

Proposed addition to WT:NORM: the plain space (U+0020) and newline (U+000A) are the only allowed whitespace characters[edit]

Under this proposal, any other character that consists only of empty space, whether zero-width or with some width, is disallowed in the wikitext. This includes things like RTL and LTR markers, non-breaking spaces, halfwidth and fullwidth spaces, and of course the plain old tab. This change, once implemented by a bot, should reduce the number of unwanted surprises with invisible characters. We could, perhaps, also introduce an edit filter that blocks any edits containing these characters, though we'd need to make an inventory of them first. —CodeCat 23:35, 16 September 2016 (UTC)

Symbol support vote.svg Support -- there's already a rule forbidding the tab, so it should be edited to disallow the others. --Daniel Carrero (talk) 23:38, 16 September 2016 (UTC)
What about HTML character entities? DTLHS (talk) 23:39, 16 September 2016 (UTC)
I think we can allow those, since they're visible to the editor. —CodeCat 23:42, 16 September 2016 (UTC)
Also, FWIW, what about a newline, which is considered to be a whitespace character? — justin(r)leung (t...) | c=› } 23:44, 16 September 2016 (UTC)
A good point. That one is allowed of course. Though I'm not aware of any character other than a newline that looks the same as a newline. —CodeCat 23:46, 16 September 2016 (UTC)
I want to keep fullwidth spaces for Japanese. Although, I will note that MediaWiki disallows the fullwidth space in page titles and automatically changes it to U+0020. —suzukaze (tc) 00:02, 17 September 2016 (UTC)
Maybe we should allow different script-specific spaces in quotations and usage examples written in other scripts. Aside from the fullwidth space, are there other spaces like that? --Daniel Carrero (talk) 00:16, 17 September 2016 (UTC)
The whole point of the proposal was to eliminate invisible characters that people can't tell apart or reproduce. The average editor will expect that any empty space is a generic space. —CodeCat 00:29, 17 September 2016 (UTC)
We should still allow it, just only as an HTML entity or with a template. DTLHS (talk) 00:30, 17 September 2016 (UTC)
I would want to know more about the effect this would have on display of RTL scripts before supporting this. Lines with both RTL and LTR scripts can behave in very peculiar ways, and I don't want to make it worse. Chuck Entz (talk) 02:35, 17 September 2016 (UTC)
The LTR and RTL behaviour depends on control characters, mostly in Unicode category `Cf.' I believe this proposal mainly concerns whitespace, in category `Zs,' plus two pseudo-linebreaks in `Zp' and `Zl', and perhaps the control characters '\t' and '\r.' Isomorphyc (talk) 22:23, 19 September 2016 (UTC)
I intended it to include all nonprintable characters, though maybe I didn't make it clear enough. "[...] consists only of empty space, whether zero-width [...]". Control characters fit that description and indeed I would like to get rid of those, too, as they are invisible to the editor. Though of course, if it's not clear already, HTML entities for these characters are allowed by this proposal, so it's not as if we're banning them altogether, we're just banning them in their raw Unicode form because of the editing difficulties they cause.
As a side note, we have actual entries for control characters too, but they are all but inaccessible because of, predictably, technical issues. We should delete these entries. A dictionary shouldn't concern itself with encoding artefacts; you can't tell a space from a non-breaking space in a printed work, and there's no such thing as a control character in print either. —CodeCat 22:33, 19 September 2016 (UTC)
A cursory inspection confirms that we have no or very few RTL control characters, but a few thousand LTR characters which were probably superfluously copied in from various outside sources, for example in Patch and LoD. From a little bit of experimentation with Hebrew entries, I get the impression the LTR/RTL behaviour is handled below the wikitext level. My concern with HTML entities is that we don't want to degrade anyone's native wikitext typing experience by requiring native characters to be rendered either with HTML entities or inappropriate substitutes, as with CJK spaces. Since this depends on the wikitext rendering stack, I would have to do a little bit more experimentation to convince myself of this for a few other characters. But I'm sure we would agree about the result if this is roughly the principle you have in mind? Isomorphyc (talk) 23:20, 19 September 2016 (UTC)
  • Oppose as phrased, because I remember that I had to use nbsp in template switches since plain white space was not carried over from wikitext into the actual page output. Korn [kʰũːɘ̃n] (talk) 07:13, 17 September 2016 (UTC)
  • As I understand it, the proposal still allows the use of &nbsp; in the edit box, just not an actual nonbreaking space itself. —Aɴɢʀ (talk) 08:05, 17 September 2016 (UTC)
    • In templates, if you're having trouble with a space not being shown, you should use &#32; instead of &nbsp;. The former encodes an actual space (which is what you want), the latter encodes a non-breaking space. —CodeCat 12:36, 17 September 2016 (UTC)

Sounds okay in theory, but (per Chuck) we should probably investigate existing entries containing the "forbidden" chars and see whether we are overlooking any legitimate use cases. Equinox 10:37, 17 September 2016 (UTC)
If we put in an edit filter I would worry about people trying to save a page and not being able to determine why the system says they can't (even after we bot replace existing uses, people will always try to copy and paste from other sources which will inevitably include control characters). DTLHS (talk) 17:17, 17 September 2016 (UTC)
Yes, any edit filter should only tag, not block. Google Books, for example, uses RTL/LTR marks around author names a lot, so anyone trying to helpfully add citations would be blocked. But if a bot is going to make periodic cleanup runs, even a tagging edit filter seems unnecessary. - -sche (discuss) 18:06, 17 September 2016 (UTC)
I wonder if there's any way to automatically remove or replace certain characters when the page is saved. DTLHS (talk) 18:14, 17 September 2016 (UTC)
Since the software automatically replaces e.g. "a" + "combining grave" with "à"; presumably the devs could update it to automatically replace nonstandard whitespace with a regular space (but I don't know if they would). - -sche (discuss) 03:33, 18 September 2016 (UTC)
They replace "a" + combining grave, with "à" because they are defined by Unicode as equivalent (i.e. they mean the same thing). Non-standard whitespace is not defined by Unicode as equivalent to spaces. So the devs probably would not implement this special case just for us. --WikiTiki89 09:40, 19 September 2016 (UTC)
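The distinction drawn above can be checked with Python's standard `unicodedata` module. This is only an illustrative sketch: it assumes the software's automatic replacement is Unicode NFC normalization, which matches the behaviour described in the comment but is not confirmed in this thread.

```python
import unicodedata

# "a" followed by COMBINING GRAVE ACCENT is canonically equivalent to U+00E0,
# so NFC normalization composes the pair into the single character.
decomposed = "a\u0300"
composed = unicodedata.normalize("NFC", decomposed)
print(composed == "\u00e0")

# NO-BREAK SPACE has only a *compatibility* mapping to a plain space,
# so NFC leaves it untouched; only the lossy NFKC form would replace it.
nbsp = "\u00a0"
nfc_nbsp = unicodedata.normalize("NFC", nbsp)
nfkc_nbsp = unicodedata.normalize("NFKC", nbsp)
print(nfc_nbsp == nbsp, nfkc_nbsp == " ")
```

In other words, replacing non-standard whitespace on save would need a rule beyond canonical normalization, which is the special case referred to above.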
  • Symbol oppose vote.svg Oppose I use &nbsp;- and have not found a good way to dispense with it in Translingual Hypernyms and Hyponyms sections. It is also in every occurrence of all parameters of {{taxon}}, which is in virtually every taxonomic name entry. It is essential to allowing, say, a dash to follow text with a space without the dash appearing as an orphan on the following line. DCDuring TALK 03:05, 18 September 2016 (UTC)
    • &nbsp;- is an HTML entity, which we have discussed above. DTLHS (talk) 03:08, 18 September 2016 (UTC)
      I was reacting to the proposal expressed in the section title, not as it may be modified. DCDuring TALK 12:29, 19 September 2016 (UTC)
      I think the intention was in fact to allow &nbsp; when necessary in place of the literal character. --WikiTiki89 12:32, 19 September 2016 (UTC)
@CodeCat: This is good, with caveats: I'm only counting about 3500 pages affected, with about 7500 characters total (given 14 `bad' whitespace characters plus \t -- you might have a wider list, as technically '\t' is a control character, not whitespace.) About 3500 of these characters are simply &nbsp literals and can be replaced. Probably `em space' (0x2003) and `thin space' (0x2009) should be done away with-- the next two largest categories. I believe the CJK `ideographic space' (0x3000) should stay. The others are not very common. If anyone is interested here is a list (I omitted user pages, talk pages, etc.) :
  • hex,char,name,count
  • 0xa0, ,NO-BREAK SPACE,3454
  • 0x2003, ,EM SPACE,1136
  • 0x2009, ,THIN SPACE,1100
  • 0x200a, ,HAIR SPACE,687
  • 0x3000, ,IDEOGRAPHIC SPACE,609
  • 0x2008, ,PUNCTUATION SPACE,282
  • 0x2002, ,EN SPACE,165
  • 0x1680, ,OGHAM SPACE MARK,45
  • 0x202f, ,NARROW NO-BREAK SPACE,40
  • 0x2028,(),LINE SEPARATOR,39
  • 0x2005, ,FOUR-PER-EM SPACE,17
  • 0x2004, ,THREE-PER-EM SPACE,4
  • 0x2007, ,FIGURE SPACE,1
Isomorphyc (talk) 21:38, 19 September 2016 (UTC)
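For anyone wanting to reproduce a tally like this, here is a minimal sketch in Python. It is not the method used for the list above; the choice of Unicode categories (`Zs`, `Zl`, `Zp`, `Cf`, `Cc`), the allowed set, and the sample string are all assumptions made for illustration.

```python
import unicodedata
from collections import Counter

# Assumption: only the plain space and the newline are allowed, per the proposal.
ALLOWED = {" ", "\n"}
# Assumption: "invisible" means separators, format characters and controls.
INVISIBLE_CATEGORIES = {"Zs", "Zl", "Zp", "Cf", "Cc"}

def is_invisible(ch):
    """True for a non-allowed character in one of the invisible categories."""
    return ch not in ALLOWED and unicodedata.category(ch) in INVISIBLE_CATEGORIES

def count_invisible(wikitext):
    """Count each invisible character occurring in a piece of wikitext."""
    return Counter(ch for ch in wikitext if is_invisible(ch))

def report(counter):
    """Format the counts as hex,name,count rows, most frequent first."""
    rows = []
    for ch, n in counter.most_common():
        # Control characters such as the tab have no Unicode name.
        name = unicodedata.name(ch, "<unnamed control>")
        rows.append(f"0x{ord(ch):x},{name},{n}")
    return rows

# Hypothetical sample: two no-break spaces, an em space, an ideographic
# space, a tab and a left-to-right mark.
sample = "taxon\u00a0name\u2003and\u3000more\u00a0text\twith\u200e marks"
counts = count_invisible(sample)
for row in report(counts):
    print(row)
```

Run over a dump rather than a sample string, the same counter would give a per-character inventory; which characters then get replaced, kept, or turned into HTML entities is a separate decision.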
We just need to decide which ones we want to convert to a HTML entity, and which to replace with something else like a regular space. I think the em space, thin space and hair space and most other different-width spaces can become regular spaces. The Ogham space should stay, that's actually a printable character, it represents the line on which Ogham letters are written. Also, what about zero-width characters like the LTR and RTL markers, or zero-width non-breaking spaces? —CodeCat 21:46, 19 September 2016 (UTC)
We would have to test to see that they are there accidentally. If somebody typed it with a keyboard, it should stay in most cases, I think; but if it was pasted in from somewhere, has zero width, and has no effect on the presentation, that is a good sign it should go. For what it is worth, to make this more concrete, here is an equivalent list of control characters with their counts in Wiktionary: User:Isomorphyc/Sandbox/Control Characters in Wiktionary. By inspection, it seems inappropriate to remove all of them and worthwhile to remove some. Isomorphyc (talk) 00:02, 20 September 2016 (UTC)
I think there needs to be an exception in cases where the character is part of the normal encoding of a script. The only example I can think of is Persian, where the zero-width non-joiner is used in compound words and before the plural morpheme ها ‎() (for example: شب‌ها ‎(šab-hâ)). It would be unfortunate to have to encode this as {{m|fa|شب&zwnj;ها|tr=šab-hâ}}. --WikiTiki89 14:01, 20 September 2016 (UTC)

Proto-Brythonic verb lemmas[edit]

Should we reconstruct verb lemmas as absolute or conjunct 3rd person singulars, as in *ėɣɨd or *aɣ < *ageti? The former is more similar to the Proto-Celtic lemmas in form, but the latter more or less become the standard 3rd person singular in the daughter languages. Anglom (talk) 02:34, 18 September 2016 (UTC)

Or maybe the first person singular, since that's the usual lemma for Middle Welsh in academic material (notwithstanding the fact that we at Wiktionary use the verbal noun instead, a situation which I've been meaning to rectify but haven't gotten around to yet). I don't know what form is usually given as the lemma for the various stages of Breton and Cornish. —Aɴɢʀ (talk) 14:35, 18 September 2016 (UTC)
I thought about that, but the 3rd singular is usually the most commonly attested form in the earlier languages, it feels a little more justified to list them that way. Anglom (talk) 15:30, 18 September 2016 (UTC)
I favour the 3rd singular as well, though I'm not decided on absolute or conjunct. I think I'd prefer the form that descends from the Proto-Celtic lemma directly, but since we already don't do so for Old Irish, the point for Brythonic is moot. Since Proto-Brythonic and Old Irish are similar in terms of development, using the same form for them makes them easier to compare. —CodeCat 12:36, 19 September 2016 (UTC)

Separating transcription from transliteration[edit]

We seem to be at an impasse on this issue, with discussion having died out again. Here are a few ideas to start discussion with:

  1. Why don't we have a separate pronunciation parameter? Not only could this be used for transcriptions, it would also be useful for disambiguating homographs like wind. The main drawback is that it could be overused/stuffed with information best left to pronunciation sections.
    The reason I bring this up is that our current romanization method routes everything through the |tr= parameter. For languages that have both transcription and transliteration, that leaves no way to tell which is being displayed. Having a separate parameter also makes it easier to set it up as a parallel to our current treatment of transliteration.
    1. |pr= seems the most logical name for such a parameter
    2. How would we distinguish between the two? I think we should leave transliteration as it is, and use a superscript in front for the transcription: (Transcr:fonɛtɪk spɛliŋ) (with the superscript linked to something informative)
  2. Either way, I don't think we should have language-specific special code in Module:links if we can avoid it: it's currently the seventh-most-transcluded page on Wiktionary, used by 4,889,303 pages. More importantly, it's often used dozens of times on a single page and in a few cases thousands of times. Just on general principles, the part of Module:links that's always executed should be only for things that are general in nature and can't be handled in more specialized routines. Even if the overhead is minimal, the clutter makes it harder to maintain. I can understand temporarily putting in a short-term kludge until a solution can be integrated into the regular module structure, but kludges have a way of growing as more special cases arise. They also are harder to understand/maintain: I don't think it would be obvious to most people that local phonetic_extraction = {["th"] = "Module:th"} has anything to do with transcription, and I'm not sure someone wanting to make changes related to transcription would look for the code where it is now.
    1. I think the best approach to integrating transcription would be to have a separate value for transcription modules in the Module:languages data submodules to parallel "translit_module"
      1. I propose naming it "transcr_module"
      2. I propose naming the entry-point function in these modules "pr()" to parallel the translit modules' "tr()"
      3. It would then be a simple matter of adding parallel code to what we have in module:links for transliteration

I obviously like my proposals, but feel free to tweak, rework or replace any or all of it. The only thing I ask is that we arrive at something concrete, and not more theoretical or who-did-what-and-why-I-don't-like-it talk. Thanks! Chuck Entz (talk) 02:18, 19 September 2016 (UTC)

If we lack cooperation between our Lua module editors, we'll have the situation where transliterations and transcriptions are handled by separate modules for Japanese, Chinese, Thai, Burmese, Tibetan, etc. and have no integration with other main modules. Wyang's templates (linked to appropriate modules) like {{th-l}}, {{ja-r}}, {{zh-usex}} exist almost in a separate world. I'd like to be able to transliterate Thai or Japanese by passing Thai phonetic respelling/hiragana with spacing, capitalisation, etc. but also use the features common to other templates. --Anatoli T. (обсудить/вклад) 02:37, 19 September 2016 (UTC)
As stated elsewhere, I am very much in favor of this, though for a different reason. Vahagn and I had discussed how many languages with abjads or other writing systems require both a transliteration and transcription (Hittite, Old Persian, Mycenaean Greek, etc.). This would greatly reduce the amount of |tr= overloading necessary to represent these languages. —JohnC5 02:46, 19 September 2016 (UTC)
|tr= may mean either transliteration or transcription or a mixture of both. For most languages, including abjad-based, the transcription-like transliteration has been the preferred one. That is also the case for Thai but displaying the character sequence (i.e. the "real" transliteration) can still be used for various purposes.--Anatoli T. (обсудить/вклад) 02:54, 19 September 2016 (UTC)
I support this. Wyang (talk) 06:03, 19 September 2016 (UTC)
Sounds good, only I'd prefer it if we didn't bind transcription to phonetics, because for some ancient languages it would be preferable to write for example: (Sogdian) ૛ૣી૒ીૡ૏ો૏ૐ ‎(pš'x'rycyk) (pašaxārēčik) without going into details of what exactly were 'a', 'ā', 'ē' or 'č'. Crom daba (talk) 08:33, 19 September 2016 (UTC)
What do you mean by "preferable"? I want to know how to read/pronounce the word, so I want to see "pašaxārēčik", as would be the case for Persian and other abjads. The actual string of characters can also be useful for etymologies or for people interested in learning the script.--Anatoli T. (обсудить/вклад) 08:39, 19 September 2016 (UTC)
Perhaps I wasn't clear. It is preferable to write "pašaxārēčik" rather than "pəʃɨxaret͡ʃjək" (don't quote me on this "reconstruction"). Obviously we need both transcription and transliteration (for one, because there still aren't any free fonts for Manichaean Unicode as far as I know). Crom daba (talk) 09:05, 19 September 2016 (UTC)

AWB access[edit]

Hello. I would like to get permission to use AWB on the English Wiktionary. I will use it to update Romanian adjective templates to a new format, since it's too tedious to do manually. I've never used it before, but from what I can tell, it doesn't seem too complicated. Thank you! Redboywild (talk) 09:59, 19 September 2016 (UTC)

You look like a good candidate for AWB but unfortunately I have no idea how to give you access. Anyone know? Benwing2 (talk) 05:05, 20 September 2016 (UTC)
Never mind. All you do is edit the list on the AWB page. Done. Benwing2 (talk) 05:08, 20 September 2016 (UTC)
Thanks a lot! Redboywild (talk) 08:15, 20 September 2016 (UTC)

Statistics to guide improvements[edit]

I've been experimenting with extracting data from a Wiktionary export (enwiktionary-20160901-pages-articles.xml). Along the way, I keep generating stats to help me get a feel for how the data is organized. Many of them seem like they would be of interest to Wiktionary staff and editors who have an eye to making improvements. So I thought I'd ask if that's correct.

Here is an example stat I generated last night. From the English set, in articles that have an =English= header and a =Noun= or other PoS header, here are all the distinct headword template names I found and their counts:

  • en-PP: 1
  • en-Proper noun: 7
  • en-abbr: 510
  • en-acronym: 67
  • en-adj: 97944
  • en-adjective: 64
  • en-adv: 16002
  • en-adverb: 18
  • en-comparative of: 1
  • en-con: 164
  • en-conj: 24
  • en-conj-simple: 36
  • en-conjunction: 13
  • en-cont: 375
  • en-contraction: 27
  • en-decades: 86
  • en-det: 72
  • en-initialism: 879
  • en-interj: 1346
  • en-interjection: 45
  • en-intj: 101
  • en-letter: 53
  • en-note-upper case letter plural with apostrophe: 2
  • en-noun: 207413
  • en-number: 39
  • en-part: 16
  • en-particle: 19
  • en-phrase: 106
  • en-plural noun: 1304
  • en-plural-noun: 6
  • en-prefix: 1012
  • en-prep: 373
  • en-prep phrase: 3
  • en-preposition: 21
  • en-pron: 315
  • en-pronoun: 65
  • en-prop: 329
  • en-proper noun: 23854
  • en-proper-noun: 176
  • en-propn: 11
  • en-punctuation mark: 2
  • en-suffix: 614
  • en-symbol: 52
  • en-usage-equal: 1
  • en-verb: 26654

As you can see, quite a bit of redundancy. And more than a few slated for deletion, like en-abbr.

Some of the things I've found have motivated me to do some edits on articles with minor formatting errors. If there's interest, I'd be happy to supply more data like this. Thoughts? Jim Carnicelli (talk) 14:09, 19 September 2016 (UTC)

Are you sure about your regular expressions or other id. method? Just looking at one template, {{en-PP}}, which your listing says is used once, this special page reports that it is used on 137 pages, all but one of which is principal namespace.
I believe that the redundancy is principally attributable to redirects. eg, {{en-prop}}, {{en-proper-noun}}, {{en-propn}} are all redirects to {{en-proper noun}}.
How would we use these statistics for improvements? DCDuring TALK 14:51, 19 September 2016 (UTC)
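To illustrate the point about redirects, here is a small Python sketch that counts `en-` template names while folding redirect aliases into their targets. It is not the process used for the list above: the redirect table contains only the three aliases named in this reply, the regular expression is a simplification that matches any template whose name starts with `en-`, and the sample pages are made up.

```python
import re
from collections import Counter

# Redirect aliases named in the reply above; a real table would be
# built from the dump's redirect pages (not shown here).
REDIRECTS = {
    "en-prop": "en-proper noun",
    "en-proper-noun": "en-proper noun",
    "en-propn": "en-proper noun",
}

# Simplified pattern: "{{", a name starting with "en-", then "|" or "}}".
TEMPLATE_NAME = re.compile(r"\{\{\s*(en-[^|{}]+?)\s*(?:\||\}\})")

def count_headword_templates(pages, redirects=REDIRECTS):
    """Count en-* template names, mapping each redirect to its target."""
    counts = Counter()
    for text in pages:
        for name in TEMPLATE_NAME.findall(text):
            counts[redirects.get(name, name)] += 1
    return counts

# Hypothetical page texts for demonstration only.
pages = [
    "==English==\n===Proper noun===\n{{en-prop}}\n",
    "==English==\n===Proper noun===\n{{en-proper noun|Londons}}\n",
    "==English==\n===Noun===\n{{en-noun|-|adoxographies}}\n",
    "==English==\n===Proper noun===\n{{en-propn}}\n",
]
counts = count_headword_templates(pages)
print(counts)
```

With aliases folded this way, the remaining distinct names are closer to the set of templates that actually exist, which makes genuinely stray names easier to spot.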
Please bear in mind I'm new to this. I'm treading lightly because I know I'm surely missing an awful lot of context.
In generating the above list, I had already prefiltered based on the "ns" (I assume that's short for "name-space"), =English= header, and =<PoS>= header, with a finite list of the following parts of speech and pseudo-PoS: Determiner, Conjunction, Noun, Proper noun, Pronoun, Verb, Adverb, Adjective, Preposition, Interjection, Contraction, Prefix, Suffix, Affix, Particle, Numeral, Symbol, Initialism, Abbreviation, Acronym, Phrase, Prepositional phrase. So expect it to be a subset. Also, I'm using an entirely proprietary system. I'm not familiar with all the tools available within Wiktionary, so I don't expect I'll generate the same results.
Given that I've already found and corrected what I believe are minor mistakes in articles, like a missing =English= header in one case and an empty "=====" header-like line, I'm assuming there are many more such formatting errors. My goal is quite simply to help call them out in the event that others might be interested in studying and possibly correcting them. Just another set of eyes.
My personal interest in this has to do with being able to extract structured data like brief definitions and synonyms. I'm impressed so far to see that most of the term definition articles appear to follow a rigorous structure that is parsable. I'm presently focused on English single-word terms with an eye to computational linguistics tasks like part of speech tagging. I plan to make code I write freely available for creating condensed JSON-structured data. Thus far I've been able to transform Wiktionary's 4.7GB articles export into a 100MB JSON file with over 300k terms from the 4M+ articles. I'm struggling now with how best to parse out the head-word templates (e.g., "{{en-noun|-|adoxographies}}") and definition lines.
The above list is one trivial example I thought to include as an illustration. I just want to know that it's worth generating further lists. I don't want to take the time or trouble anyone if there's no interest. Jim Carnicelli (talk) 17:43, 19 September 2016 (UTC)
You may want to look into mwparserfromhell. It can simplify a lot of the work for you. —CodeCat 17:46, 19 September 2016 (UTC)
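For the simplest case, a headword call with no nested templates or links, the parameters can also be split with nothing but the standard library. The sketch below is an assumption-laden illustration and not a replacement for a real wikitext parser: it rejects anything nested, and the second example input is invented for demonstration.

```python
def parse_flat_template(wikitext):
    """Split a flat template call into (name, positional params, named params).

    Only handles calls without nested templates or links; anything more
    complex should go through a proper wikitext parser.
    """
    if not (wikitext.startswith("{{") and wikitext.endswith("}}")):
        raise ValueError("not a template call")
    inner = wikitext[2:-2]
    if "{{" in inner or "[[" in inner:
        raise ValueError("nested markup needs a real parser")
    name, *raw_params = inner.split("|")
    positional, named = [], {}
    for param in raw_params:
        key, sep, value = param.partition("=")
        if sep:
            named[key.strip()] = value.strip()
        else:
            positional.append(param.strip())
    return name.strip(), positional, named

# The example quoted earlier in this thread.
print(parse_flat_template("{{en-noun|-|adoxographies}}"))
# A made-up call mixing a named and a positional parameter.
print(parse_flat_template("{{en-noun|head=ice cream|s}}"))
```

What each positional value means (for instance, the "-" in the first example) still has to come from the template's own documentation.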
The headword template you mention has its code at Template:en-noun; unlike certain templates, this one is quite well documented. Templates change a lot (too often, really) so be prepared to revise your parser code very frequently. Equinox 17:47, 19 September 2016 (UTC)
Ah, thank you. I'll look into mwparserfromhell. Also, I have studied the Template:en-noun template. I appreciate how thoroughly it's documented. Some of the other head-word templates are a little less well documented. I was intrigued by finding several (e.g., Template:en-abbr) which are slated for deletion. I assume this means articles that use them still need cleanup. Jim Carnicelli (talk) 17:58, 19 September 2016 (UTC)
It was decided that "abbreviation", "initialism", etc. are not parts of speech, so we should use the appropriate PoS instead (e.g. TLC is a noun). Equinox 18:00, 19 September 2016 (UTC)

Proposal for bot redirects for numbers up to a million.[edit]

Per the outcome of various recent deletion discussions relating to numbers, I propose to bot-create about four million redirects which will point otherwise non-idiomatic numbers between 101 and 1,000,000 to Appendix:English numerals#Naming rules (short scale). The reason that this will come to about four million redirects is that I propose to redirect from:

There are other possible variations:

Basically, I'd like to have a bot redirect all commonly used ways of making all possible non-idiomatic number combinations up to one million. However, in saying this out loud, it sounds pretty crazy. Is this a bad idea? I want people who look up numbers to be taken somewhere for their trouble. bd2412 T 15:22, 19 September 2016 (UTC)

Maybe we could edit {{didyoumean}} to cause any absent page, whose title is a number, to redirect to the appendix? Like Amazing (redlink) redirects to amazing. --Daniel Carrero (talk) 15:32, 19 September 2016 (UTC)
In general, I support your approach, but I would not overload the template with this. There are other really cryptic SoPs like (S01E01) to handle... --Giorgi Eufshi (talk) 06:55, 20 September 2016 (UTC)
Strong oppose. In fact many of the higher numbers will be unattestable. But more importantly, I don't see any reason why numbers are more special than other SOP combinations. --WikiTiki89 15:37, 19 September 2016 (UTC)
Re: attestability, try picking any number at random between 101 and 1,000,000 and do a Google Books search for it. You'll be amazed at how many random references you will find to "437,214 cubic yards of material" or 808,777 hogs having been infected with a disease, or an "increase in book value of ledger assets 279,361". I can virtually guarantee that every single number up to a million (and probably for a good way up from it) is attested in some ledger, census, valuation, report, or record. bd2412 T 15:55, 19 September 2016 (UTC)
I would argue that all numbers up from 10 are SOP. 4, for example, is defined as "The cardinal number four." but really it means "A digit used to form numbers, whose value is four × 10ⁿ, where n is the digit placement counted from the right. In 432, 4 means four hundred. (don't get me started on real numbers and non-decimal bases)" --Daniel Carrero (talk) 16:02, 19 September 2016 (UTC)
@BD2412: Regarding attestability, I was referring mainly to the spelled-out forms. --WikiTiki89 16:49, 19 September 2016 (UTC)
And just to give you an idea of how crazy this idea is, we currently have 440,889 English lemma entries, and you're proposing to create 4,000,000 redirects to one appendix page. --WikiTiki89 17:46, 19 September 2016 (UTC)
The practical drawbacks to this seem larger than the small-to-nonexistent benefits, to me. Who is going to fail to know what 347654 means, but (be able to input it, and) be helped by an appendix? Who is going to fail to know what "four hundred seventy-two thousand, five hundred fifteen" means, but think to look up that whole string rather than the parts? If all numbers are bluelinks, it will drown any effort to see if e.g. a certain number happens to have an entry (due to being idiomatic), a slight drawback, but compared to a slight-to-nonexistent benefit. Having every possible number in this range, including e.g. strings identical to phone numbers, be bluelinks (which, when edited by someone after the bot, won't show up in a noticeable place like Special:NewPages) also seems like an invitation to easy-to-miss vandalism. And as Wikitiki says, why are these more deserving/needing of entries than other SOP but (or and, or or) "regularly formable" strings? - -sche (discuss) 16:42, 19 September 2016 (UTC)
  • Just on a nitpicky point, telephone numbers would not fall into the sweep of this proposal, unless 100-0000 is a phone number somewhere. It would, however, cover all the zip codes. bd2412 T 17:14, 19 September 2016 (UTC)
    Unless you leave the USA. - TheDaveRoss 17:24, 19 September 2016 (UTC)
    BD2412 cannot leave the USA (without first entering it). --WikiTiki89 18:14, 19 September 2016 (UTC)
    I am actually an American. I only ever leave the U.S. to go to Wikimania. ;-) bd2412 T 12:57, 20 September 2016 (UTC)
    Really? For some reason this whole time I've been thinking you were British... Now I'm wondering where I could have gotten such an idea. It must be that you sign half of all your posts with "Cheers!" --WikiTiki89 14:08, 20 September 2016 (UTC)
I also generally oppose doing this via redirects, however if we did something to affect search results which had the same effect I think it might be of use. - TheDaveRoss 16:51, 19 September 2016 (UTC)
If the search results can be tweaked to this effect, that would be a fine solution. bd2412 T 17:12, 19 September 2016 (UTC)
There's another practical problem here. 415 is four hundred and fifteen in the UK and four hundred fifteen in the US and it can't simultaneously redirect to both. But in general, the proposal has no merit because it proposes making redirects for things that aren't words in any language. I have plenty more specific objections, but I think that one alone is enough. Renard Migrant (talk) 17:59, 19 September 2016 (UTC)
AFAICT the proposal is to redirect all of "415", "four hundred and fifteen", and "four hundred fifteen" to the same appendix, which is technically doable, but I tend to agree it's not desirable. - -sche (discuss) 18:23, 19 September 2016 (UTC)
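To make concrete what "all of" those titles would look like for one number, here is a rough Python sketch that generates the digit forms and the two spelled-out forms mentioned in this thread. The exact set of variants is an assumption (the original list is not reproduced here), punctuation variants such as a comma after "thousand" are ignored, and the function names are invented for illustration.

```python
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]

def _below_thousand(n, use_and):
    """Spell 1..999; use_and inserts the UK-style 'and' after 'hundred'."""
    parts = []
    hundreds, rest = divmod(n, 100)
    if hundreds:
        parts.append(ONES[hundreds] + " hundred")
    if rest:
        if hundreds and use_and:
            parts.append("and")
        if rest < 20:
            parts.append(ONES[rest])
        else:
            tens, ones = divmod(rest, 10)
            parts.append(TENS[tens] + ("-" + ONES[ones] if ones else ""))
    return " ".join(parts)

def spell(n, use_and=False):
    """Spell 1..999,999 in words (short scale)."""
    if not 0 < n < 1_000_000:
        raise ValueError("only 1..999,999 are handled in this sketch")
    thousands, rest = divmod(n, 1000)
    parts = []
    if thousands:
        parts.append(_below_thousand(thousands, use_and) + " thousand")
    if rest:
        # UK style also says "one thousand and fifteen".
        if thousands and rest < 100 and use_and:
            parts.append("and")
        parts.append(_below_thousand(rest, use_and))
    return " ".join(parts)

def redirect_titles(n):
    """Assumed set of titles: plain digits, comma-grouped digits, both spellings."""
    return {str(n), f"{n:,}", spell(n), spell(n, use_and=True)}

print(sorted(redirect_titles(415)))
print(sorted(redirect_titles(472515)))
```

Up to four titles per number across roughly a million numbers is where a figure of about four million comes from, though the actual count depends on which variants are included.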
Not to mention that 415 is not only a number in English, but also in practically every other language, so it doesn't make much sense to redirect it to the appendix page on English numerals. --WikiTiki89 18:58, 19 September 2016 (UTC)
Strongly oppose creating the "wordy" ones like three hundred and sixty-seven. Frankly I think the numeric ones would be pretty dumb too but that is more arguable. Equinox 18:32, 19 September 2016 (UTC)
  • A significant benefit would be that we could dramatically reduce the number of times a new contributor tries to add full entries for the terms. And we might be able to reduce the number of discussions of some of the inane matters relating to numbers that appear in some of our discussion pages.
  • Could we accomplish the goal of directing users to appendices by some other means?
As I understand it, we could accomplish the entry-prevention goal by protecting the pages for which we think we don't want entries or, perhaps, by an edit filter. DCDuring TALK 19:06, 19 September 2016 (UTC)
Oppose; just feels totally wrong, and will add tons and tons of unnecessary entries. I think we should have numbers up through 100, plus 200, 300, ... 900, plus powers of ten above that; partly I want these entries for translation purposes, since many languages have non-SOP ways of expressing them. (Plus any non-SOP numbers of course -- 101, 411, etc.) Benwing2 (talk) 04:58, 20 September 2016 (UTC)
I agree with respect to the numbers we should have. The question is what to do about numbers we shouldn't have, but which readers may for whatever reason either look for anyway, or try to create anyway. bd2412 T 13:01, 22 September 2016 (UTC)

Declension tables versus usage notes

I'm wondering how to treat a certain phænomenon. If certain grammatical forms replace other forms, or create new ones, should that be put into the declension table or the usage notes?
Examples: German subjunctive forms are now used as imperative forms, for phrases like "let's go". And most importantly for me: Low German optative forms replace, piece by piece, Low German preterite forms in the course of 400 years. So should I add the optative forms as alternative forms into the declension tables or make a note about this as usage notes? Korn [kʰũːɘ̃n] (talk) 12:16, 20 September 2016 (UTC)

If it's something that applies to all or most verbs across the board, then it shouldn't be in a usage note as the usage note would have to appear on every single verb entry. Maybe there could be a footnote within the inflection table itself saying something like "Increasingly used as the preterite" or whatever. —Aɴɢʀ (talk) 12:41, 20 September 2016 (UTC)
Sorry, yes, when I say usage note, I do mean one in the table. Cf. vri. Korn [kʰũːɘ̃n] (talk) 12:53, 20 September 2016 (UTC)
I think that's fine, especially for a historical language. For a modern language we might not want to list all obsolete forms in inflection tables. (Though TBH I do have a tendency to put obsolete inflected forms in Irish declension tables, so maybe I'm being hypocritical.) —Aɴɢʀ (talk) 15:12, 20 September 2016 (UTC)


@Angr, Chuck Entz, Anglom, JohnC5, CodeCat, Wikitiki89 Should we include some Proto-Nostratic words? If so, how would they be organised? We obviously can't put them as ancestors to PIE and Native American words without extensive proof that they're linked, of which there is little...? Some words could definitely be linked though, like PIE heu and Native American iw, both originating from a common ancestor (PN?). UtherPendrogn (talk) 20:10, 20 September 2016 (UTC) https://en.wiktionary.org/wiki/User:UtherPendrogn/k%CA%BCo An example of a word. UtherPendrogn (talk) 20:18, 20 September 2016 (UTC)

Nostratic is silly, founded on extremely poor data and poorer assumptions, and flies in the face of what rigour historical linguistics may claim. If there is sufficient reason to compare a form of unclear etymology with one in another language with no sure relationship, that is acceptable, but by no means should Nostratic "terms" be linked to or given serious consideration. —Μετάknowledgediscuss/deeds 20:19, 20 September 2016 (UTC)
Is there a better accepted ancestor to PIE? UtherPendrogn (talk) 20:22, 20 September 2016 (UTC)
Not really. Pre-PIE features are postulated based on internal reconstruction, but there's no higher node phylogenetically that has acceptance in academic linguistics. —Μετάknowledgediscuss/deeds 21:25, 20 September 2016 (UTC)
Does this mean I should stop making Sino-Caucasian entries? Crom daba (talk) 23:46, 20 September 2016 (UTC)
In my opinion, yes. Even if that's phylogenetically valid (which I doubt), it can't really be reconstructed to the standards expected by most historical linguists. —Μετάknowledgediscuss/deeds 00:23, 21 September 2016 (UTC)
What do you mean by "Native American iw"? Are you referring to Amerindian? Having worked a little with Uto-Aztecan and Yuman, I'm more than a little skeptical about that. There are former American Indian phyla such as Hokan and Penutian that have been mostly abandoned for lack of evidence (though there's evidence for some of the subdivisions)- the trend seems to be going away from unification rather than toward it (except for Dene-Yeniseian). As for Nostratic itself, everyone who believes in it seems to have a different combination of constituent families. Chuck Entz (talk) 03:36, 21 September 2016 (UTC)
Yeah, I've often wondered about Dene-Yeniseian. I had to read Vajda's paper in college and found it very convincing. Also, I believe there was recently a paper showing genetic evidence that the two peoples spent a significant period in the Bering Strait before splitting East and West. I remember that from a discussion with some professors I met from Diné College who also recalled a time when a Ket speaker came to the Navaho nation and discussed apparent cognate words in the two languages. But then again, all of this still remains too circumstantial. —JohnC5
I'm friends with an Athabaskanist who told me all the Athabaskanists she knows are pretty much convinced by Dene-Yeniseian. But it's definitely the exception rather than the rule for new suggestions of high-level groupings to be accepted by the wider linguistics community. —Aɴɢʀ (talk) 12:13, 21 September 2016 (UTC)
Do we have any Athabaskanists working on here? If any trusted Athabaskanist wanted to begin adding PDY forms, I'd be prepared to make a code for it and point PY and PND at it. —JohnC5 14:21, 21 September 2016 (UTC)
As I recall, some earlier BP discussions settled on basically the following rules of thumb for forms in Nostratic etc. macrolanguages:
  • the comparisons themselves can be mentioned in etymology appendices for PIE etc., if properly cited;
  • they cannot be created as their own reconstruction entries, with the special exception of Proto-Altaic;
  • they cannot be mentioned in mainspace entries.
I would support a compact appendix (or set of appendices) that listed the members of alleged Nostratic etymological groups together with reconstructions used by different authors, though (as said, no two groups of Nostraticists substantially agree on anything, so e.g. Illich-Svitych's Nostratic ≠ Dolgopolsky's Nostratic ≠ Bomhard's Nostratic). For that matter, I would even support proto-entries as soon as you can provide two unconnected sources (not e.g. from one scholar + one of his students) who can both agree on what the term's descendants are and what its reconstruction should be ;)
Re OP though, nobody considers Amerind to be "Nostratic". "Amerind" itself is a hypothetical macrofamily of a similar size as Nostratic; what you'd use to link them is "Borean" or perhaps "Proto-World" (the likes of which should probably be banned entirely from Wiktionary, being another order of magnitude more speculative than the likes of Nostratic or Amerind or Sino-Caucasian). --Tropylium (talk) 20:42, 22 September 2016 (UTC)
  • I recall reading that the emerging evidence from archaeology is painting a picture of multiple waves of migration from the Old World to the New over the span of thousands (tens of thousands?) of years, which would seem to make any such "Amerindian" family quite moot. ‑‑ Eiríkr Útlendi │Tala við mig 21:28, 22 September 2016 (UTC)

I plan to clean house in WT:RFD.

There are a number of months-old RFDs that have received little or no discussion. I'm giving fair warning that I plan to close all of these as no consensus in the next few days, unless an actual consensus develops quickly. Cheers! bd2412 T 13:04, 22 September 2016 (UTC)

The default in RFD is no objection, since the proposer themselves is generally in favour of the deletion. With no response, that's 100% in favour, therefore delete. —CodeCat 13:32, 22 September 2016 (UTC)
It's not that straightforward in the first half-dozen discussions. They have at least one half-hearted objection to deletion. What then? bd2412 T 13:43, 22 September 2016 (UTC)
Then it's no consensus. —CodeCat 14:16, 22 September 2016 (UTC)
Which causes the entry to be kept, I believe. --Daniel Carrero (talk) 14:19, 22 September 2016 (UTC)
That means an erroneous entry might be kept by virtue of sufficiently great user apathy towards the topic. If we turn it around, correct entries might go for the same reason. We don't have a better alternative, huh? Jury duty or something. Korn [kʰũːɘ̃n] (talk) 15:20, 22 September 2016 (UTC)
I've done my jury duty. --WikiTiki89 15:35, 22 September 2016 (UTC)
Inspiring. (Not sarcasm.) Maybe we can have a (collapsed or optional or something) list of RFDs/RFVs without any replies (= With only one signature.) in the watchlists? Like we have with the votes. Korn [kʰũːɘ̃n] (talk) 16:33, 22 September 2016 (UTC)
Not all my votes were to delete. I hope I didn't accidentally vote twice. DCDuring TALK 18:06, 22 September 2016 (UTC)
I don't think there's a problem with erroneous entries being kept, as they can just be sent to RFV, where the default is to delete. And I think it's better to err on the side of keeping an SOP entry when in doubt, than to delete it just because not enough people care. Andrew Sheedy (talk) 21:41, 23 September 2016 (UTC)

User:Embryomystic form-of edits

Embryomystic has been fiddling around with form-of entries for a while now. Some of it is ok, but they've also replaced the perfectly-valid {{plural of}} with {{inflection of}} just for the sake of it. Now, they have their eyes set on Spanish and Portuguese, and seem to be replacing {{masculine plural of}} and similar generic templates with some language-specific templates that do the same thing. I objected to this but was ignored, so I'm bringing it to wider attention here. Generic templates should always be used if possible, and replacing them with custom templates for no reason is pointless. —CodeCat 19:44, 22 September 2016 (UTC)

You were not ignored, just disagreed with. I didn't create the Portuguese templates, but I find them useful, and I've been adding them to Portuguese adjective form entries that don't have them, and just recently I created parallel Spanish and Italian templates. I realise now that when I started doing something similar with Catalan that I was stepping on your toes, and I didn't object to you reverting Catalan entries, as you yourself had made similar templates for Catalan, but I don't really see why there's a problem with adjective forms being sorted into relevant subcategories as the Portuguese ones have been for some time now. embryomystic (talk) 19:50, 22 September 2016 (UTC)
Subcategorising non-lemma forms is mostly a pointless exercise that nobody benefits from, and therefore it's not worth the increased complication introduced by not using generic templates. —CodeCat 19:57, 22 September 2016 (UTC)
By this logic, should we delete Category:English adjective comparative forms? --Daniel Carrero (talk) 20:02, 22 September 2016 (UTC)
I wouldn't oppose it, unless someone can come up with a real use case. To me, this is no different from categorising Latin verb forms as "1st person forms", "singular forms", "indicative forms", "active forms" and so on. Categorising for the sake of it, not because anyone is ever going to have a use for it. Subcategorising lemmas is useful, but non-lemmas not really. —CodeCat 20:08, 22 September 2016 (UTC)
As someone whose browsing as a user trying to find words was more than once hindered by non-exhaustive categorisation, I'm leaning towards too much rather than too little. Korn [kʰũːɘ̃n] (talk) 20:36, 22 September 2016 (UTC)
  • Hear, hear. If there's no harm in having a category, I say keep it -- it's highly likely that someone somewhere has found it useful. ‑‑ Eiríkr Útlendi │Tala við mig 21:31, 22 September 2016 (UTC)
    • That's good and well, but then why create a language-specific template? If we all agree that such categories are useful, then surely they are useful regardless of language. Therefore, this functionality could be integrated into Module:form of and the language-specific templates done away with. I still oppose it, but if it's going to happen, it might as well be done right. —CodeCat 21:39, 22 September 2016 (UTC)
    • Also, as for it not being harmful, consider that one particular user created one category for every single form of Turkish nouns and verbs, a few years ago. It was a huge mess, resulted in dozens of useless categories. Since I'm assuming we don't want to repeat that, the question is how much is too little, how much is too much, and how much is just right. Personally, I think just about none at all is just right. —CodeCat 21:45, 22 September 2016 (UTC)
      Whatever is preferred, bear in mind that WT:ACCEL creates entries using the generic templates. --Q9ui5ckflash (talk) 13:24, 23 September 2016 (UTC)
      WT:ACCEL can be customized to use custom templates. But I agree, that if we want these, they should be language independent. That doesn't mean we need to use them for every language, but if it makes sense for Spanish, Portuguese, and French, then why not use a single template for all three languages? --WikiTiki89 13:33, 23 September 2016 (UTC)
      That certainly makes sense to me. embryomystic (talk) 23:30, 23 September 2016 (UTC)
      I don't think it makes sense for these languages either. —CodeCat 23:40, 23 September 2016 (UTC)

Let's get rid of the "Quotations" header

Wiktionary:Quotations says "Longer lists of quotations may find a more appropriate place in a separate section, as they would hamper readability for people only interested in the definitions." In this case, I think that the quotation really belongs on a separate citations page. The point of citations pages is to avoid cluttering up the entry with information that is not directly relevant to the words and definitions, but may still be useful for some readers (and for WT:CFI). So I propose abolishing this practice/header altogether, and moving its contents to the citations page. —CodeCat 21:37, 22 September 2016 (UTC)

I think I would support that. Equinox 21:39, 22 September 2016 (UTC)
I support removing the "Quotations" header, and adding {{seemoreCites}} in individual senses. This past vote might be relevant: Wiktionary:Votes/2016-02/Removing "Quotations". --Daniel Carrero (talk) 21:47, 22 September 2016 (UTC)
I also support. I don't think it is used terribly often as it is. - TheDaveRoss 21:56, 22 September 2016 (UTC)
Support wholeheartedly. The quotations sections are little more than clutter. Andrew Sheedy (talk) 21:57, 22 September 2016 (UTC)
Support. I've always found it weird that this header was even there, and it's annoying to see it on random entries. PseudoSkull (talk) 22:03, 22 September 2016 (UTC)
Oppose using citations page to hold citations that could easily go under the definition line. DTLHS (talk) 00:29, 23 September 2016 (UTC)
I always thought that the quotations used with definitions were just a selection of all the citations found on the citation page. That is, that one is a subset of the other. —CodeCat 01:09, 23 September 2016 (UTC)
I don't know what other people think citations pages are for. In my mind it is for quotations of as yet to be defined terms and for senses that are being researched, and the contents should be moved to the main entry if it is possible. DTLHS (talk) 01:15, 23 September 2016 (UTC)
I figured citation pages were just for collecting all the citations, the more the better? —CodeCat 01:18, 23 September 2016 (UTC)
Like I said, this might just be me. I would be interested to know what other editors think citation pages should be used for. DTLHS (talk) 01:19, 23 September 2016 (UTC)
My opinion is this:
  • The Citations: page should be used to collect an indefinite number of citations, the more the better. Getting citations from the internet is okay too, if the sense is already attestable through durably archived media such as Google Books.
  • Each sense should have only a small number of citations in the main page, which should preferably be representative and unambiguous, concerning that particular sense.
  • It would be nice if the citations in the entry were always a subset of the citations in the Citations: page, if the Citations: page is a big one. It is normal to add a new citation in an entry without copying it in the Citations: page, and I'm okay with that if there are only one or a few quotations.
  • The "Quotations" section in entries seems to be useless. If it is used simply to point to the Citations: page, the link could be added below each sense, when applicable. If it contains one or more quotations where it is unclear to what sense they belong, they can't be "representative and unambiguous" as suggested above and should be in the Citations: page until we figure out what to do with them.
  • Usage examples and quotations complement each other, so I oppose if people remove usexes just because the entry/sense has quotations.
--Daniel Carrero (talk) 01:34, 23 September 2016 (UTC)
I agree that usexes and quotations are complementary. I think Wiktionnaire does an exceptional job of illustrating definitions with a balance of both (relatively speaking, we are rather lacking in this area). Andrew Sheedy (talk) 01:38, 23 September 2016 (UTC)
I think of quotations in entries as just a special kind of usex: a usex that's attested and cited from another work. They're meant to illustrate the use of the word in that meaning, using an example from "out in the world" instead of something we made up. I don't think one should be favoured over the other, we should simply pick what works best in the particular situation. If none of the cites illustrate the use particularly well, a made-up example would do better. —CodeCat 17:54, 23 September 2016 (UTC)
I've always understood citations pages to be used the way CodeCat describes, in addition to hosting citations for senses that have yet to be added (or where the intended sense is unclear). I think they should eventually hold as many citations as is practical, to demonstrate as wide a range of use as possible (including various time periods, regions, registers, and genres). Andrew Sheedy (talk) 01:36, 23 September 2016 (UTC)
For what it's worth, that's how I think of Citations as well: that namespace has a large chronology of uses from which we pick a handful of particularly illustrative ones to show in the definition in the main namespace. —Justin (koavf)TCM 02:58, 23 September 2016 (UTC)
Question. The title of the thread is "Let's get rid of the 'Quotations' header", but could anyone explain for me exactly what the "Quotations header" is and where it appears? It seems from some comments that it is not the "[quotations ▼]" link that is seen next to some definitions, but if not that then what? Mihia (talk) 17:45, 23 September 2016 (UTC)
@Mihia: The entry abyss has a "Quotations" section. It contains the text:
It is a section, like "English", "Noun", "Etymology", "Pronunciation", etc. --Daniel Carrero (talk) 17:51, 23 September 2016 (UTC)
I see, thanks. My comment then would be that if the "Quotations" section was not there to provide a link to the "Citations" page, then probably many people would not notice that the Citations page existed. However, I don't know on what basis you would put actual quotations in that section rather than using the inline "[quotations ▼]" method. Mihia (talk) 19:27, 23 September 2016 (UTC)
Support. - -sche (discuss) 18:58, 23 September 2016 (UTC)
Support, and use the citations namespace for citations where meaning is not clear or is for a definition we don't have yet. Wherever possible, citations go under the sense they are supporting. Renard Migrant (talk) 16:52, 24 September 2016 (UTC)

Vote about not nesting headings inside stuff

Based on Wiktionary:Beer parlour/2016/August#Proposed addition to WT:NORM: headers cannot be nested inside things, I created Wiktionary:Votes/pl-2016-09/No headings nested inside templates or tags. --Daniel Carrero (talk) 04:01, 23 September 2016 (UTC)

IMO your vote is not well-phrased. What you want to disallow is something like this (i.e. where the header is surrounded by newlines):
Something like this: {{foo|==English==}} where there aren't any newlines isn't such a problem, and might conceivably actually occur. (In general, embedded newlines in templates cause lots of parsing problems, even using mwparserfromhell.) Benwing2 (talk) 05:22, 23 September 2016 (UTC)
@CodeCat, do you wish to comment here? This was her idea. In any event, I hope people don't start using {{foo|==English==}} without discussion, it seems weird and without precedent. Is there any possible use for this, even a hypothetical one? --Daniel Carrero (talk) 18:37, 23 September 2016 (UTC)
I recall we do it on talk pages. But this proposal is for entry space only (I don't say mainspace because entry space includes Reconstruction: too), so it doesn't matter. —CodeCat 18:54, 23 September 2016 (UTC)
Just in case, I added a note in the vote to remind people that the proposal only affects entries. --Daniel Carrero (talk) 18:57, 23 September 2016 (UTC)
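For illustration, the distinction Benwing2 draws above (a header on its own line inside a template, versus a header inline in a template argument) could be checked mechanically along the following lines. This is only a rough sketch in Python, not how the vote or any existing bot defines it: the function name `nested_headings` is made up for this example, and counting braces is a crude stand-in for real wikitext parsing (it ignores nowiki tags, HTML comments and triple-brace parameters).

```python
import re

# A heading line starts and ends with the same run of 2-6 equals signs.
HEADING_RE = re.compile(r"^(={2,6})\s*(.+?)\s*\1\s*$")


def nested_headings(wikitext):
    """Return titles of headings that sit on their own line while a {{template}} is still open.

    Hypothetical helper: template depth is approximated by counting "{{" and "}}".
    """
    depth = 0
    found = []
    for line in wikitext.split("\n"):
        match = HEADING_RE.match(line)
        if match and depth > 0:
            found.append(match.group(2))
        # Update the open-template count after inspecting the line.
        depth += line.count("{{") - line.count("}}")
        depth = max(depth, 0)
    return found
```

Under these assumptions a header surrounded by newlines inside a template is reported, while the inline form `{{foo|==English==}}` is not, because it never forms a heading line of its own.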

Proposal for an extension to a few different page creation templates in the case of the search query containing more than one word

I propose a new addition to one search query template and one creation template. Now, honestly, I'm not really sure how someone would do this, but I'm sure that you Lua experts out there on this site probably could have some idea.

The rationale of both proposals is that if we do this, I believe the number of (especially new) users creating SOP entries in ignorance of WT:CFI may decrease. I know that a lot of you may be thinking "Oh, well we already linked to CFI so I'm assuming that the creators of every entry are going to sit there and read that entire page to find the part about SOP (and fully understand it)." Let's face it; people don't read terms of service, etc., pages all that often, especially not fully. People are eager to go ahead and start creating entries. So we should at least include a little more right in front of their faces. And almost the entire WT:RFD page is dedicated to finding out whether or not a multiple-worded entry is SOP, so perhaps we really should include something that mentions SOPs in these two templates.

I can't find the actual templates themselves on here after I've searched, so fill me in on their titles please.

The extra texts will not appear if the query does not have 2 words or more. PseudoSkull (talk) 16:23, 24 September 2016 (UTC)

Proposed text 1

For large airplane:

"Wiktionary does not yet have an entry for large airplane.

  • You may Create this entry or add a request for it.
  • You can also look for pages within Wiktionary linking to this entry. This may help if, for example, large airplane is an inflected form of another word; although Wiktionary does not have the entry for large airplane, the base form may be listed as linking to this entry.
  • If you think this may be a misspelling, try browsing through our indices (e.g., the index of English words) for the correct spelling.
  • Perhaps there is a page large airplane in our sister encyclopedia project, Wikipedia.

Try searching Wiktionary:

  • If you have created this page in the past few minutes and it has not yet appeared, it may not be visible due to a delay in updating the database. Try refreshing the page, otherwise please wait and check again later before attempting to recreate the page.
  • If you created a page under this title previously, it may have been deleted. Check for large airplane in the deletion log. Alternately, check here.
  • Please also check large and airplane separately, as the definitions of those terms may collectively give you the meaning of large airplane."

For big strong girl:


  • Please also check big, strong, and girl separately, as the definitions of those terms may collectively give you the meaning of big strong girl."

For five-edged:


  • Please also check five and edged separately, as the definitions of those terms may collectively give you the meaning of five-edged." PseudoSkull (talk) 16:23, 24 September 2016 (UTC)


Proposed text 2

For large airplane: "Wiktionary does not yet have an entry for large airplane.

  • To start the entry, type in the box below and click "Save page". Your changes will be visible immediately.
  • If you are not sure how to format a new entry from scratch, you can use the preload templates to help you get started.
  • If you are new to Wiktionary, please see Help:Starting a new page, or use the sandbox for experiments. Also make sure your entry meets our criteria for inclusion. Especially check to make sure that the definition of large airplane does not equal the sum of the definitions of large and airplane. "

For big strong girl: "[...]

  • If you are new to Wiktionary, please see Help:Starting a new page, or use the sandbox for experiments. Also make sure your entry meets our criteria for inclusion. Especially check to make sure that the definition of big strong girl does not equal the sum of the definitions of big, strong, and girl. "

For five-edged: "[...]

  • If you are new to Wiktionary, please see Help:Starting a new page, or use the sandbox for experiments. Also make sure your entry meets our criteria for inclusion. Especially check to make sure that the definition of five-edged does not equal the sum of the definitions of five and edged. " PseudoSkull (talk) 16:23, 24 September 2016 (UTC)


General comments

General comments about both proposals as a whole should go here. I wanted to bring it up here before possibly starting votes, especially since someone might have better wording for the new additions than I did and might want to reword. PseudoSkull (talk) 16:23, 24 September 2016 (UTC)
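As a rough illustration of the rule in proposed text 1 (extra sentence only for titles of two or more words, with the component words listed), here is a minimal sketch in Python rather than Lua. Everything in it is an assumption made for the example: the function name `sop_hint` is hypothetical, and splitting on spaces and hyphens is just one guess at what should count as a separate word.

```python
import re


def sop_hint(title):
    """Return the proposed extra sentence for a multi-word title, or None for a single word.

    Hypothetical helper: words are taken to be separated by whitespace or hyphens.
    """
    parts = [p for p in re.split(r"[\s\-]+", title) if p]
    if len(parts) < 2:
        return None  # single-word queries get no extra text
    if len(parts) == 2:
        listed = f"{parts[0]} and {parts[1]}"
    else:
        listed = ", ".join(parts[:-1]) + f", and {parts[-1]}"
    return (
        f"Please also check {listed} separately, as the definitions of those "
        f"terms may collectively give you the meaning of {title}."
    )
```

With these assumptions, `sop_hint("big strong girl")` lists "big, strong, and girl", and `sop_hint("airplane")` returns `None`, so the extra line would simply be omitted.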

Not seeing See also

Searching for "Gamergate" I typed gamergate into the Wiktionary search box. I clicked on the first entry and then scrolled down to the definition section. It did not refer to Gamergate, so I added a separate definition for that term. I was quickly reverted. Why? Because there is already a separate Gamergate page. I'm guessing my behavior is not uncommon for most casual users.

Has the project given any thought to either - (a) putting the "see also" information in the appropriate definition section (particularly helpful if there are multiple language definitions) rather than at the top of the page or (b) combining lower case and capitalized versions of words into one article? Butwhatdoiknow (talk) 13:17, 25 September 2016 (UTC)

Generally, I tend to assume that readers can see things that are at the very top of the page. That said, there is a solution to this, which I have implemented by adding a ===See also=== section pointing to the same link. By the way, I saw the definition you tried to add, and it was pretty clearly biased. That's not acceptable on Wiktionary regardless. —Μετάknowledgediscuss/deeds 16:57, 25 September 2016 (UTC)
Μετάknowledge - First, thank you kindly for making the change.
Second, I ask that you reconsider your assumption that many casual readers, focusing on the definition section, will not lose sight of everything else, including something at the beginning of the entry - particularly when they arrive at a page for the exact word they are looking up (or so they would assume, not noticing the capital/lower case difference). If you do so then I further request that you consider working to make it standard practice to do what you did for gamergate in all cases where there are separate capital/lower case pages.
Finally, I ask that you keep Hanlon's razor in mind when you consider whether a proposed definition is biased. In my case I tried in good faith to fit the opening paragraphs of the Wikipedia article into a single sentence. You evidently concluded that I failed in this attempt. But that is no reason to go immediately into chastisement mode. Butwhatdoiknow (talk) 00:35, 26 September 2016 (UTC)

Bot-replacing Template:etyl + Template:m with Template:der

In the past, I already proposed and then performed a bot run to do a replacement where {{etyl}} had "-" as the second parameter. I'd now like to do the same, but more generally with all instances of {{etyl}}, replacing with either {{cog}} or {{der}} depending on the second parameter. This doesn't add or remove any information, as "der" adds to the same categories as "etyl". However, it does make things a lot easier for future editors who want to replace "der" with "bor" or "inh" as appropriate, because then it's a matter of changing the three letters of the template name. —CodeCat 13:23, 25 September 2016 (UTC)

Support. --Daniel Carrero (talk) 13:49, 25 September 2016 (UTC)
  • What will you do about things like From {{etyl|de|hu}} thieves' argot {{m|de|Fühbar}}.? —Μετάknowledgediscuss/deeds 16:51, 25 September 2016 (UTC)
    • Nothing. —CodeCat 17:23, 25 September 2016 (UTC)
      • I dunno, I've been using CAT:etyl cleanup as a way of finding terms for which a decision needs to be made whether they're inheritances or not. I've been working on the assumption that if an entry uses {{der}} it means someone has deliberately made the decision not to use {{inh}}, but that if an entry uses {{etyl}} it probably means no one stopped to think about the difference. But if your bot empties the Etyl cleanup categories automatically, then I'll have no way of knowing which entries have already been thought about and which haven't. —Aɴɢʀ (talk) 19:18, 25 September 2016 (UTC)
        • You can use the derivation categories. Granted, they won't be emptied out, but if you go through them systematically in alphabetical order, you'll eventually cover them all. —CodeCat 19:49, 25 September 2016 (UTC)
          • But the derivation categories include everything using {{der}}, regardless of whether a human editor deliberately used {{der}} instead of {{inh}} or a bot automatically used {{der}} without considering {{inh}}. The derivation categories will be far too big for me (or anyone else, probably) to feel any motivation to work through them. As a result, inheritances will stay in the derivation categories indefinitely, thus rendering completely useless the distinction we only fairly recently decided to make between inherited and noninherited terms. —Aɴɢʀ (talk) 20:22, 25 September 2016 (UTC)
            • I agree with Angr here. We should create a new template, perhaps {{autoder}} or {{ader}}, which does the same as {{der}} except use a different cleanup category. Benwing2 (talk) 21:22, 25 September 2016 (UTC)
  • Oppose this automatic change -- I also agree with Angr here. Any instance of {{etyl}} + {{m}} is pretty clearly the old format, and can thus be easily identified as an entry that needs conversion. Meanwhile, any instance of {{der}} is impossible to distinguish from an intentional use of {{der}}, and thus cannot be easily identified for any further processing.
FWIW, I often come across JA entries where we used the {{etyl}} + {{m}} templating in the past, because that's what we had, and now we need to use {{bor}} as the term is clearly a borrowing (such as スプーン ‎(supūn, spoon) which has already been converted, or タオル ‎(taoru, towel) which hasn't yet). ‑‑ Eiríkr Útlendi │Tala við mig 21:43, 25 September 2016 (UTC)
Support, done in the right way of course. No rush, better to have a separate {{etyl}} and {{m}} than a broken entry. Renard Migrant (talk) 17:51, 26 September 2016 (UTC)
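To make the proposed replacement concrete, here is a minimal Python sketch of the kind of substitution being discussed. It is not CodeCat's bot code. It assumes that {{etyl}} takes the source language first and the entry's language (or "-") second, that {{der}} takes the entry's language first and the source second, and that {{cog}} takes only the source language; the function name `convert` and the regular expression are made up for this example. Following the answer given above, text where something other than whitespace sits between the two templates is left untouched.

```python
import re

# {{etyl|SOURCE|TARGET}} followed only by whitespace and then {{m|LANG|TERM...}}
ETYL_M_RE = re.compile(
    r"\{\{etyl\|([^|{}]+)\|([^|{}]+)\}\}\s+\{\{m\|([^|{}]+)\|([^{}]*)\}\}"
)


def convert(wikitext):
    """Rewrite adjacent {{etyl}} + {{m}} pairs as {{der}} or {{cog}} (illustrative only)."""

    def repl(match):
        source, target, m_lang, rest = match.groups()
        if m_lang != source:
            return match.group(0)  # languages disagree: leave for a human
        if target == "-":
            return "{{cog|%s|%s}}" % (source, rest)
        return "{{der|%s|%s|%s}}" % (target, source, rest)

    return ETYL_M_RE.sub(repl, wikitext)
```

A sketch like this does nothing about the tracking concern raised by Angr and others: once converted, its output is indistinguishable from a deliberate {{der}}, unless a separate template or cleanup category is used as Benwing2 suggests.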

"In other projects" in the sidebar

Entries such as Wikipedia have "in other projects" in the sidebar and link to Wikipedia articles on a topic. For some reason, this entry only links Danish, Dutch, English, and German articles. Why? There are definitely articles on Wikipedia in other language editions of the encyclopedia. For that matter, there is material on Wikipedia on (e.g.) Commons. Why are these languages displayed? If the thinking is that these are all Germanic languages, then why not Scots (which is mutually intelligible)? Can someone explain this to me or direct me to policy discussion about it? —Justin (koavf)TCM 01:34, 26 September 2016 (UTC)

The {{wikipedia|lang=xx}} template is what puts them there. --WikiTiki89 14:48, 26 September 2016 (UTC)
@Wikitiki89: Excellent. Are there any best practices about this? E.g. should we have one for c: as well? —Justin (koavf)TCM 19:26, 26 September 2016 (UTC)