Wiktionary:Beer parlour/2014/March


Languages with difficult scripts

Discussion moved from Wiktionary:Beer parlour/2014/February.

I was astonished today to run into the word šambaliltu for fenugreek in the Chicago Assyrian Dictionary, which I knew by the Persian word شنبلیله (our entry says it's transliterated shambalile, but I thought it was shambalila). I had no idea the word went all the way back to Akkadian. My first thought was: do we have it in Wiktionary? After looking at Category:Akkadian nouns, I still don't know.

Cuneiform is a complex script that very few people know how to read, and that most references don't use, but all our categories are arranged in character order with no transliteration shown. If you want to find a given word, you have to either: a) browse through all the entries, one by one; b) type the transliteration into the search box and hope it matches the one in the entry; or c) try to guess the characters in the name from the transliteration using some reference, and look for them in the category.

How can we improve on this? I know we can add a sort key to the category wikilink in each entry to make the categories list in transliteration order, and we can create transliteration-of entries (Category:Akkadian nouns has exactly one of those). It would be nice to have an automatically-generated index of transliterations for each category, and/or have the transliterations visible in the category listing itself. The second option looks like it would require assistance from the developers, but what about the first? Does anyone have other ideas? Chuck Entz (talk) 04:31, 1 March 2014 (UTC)[reply]
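As a rough illustration of the "index of transliterations" idea above, here is a minimal Python sketch. It assumes we already have a mapping from page title to transliteration (how that mapping would be extracted from the entries is not addressed), and the names `sort_key` and `build_index` are made up for this example. It folds diacritics so that šambaliltu files under "s", which may or may not be the collation an Assyriologist would want.

```python
import unicodedata
from typing import Dict, List, Tuple


def sort_key(translit: str) -> str:
    """Strip combining marks and fold case so 'šambaliltu' files under 's'."""
    decomposed = unicodedata.normalize("NFD", translit)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return base.casefold()


def build_index(entries: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (transliteration, page title) pairs in transliteration order."""
    pairs = [(translit, title) for title, translit in entries.items()]
    # The raw transliteration is a tie-breaker for forms that fold to the same key.
    return sorted(pairs, key=lambda pair: (sort_key(pair[0]), pair[0]))


# Hypothetical category contents: page title -> transliteration.
# The first title is a placeholder, not a real cuneiform spelling.
entries = {
    "<page for šambaliltu>": "šambaliltu",
    "𒀀": "mū",
}
for translit, title in build_index(entries):
    print(f"{translit}\t{title}")
```

The same folded key could in principle be supplied as the sort key on each entry's category link, so the category itself lists in transliteration order.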

I support allowing romanisation entries for Akkadian (and any language that uses an obsolete script). — Ungoliant (falai) 04:36, 1 March 2014 (UTC)[reply]
@Chuck Entz Pro tip: It's March already. --WikiTiki89 04:51, 1 March 2014 (UTC)[reply]
Not here in California, but in wiki-land, I suppose it is. Topic moved. Chuck Entz (talk) 05:22, 1 March 2014 (UTC)[reply]
I support allowing romanizations as soft redirects, possibly with their own categories. --WikiTiki89 05:27, 1 March 2014 (UTC)[reply]
This is how Ogham is organised. viz: Category:Primitive Irish nouns --Catsidhe (verba, facta) 06:41, 1 March 2014 (UTC)[reply]
Romanized entries are already allowed for Etruscan, Gothic, Lydian, Oscan, Phoenician. Akkadian can be added to the list if a standard and referenced transliteration scheme is adopted and described somewhere. --Vahag (talk) 09:11, 1 March 2014 (UTC)[reply]
A more general solution would be some kind of fuzzy search to supplement (certainly initially) or even replace our main search capability. I am thinking of a search that could be restricted to search only within a given language, language family, or group of languages using a given script or script family.
I certainly favor any effort to expand the usefulness of our entries to those without great knowledge of the range of scripts in which they are entered. DCDuring TALK 13:40, 1 March 2014 (UTC)[reply]
One major difference between how Gothic and Primitive Irish are treated is that Gothic transliterations have no part of speech. Just "Romanization". I'm not sure why the same isn't done for PI. —CodeCat 02:06, 16 March 2014 (UTC)[reply]
On refreshing myself with Cuneiform, it's a hard problem. Akkadian cuneiform is a syllabary, but the normalised transcriptions (in Wiktionary, anyway) do not show the syllables which spell the word (A·KA·AS·SA·PU·TU > Ákassaptu, as a completely made-up example). More to the point, a symbol may be read as Akkadian, as Sumerian (which I think is shown by the differing transcriptions in parentheses (Akkadian reading) and brackets [SUMERIAN]), or as a determinative. I don't think an automatic transcription is possible, much less fuzzy searching as described above. Moreover, I think that an Assyriologist needs to go through and normalise these, with a better description of what's going on. Something like:
  • Akkadian
    𒀀 (mū) f. (plural)
    1. water
    Usage notes
    Most often complemented with MEŠ (link to whatever the cuneiform is for MEŠ)
Sumerian readings should be listed in the Translingual section. The Akkadian section would give variant transcriptions, possibly with a link to an appendix describing the process and conventions (such as transcribing in lower case for Akkadian readings, and UPPERCASE for Sumerian readings, which are not uncommonly combined within a word), and possibly even the transcription and the normalisation. The cuneiform would be a lemma, and the normalisation would also be a lemma, each linking to the other, but the diplomatic transcription, while given in both places, would not be a lemma. So...
TL;DR: It's a hard, and probably not automatable, problem to solve properly. --Catsidhe (verba, facta) 22:29, 1 March 2014 (UTC)[reply]

Request for having archaic third-person singular form for English verbs

I don't know if somebody has requested it before, but I think that we may want to include the archaic third-person singular form of English verbs in the headword line (goeth, cometh, hath). The details of implementation can be discussed later. What are your views? --kc_kennylau (talk) 16:41, 1 March 2014 (UTC)

I'd be against including them in the headword line, though of course they need to be listed somewhere in the lemma. —Aɴɢʀ (talk) 16:46, 1 March 2014 (UTC)[reply]
What do you mean by "somewhere in the lemma"? I think that the headword line is the best place to put it, IMHO. --kc_kennylau (talk) 16:51, 1 March 2014 (UTC)[reply]
I mean goeth should be linked to somewhere within go#English, but not the headword line. Under ===Conjugation=== would be a better place for it. —Aɴɢʀ (talk) 16:55, 1 March 2014 (UTC)[reply]
Such forms confirm the impression (also otherwise justified) that we are only interested in serving antiquarians and scholars rather than normal humans. I'd like us to do what we can to maintain the illusion that we care about normal people who might still be using Wiktionary.
I strongly oppose such forms being visible by default. They may belong in related terms or in some other place, even the inflection line, where they are not visible by default but can be made visible. I would really like it if they were not visible even if the user did not have JS or any other more than basic capability. DCDuring TALK 16:58, 1 March 2014 (UTC)
It would be too space-demanding if we have to have a new header for that, because the headword line already contains all the conjugations. If you do not wish to have an impression that you are "only interested in serving antiquarians and scholars", why don't you make the etymology section hidden? --kc_kennylau (talk) 17:02, 1 March 2014 (UTC)[reply]
I apologize if I'm sounding rude or anything, because my brain is now not functioning properly due to it being 1 o'clock in the morning for me, and due to the fact that English is not my native language. --kc_kennylau (talk) 17:03, 1 March 2014 (UTC)[reply]
The problem is that if we do this, we need to do it right, and there are a lot more forms we would need to include, such as the second person singular and the subjunctive. For example, at [[do]] we would need the indicative dost and doth, as well as the subjunctive doest and doeth, which is a total of four extra forms we'd need to add. --WikiTiki89 17:43, 1 March 2014 (UTC)[reply]
Please don't. It's bad enough when an entry starts off with a chain of rare, obsolete "alternative forms". Goodness knows what kind of Chaucerian gobbledygook our foreign users must be acquiring. Equinox 17:32, 1 March 2014 (UTC)[reply]
I would support moving the ===Alternative forms=== section down to where all the other related terms sections are. --WikiTiki89 17:43, 1 March 2014 (UTC)[reply]
I think alternative forms is placed where it is for reasons similar to why Wikipedia often lists several common varieties of the article name. It's there to let users know that they've found what they're looking for. —CodeCat 17:52, 1 March 2014 (UTC)[reply]
@KennyLau: I often hide portions of longer etymologies, especially lists of cognates. As our etymologies have gotten longer, I have become increasingly sympathetic to hiding them by default.
@Wikitiki & Equinox: Horizontal lists of alternative forms are less intrusive than vertical lists. CodeCat's point is true, but we have many rather obscure alternative forms that clutter the lists. I'm not sure what basis there would be for shortening the lists of alternative forms, but I've long thought that digraphs have low value. There may be other typographic alternatives that could be eliminated.
Perhaps all obsolete, archaic, and rare forms, whether in alternative forms or the inflection line could be made to appear only if a user chose to display them. DCDuring TALK 18:27, 1 March 2014 (UTC)[reply]
Just to add my voice to the chorus: obsolete forms should not go in the headword line. We shouldn't clutter the line with information which is no longer useful and, by giving it prominence, suggest it is still usable*: I've known a couple of Germans who liked to use the 'goest'-forms of verbs because they couldn't stand that English wouldn't use as full a set of verb forms as German; they never realized (much like people who use ligatured spellings of "æqual", and much like the North Korean press office which famously only had outdated English-Korean translation dictionaries for a long time) how affected they sounded; we don't need to encourage more people to do the same. The last time the presence of obsolete forms on headword lines was discussed, people seemed to favour creating conjugation tables for English verbs which would include the -eth and -est forms. (And people seemed to oppose listing obsolete forms on headword lines, so I proceeded to move all the obsolete forms I could find in headword lines down to, for lack of a better place, ====Usage notes====.)
*I realize one could say much the same of obsolete alternative forms, which I nevertheless don't support moving out of the ===Alternative forms=== section, despite that section's current prominent placement. I think the solution there is to collapse the obsolete forms under a rel-top, and perhaps move the entire Alt forms section — whereas, collapsing part of a headword line, or moving the headword line to a different part of the entry, would be a bad idea. Also, I think that obsolete forms belong with other alt forms, and can (when in Alt forms sections) be segregated onto a separate line and labelled clearly, whereas obsolete forms listed as part of a single run-on headword line can't be labelled as clearly without making the headword line into two or more lines.
- -sche (discuss) 20:43, 1 March 2014 (UTC)[reply]
I wonder if putting alternative forms in a right-floating box would help. It would keep them "out of the way" while still being at the top so they can be seen easily. —CodeCat 20:52, 1 March 2014 (UTC)[reply]
It wouldn't help if one were using rhs table of contents.
The discreet way in which quotations are concealed by default would be great for obsolete etc forms, still allowing common alternative forms to be prominently displayed. That same approach could allow the archaic inflected verb forms to be displayed just beneath the modern inflected forms.
I am OK if the archaic forms cannot go unto the headword line, but they have to be included at least somewhere in the entry, since one currently can find not the forms anywhere in the main entry, which becometh a problem when thou beest trying to conjugate an irregular verb. --kc_kennylau (talk) 13:13, 2 March 2014 (UTC)[reply]
Discussion last year showed that there was interest in having conjugation tables for English verbs, and this discussion shows continued interest, so I've deployed a conjugation table on walk and talk (adapted from something CodeCat designed in the July 2013 discussion—I made some changes to which forms were displayed). See what you think. Obviously, we'll need another table to account for verbs that add, delete or change letters when inflecting, e.g. hate, but that shouldn't be too hard, and per our usual practice, I expect that all the tables used in entries will end up being shells that use a single backend. I think be is so much more complex than every other verb (with two whole conjugations, and many forms other verbs don't distinguish) that it may need a separate template, though. - -sche (discuss) 08:51, 28 March 2014 (UTC)[reply]
  • Hm, that would also work. It will be a big table (two tables, really: one suppletive and one not suppletive, the latter now used [in the indicative] only for one sense of be), so it would make the source code of [[be]] more legible if it transcluded a table ({{en-conj-be}}) than if it were filled with screens and screens of template code. But I don't actually object to coding the table into the entry. - -sche (discuss) 18:23, 28 March 2014 (UTC)[reply]
One-off templates are often used to reduce clutter on pages. There is no reason that we shouldn't do it. --WikiTiki89 18:49, 28 March 2014 (UTC)[reply]
Would User:CodeCat/en-conj-table be good as a starting point? I still had it from another discussion some time ago. —CodeCat 19:11, 28 March 2014 (UTC)[reply]
Your template has some mistakes. I have created User:Wikitiki89/en-conj-table which both fixes the mistakes and changes the layout to what I think makes more sense. --WikiTiki89 19:25, 28 March 2014 (UTC)[reply]
  • @Wikitiki89 I was under the impression that transcluding templates increases server load and lengthens page loading times. It's perhaps a minor consideration, but it is one potential reason not to create and use templates that are only ever used in one place. ‑‑ Eiríkr Útlendi │ Tala við mig 19:14, 28 March 2014 (UTC)[reply]
    • Given the number of modules and templates we already transclude on most pages, one more is hardly going to make a difference. I agree with one-off templates because they make things more manageable in the long term. It makes it possible to track down entries by their templates, for example. Raw tables in entries have caused me quite some trouble in the past. —CodeCat 19:20, 28 March 2014 (UTC)[reply]
    • If they are ever only used in one place then their impact on server load is very insignificant. --WikiTiki89 19:25, 28 March 2014 (UTC)[reply]
Using one-off templates makes it much harder to edit a page, because instead of seeing what's there and then editing it, you've got to figure out where it's being transcluded from. Why shouldn't someone be able to edit the Conjugation section of be by hitting the edit link on that section header? --Prosfilaes (talk) 18:55, 28 May 2014 (UTC)

This page is very important as far as policy/common practice pages go, but it's also rather out of date. We should bring it up to date, but I'm not sure what the current state of affairs is on everything. I know that there's a general consensus that "acronym", "initialism" and such are no longer acceptable. We've also deprecated "cardinal number/numeral" and "ordinal number/numeral" recently in favour of plain "numeral" (for cardinals that don't fit any other part of speech) as well as "adjective" for ordinals. But there's also "idiom" which I don't think should be allowed either, although I don't know if any consensus exists on that. "Participle" seems to be gaining some ground recently; all Dutch participle entries use it now.

We should probably also go through the "Other headers in use" list and see which ones we should try to track down and fix. —CodeCat 21:09, 1 March 2014 (UTC)[reply]

Indeed. If someone more technically adept than me could generate a list of all the POS headers that are currently in use (and, if it wouldn't be too taxing, a count of how often each header is used), that would let us know where we stand, as far as which headers are in use vs which are prescribed by WT:POS. One way to generate a list of POS headers would be to generate a list of L3 headers and then just manually remove non-POS stuff like "See also". In fact, a list of all L1, L3, L4, L5, etc headers (but not L2 headers — WT:STATS already has a list of those) which are in use, with information on what level they are and how often they occur, would surely reveal some typos and other things that we'd like to change. There shouldn't be any L1 headers. And I often notice "References" at L4 not inside a numbered etymology section. But I'm getting off-topic; I apologise. - -sche (discuss) 22:04, 1 March 2014 (UTC)[reply]
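For what it's worth, the header census described above is simple to sketch. The following Python is only an illustration, assuming the wikitext of each page is already available as a string (for example, read from a database dump); the function name `count_headers` and the sample text are invented here, and this is not a description of how any existing list was produced. It tallies headers by level and name, skipping L2 (language) headers as suggested.

```python
import re
from collections import Counter

# A wikitext header: the same run of 1-6 '=' signs on both sides of the name.
HEADER_RE = re.compile(r"^(={1,6})[ \t]*(.+?)[ \t]*\1[ \t]*$", re.MULTILINE)


def count_headers(wikitext, skip_levels=(2,)):
    """Count (level, header name) pairs, ignoring the levels in skip_levels."""
    counts = Counter()
    for match in HEADER_RE.finditer(wikitext):
        level = len(match.group(1))
        if level in skip_levels:
            continue
        counts[(level, match.group(2))] += 1
    return counts


# Invented sample page, including a header typed with the wrong keyboard layout.
sample = """==English==
===Noun===
====Usage notes====
==Latin==
===Noun===
===Δεψλενσιον===
"""

for (level, name), n in sorted(count_headers(sample).items()):
    print(f"L{level}\t{name}\t{n}")
```

Recording the page title alongside each count would be the obvious next step, since a bare tally does not say which entries need fixing.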
User:DTLHS/headers DTLHS (talk) 23:33, 1 March 2014 (UTC)[reply]
Thank you for making the list, but in its current format it's not terribly useful. We really need to know which entries the non-standard headers occur on, so they can be fixed. At least the spelling mistakes. —CodeCat 23:52, 1 March 2014 (UTC)[reply]
I would need a list of standard headers, to avoid listing millions of Noun entries. Also you can use the search to find the less common ones. DTLHS (talk) 23:55, 1 March 2014 (UTC)[reply]
Also I just noticed one header called "Δεψλενσιον". It looks like someone had their keyboard in the wrong language! XD —CodeCat 23:54, 1 March 2014 (UTC)[reply]
Thank you! There are about as many spelling errors as I expected, lol. - -sche (discuss) 00:05, 2 March 2014 (UTC)[reply]
Σηοθλδ ςε σορτ τηε Οτηερ ηεαδερσ ιν θσε ιντο τηοσε ςηιψη αρε λανγθαγε σπεψιφιψ ανδ τηοσε ςηιψη αρε νοτ? Ανδ σηοθλδ ςε θσε Νθμβερ ορ Νθμεραλ? Ανδ Ι αμ αςαρε τηατ ςε αλσο σορτ Αββρεωιατιονσ, Αψρονυμσ, Ψοντραψτιονσ, Ινιτιαλισμ ανδ Συμβολ ιφ Ι αμ ψορρεψτ. --kc_kennylau (talk) 01:33, 2 March 2014 (UTC)[reply]
That is incredibly hard to read, so I'll transcribe it for everyone: "Should we sort the Other headers in use into those which are language specific and those which are not? And should we use Number or Numeral? And I am aware that we also sort Abbreviations, Acronyms, Contractions, Initialism and Symbol if I am correct." --WikiTiki89 02:01, 2 March 2014 (UTC)[reply]
@DTLHS: I've taken a stab at sorting your list into headers which are "standard enough" (including headers which are not standard anymore but which were once standard and which are therefore still common) and headers which IMO need to be tracked down (including both nonstandard/typo headers, and headers which may be standard but are so rarely used at the level they're at that I thought they could use review). You may want to wait for others to comment / modify the list further before generating a list of where all the nonstandard headers are used, though. - -sche (discuss) 02:16, 2 March 2014 (UTC)[reply]
I've added some notes as well. —CodeCat 02:23, 2 March 2014 (UTC)[reply]
Thanks for your input, I'll generate something more comprehensive in a day or so. DTLHS (talk) 02:53, 2 March 2014 (UTC)
I've edited the page some more, grouping headers by where they may appear. —CodeCat 13:42, 2 March 2014 (UTC)[reply]
It looks like the list is finished, but there are a lot of problems already, that I can see. —CodeCat 00:02, 3 March 2014 (UTC)[reply]
I've used MewBot to correct all the spelling mistakes, and converted some of the deprecated ones to the modern versions as far as was feasible with a bot. User:-sche also helped out. We probably need to wait now until the next dump is made, as the list doesn't reflect the current reality anymore. —CodeCat 16:43, 4 March 2014 (UTC)[reply]

Wiktionary would benefit from a more user-friendly discussion forum

I have not been editing here regularly for years, and so coming back and looking at things with fresh eyes I think that there is a need for a more modern discussion forum, with a more normal way of posting messages. I think that would help discussions to take place more easily and would allow new users and people who don't edit regularly to contribute to discussions without being familiar with the unusual way of posting a message (when compared with other online forums/'fora'). It might be easier to edit from a tablet, as well. Also, the current format is slow to edit for anyone with a very slow internet connection. Does anyone agree? Has this been raised before? Kaixinguo (talk) 12:27, 2 March 2014 (UTC)[reply]

  • I have an infinitesimal amount of hope that Flow will be this. However, its developers seemingly prefer to waste time on trying to make it look like linkbait for a certain website with an orange logo, instead of focusing on features like threads on multiple pages.
Also, we have LiquidThreads. Boring, somewhat clunky, somewhat ugly, but mostly functional. I am not sure why we do not use it more widely. Keφr 13:42, 2 March 2014 (UTC)[reply]

What would a lua-cized translation template look like?

I'm trying to continue the discussion here. --kc_kennylau (talk) 13:26, 2 March 2014 (UTC)[reply]

What would you like to discuss? DTLHS (talk) 23:29, 3 March 2014 (UTC)[reply]

Vote: CFI: Removing usage in a well-known work 2

FYI: Wiktionary:Votes/pl-2014-03/CFI: Removing usage in a well-known work 2

Let us postpone the vote as much as the discussion will make necessary. --Dan Polansky (talk)

Part-of-speech sections with multiple headword lines, lemmas with form-of definitions on the same page

Occasionally I come across entries where there is a single part-of-speech header with multiple headword lines below it. An example is mensa, in the Latin section. There are also many examples of this in Italian entries. Sometimes they are like the case of mensa where the second headword line is simply a form of the first, but I've also seen it used to distinguish masculine and feminine-gendered nouns with the same lemma form. I'm wondering what the general consensus is about entries like this. I personally think that this is wrong, and that these really should have their own headers. Masculine and feminine nouns are separate, they are not the same lemma, because they have (at least in Italian) separate types of inflections. But even the Latin mēnsā is not the same word as mēnsa; they have different pronunciations and it's only because of an orthographic shortcoming that they end up on the same page. There's also a practical consideration: having "floating headwords" with no header before them is harder for bots to parse, especially when they are formatted using the obsolete "bold headword" formatting like mēnsā is. As the page is now, a bot that tries to parse that page will come upon what seems like a random bit of bold text in the middle of the list of definitions. —CodeCat 23:26, 3 March 2014 (UTC)[reply]

Mensa should definitely not be formatted like it is, we should either remove the "mēnsā f" line (and perhaps also the following "# ablative singular of mēnsa" which simply points users to the page they're on) or give it its own ===Noun=== header. - -sche (discuss) 00:12, 4 March 2014 (UTC)[reply]
I don't think we should remove it, because it's pronounced differently from the noun above. There should be two pronunciation sections on the page, which implies two noun sections IMO. —CodeCat 15:32, 4 March 2014 (UTC)[reply]

A question that's somewhat related is about form-of definitions that are homonymous with the lemma and therefore appear on the same page, under the same etymology/pronunciation section (often under the same PoS header too and listed as one of the definitions of the lemma, but not always). mēnsā is not an example, because its pronunciation differs, but the entry also contains a definition for the vocative singular. Again I'm not really sure if this is good practice. The main reason we create form-of entries in my opinion is to let people find the lemma, and also to give pronunciation information about each form specifically. But when the lemma is on the same entry already, and has the same pronunciation, there's not really anything to be gained from listing the form-of definition there. There would presumably be an inflection table on the entry, and that would show which forms coincide with the lemma form (these often appear in black bold, too). Please note that this doesn't apply at all to sublemma form entries. These entries would need extra grammatical information, such as their own inflection table. In those cases we should create sections for both, like on vergeten, which is both a verb (infinitive lemma) and the past participle of that verb. —CodeCat 23:26, 3 March 2014 (UTC)[reply]

Hot words

Lately there has been a lot of debate about words such as olinguito and Euromaidan, which don't pass CFI but likely will in the future. I think that instead of keeping them outside of CFI, we should amend CFI to accommodate such words.

Proposal for special provisions for "hot words":

  • If a word that can be considered beneficial to Wiktionary has citations that do not span the required time period, but meet the following criteria, the word may be kept as a "hot word". The citations must be:
    • relatively recent.
    • reach a wide population.
    • from a wide variety of media.
  • An entry for a "hot word" will have a highly visible indication of its special status.
  • While a "hot word" meets the above criteria, it will have the same rights as any other entry and may even be featured on the main page.
  • A "hot word" will be reevaluated every time its lifespan doubles and if it does not still meet the above criteria, it must be deleted.

--WikiTiki89 04:51, 6 March 2014 (UTC)[reply]

I agree with this, but maybe we shouldn't call it a "word" unless we intend it to apply only to single words. I'm not sure if adding all of it to WT:CFI is a good idea, that page is very long and hard to read as it is. Though that's a separate discussion. —CodeCat 14:15, 6 March 2014 (UTC)[reply]
I also tend to agree with this. It often happens that a relatively new term makes the headlines and, naturally, people will try to look it up in online dictionaries. We should do what we can to help in such circumstances. Perhaps such terms should be in a hidden category that we can check from time to time to see if it is still being used. SemperBlotto (talk) 14:40, 6 March 2014 (UTC)[reply]
I think we need this. A categorizing template that contains a date of the entry or of the first valid citation would facilitate review a year or 13 months after the date, leading to conversion to a normal entry or demotion to protologism status. We have a large Appendix:List of protologisms useful for reminding us of how bad some suggestions can be. DCDuring TALK 15:04, 6 March 2014 (UTC)
I created {{hot word}}, which can be added to a page with a date= parameter. It checks whether the date was more than a year ago, and categorises the page accordingly. —CodeCat 15:35, 6 March 2014 (UTC)[reply]
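The date check described above can be sketched outside template code. This Python version is only an approximation of the idea, not the actual source of {{hot word}}; the function names and both category names are hypothetical.

```python
from datetime import date


def is_older_than_a_year(entry_date, today):
    """True if more than one calendar year has passed since entry_date."""
    try:
        cutoff = entry_date.replace(year=entry_date.year + 1)
    except ValueError:
        # entry_date is 29 February and the following year is not a leap year.
        cutoff = entry_date.replace(year=entry_date.year + 1, day=28)
    return today > cutoff


def hot_word_category(entry_date, today):
    """Pick a tracking category; both category names are made up for this sketch."""
    if is_older_than_a_year(entry_date, today):
        return "Hot words older than a year"
    return "Hot words newer than a year"


print(hot_word_category(date(2014, 3, 6), date(2014, 3, 6)))
print(hot_word_category(date(2014, 3, 6), date(2015, 3, 7)))
```

Entries that land in the "older" category would then be the ones due for re-evaluation against the normal attestation criteria.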
I added {{hot word|date=06 March 2014}} to the current sandbox - no obvious result. SemperBlotto (talk) 15:41, 6 March 2014 (UTC)[reply]
See [[olinguito]] for a working test. DCDuring TALK 15:55, 6 March 2014 (UTC)[reply]
We don't need a vote to try this for a while, do we? For how many terms would this apply, help? DCDuring TALK 15:58, 6 March 2014 (UTC)[reply]
Basic template defaults should allow the selection of the date of creation of the entry as a default without jeopardizing our amateur status. Riskier in that regard would be:
  1. making sure that the earlier of the creation date or the date specifically inserted as a template parameter was selected.
  2. finding (using Lua?) the earliest citation date in the applicable language section and using that.
  3. allowing for protologistic senses, not just L2s. DCDuring TALK 16:07, 6 March 2014 (UTC)[reply]
Including “hot” words is a good idea, but I think it would be better if we added them to the Appendix namespace and linked to them with a template similar to {{only in}}. This would let us keep entries even if they turn out to be fads; if we forgot to reevaluate an entry, we wouldn’t be including a CFI-failing term; and it wouldn’t turn the main namespace into temporary storage. — Ungoliant (falai) 19:12, 6 March 2014 (UTC)
Advantages to having them in principal namespace are direct access via normal search and tabbed access to citations. I also think that they are likely to be taken more seriously. Previously Appendix:List of protologisms was treated as a way of softening the blow of deletion of mainspace entries by new would-be contributors. We would need to create a tiered system to enjoy the benefits of both a class of 'serious' protologisms and a class of protologisms only present as a kind of consolation prize. DCDuring TALK 19:38, 6 March 2014 (UTC)[reply]
Appendix entries would still be accessible via the search, due to the template similar to {{only in}}. — Ungoliant (falai) 19:47, 6 March 2014 (UTC)[reply]
I have added a preliminary visual design to {{hot word}}. I'm open to suggestions. --WikiTiki89 19:39, 6 March 2014 (UTC)[reply]
Is protologism less clear than hot word, which we would have to define in our glossary and link to? Should we say 'This is a very popular, newly coined word that is likely to meet our criteria for inclusion in the future, but may not.'? DCDuring TALK 20:26, 6 March 2014 (UTC)
"hot word" is more exciting than "protologism". Yes, we can add it to our glossary and link to it there. I think your suggestion for the text is too long and wordy to be displayed in the template, but it would be a good fit for the glossary definition. --WikiTiki89 20:32, 6 March 2014 (UTC)
How about “This English term is a hot word and may be removed from Wiktionary in the future.” Using our terminology term, and indicating the language for clarity on multilingual pages. Maybe the template should be applied to a sense rather than an entry.
Graphically, it is much too hot. It looks more important than the main heading and all other content on the page. It only needs enough differentiation to draw the eye, and a treatment to differentiate it from other messages. But as a notice regarding the existence of an entry, perhaps it belongs above entry content, not as a sidebar on the right. Michael Z. 2014-03-06 21:06 z
I like your wording. As for it being graphically "much too hot", feel free to cool it down. --WikiTiki89 21:43, 6 March 2014 (UTC)[reply]
We also need a template for hot definitions. — Ungoliant (falai) 21:13, 6 March 2014 (UTC)[reply]
Yes, we should have a {{hot word-sense}} template. --WikiTiki89 21:43, 6 March 2014 (UTC)[reply]
{{hot sense}} makes more... sense? —CodeCat 21:50, 6 March 2014 (UTC)[reply]
Maybe. It doesn't make much of a difference, although it is more concise. --WikiTiki89 21:58, 6 March 2014 (UTC)[reply]
Created. See rolezinho for an example. — Ungoliant (falai) 22:14, 6 March 2014 (UTC)[reply]
I think instead of linking to the glossary, it may be more informative to create a dedicated Wiktionary:Hot words. That way we can dedicate more space to what they are, why we include them, and exactly in what way they are exempt from WT:CFI (that is, what qualifies them for deletion). —CodeCat 22:25, 6 March 2014 (UTC)[reply]
Appendix:Hot words might be better. --WikiTiki89 22:27, 6 March 2014 (UTC)[reply]
If olinguito doesn't pass CFI then it's CFI that needs to change. We don't need to add an ugly "hot words" box to its entry. It shouldn't need to pay penance. Let's start with removing the one-year requirement for broadly reported and attested scientific discoveries and mathematical concepts. For some neologisms, the "hot words" category could be a good idea, but not for olinguito, which is highly unlikely to be a flash in the pan as far as the word's usage goes. Pengo (talk) 06:24, 9 March 2014 (UTC)[reply]
That's the whole point of this discussion, to add provisions to CFI to allow words like olinguito. --WikiTiki89 06:52, 9 March 2014 (UTC)[reply]

This (language) term is a hot word, a new term that has quickly become popular. It may soon fall out of usage, and its entry may be deleted from Wiktionary in the future.

You can help this entry stay by establishing the word's usage over a significant period of time.

User:Wyangbot is applying for bot status (as continuation of a discussion last month)

Here. Please participate in the discussion and vote now. Wyang (talk) 03:46, 4 March 2014 (UTC)[reply]

Plural forms of proper nouns

Many proper nouns have well-attested plural forms; for example, Frances, Germanies, Caitlins and Jesuses. It seems to me that we could create a couple of templatised usage notes which would explain two major categories of proper noun plurals, namely:

  1. The plurals of personal names, which are used to refer to multiple individuals with the same name, and perhaps sometimes in the same way as placename-plurals (q.v.), as in "the two Obamas [the college-age one with X political views and the sitting president with Y political views] might not even recognise each other". And:
  2. The plurals of placenames, which are used when comparing or contrasting two historical incarnations, or two current or historical governments or social incarnations, of a place, e.g. "the border between the two Germanies", "unite the two Jerusalems", "John Edwards assailed the divide between the two Americas".

Some plurals would need more explanation than the template alone would provide — there is slightly more to say about the use of Germanies than about Frances/Estonias/Denmarks, and more to say about the use of Jesuses than about Caitlins/Barbaras/Annas/etc — but additional information could easily be provided along with or even in place of the templatised usage note.
What do you think? Would it be a good idea to have such usage notes? Where should they be placed: in France/Anna etc, in Frances/Annas etc, or in both places? What should the titles of the templates that contain the notes be? Some templatised usage notes exist in a "U:" 'subnamespace' like the "R:" one that our reference templates exist in; others are named in other ways. (Many languages use the plurals of personal and place names the same way English does, so it would seem inappropriate to use a language prefix.) - -sche (discuss) 22:43, 8 March 2014 (UTC)[reply]

Support. — Ungoliant (falai) 23:00, 8 March 2014 (UTC)[reply]

Codes the ISO has split or merged (second batch)

In 2012 and 2013, the ISO retired several codes by merging them into other codes or splitting them up. Thirty of these retirements appear to have escaped our notice. I posted one batch at Wiktionary:Beer parlour/2014/February#Codes the ISO has split or merged (first batch); here is the second batch, plus my thoughts on them; I'll post the rest another day. If you know a reason we should or shouldn't follow the ISO in a particular case, please comment! - -sche (discuss) 01:04, 9 March 2014 (UTC)[reply]

merging Xiandao into Achang

The ISO merged Xiandao (xia) into Achang (acn). Xiandao is a 100-speaker dialect of Achang. A merger seems sound. - -sche (discuss) 01:04, 9 March 2014 (UTC)[reply]

 Done - -sche (discuss) 09:00, 28 March 2014 (UTC)[reply]

merging Panang into Amdo Tibetan

The ISO merged Panang (pcr) into Amdo Tibetan (adx); I propose we follow suit. "Panang" is merely the name of an Amdo-speaking group. - -sche (discuss) 01:04, 9 March 2014 (UTC)[reply]

 Done - -sche (discuss) 09:00, 28 March 2014 (UTC)[reply]

merging Sansu and Hlersu

The ISO merged Sansu (sca) into Hlersu (hle); we should follow suit, because Sansu is merely another name for Hlersu (per e.g. The Cambridge Handbook of Endangered Languages, →ISBN, 2011). - -sche (discuss) 01:04, 9 March 2014 (UTC)[reply]

 Done - -sche (discuss) 09:00, 28 March 2014 (UTC)[reply]

merging Piru and Luhu

Following those linguists who consider Piru to be a dialect of Luhu, the ISO merged Piru (ppr) into Luhu (lcq). (The ISO records this in a somewhat unclear way, but see here.) I propose we do likewise. Note that some literature takes the opposite stance and considers Luhu to be a dialect of Piru, but the end result for us — that we have a language with {"Luhu", "Piru"} as its names field — is the same. - -sche (discuss) 01:04, 9 March 2014 (UTC)[reply]

 Done - -sche (discuss) 08:52, 3 April 2014 (UTC)[reply]

merging Talur into Galoli

The ISO merged Talur (ilw) into Galoli (gal). Talur is indeed a dialect of Galoli. - -sche (discuss) 01:04, 9 March 2014 (UTC)[reply]

 Done. - -sche (discuss) 23:24, 5 April 2014 (UTC)[reply]

(Southern) Yamphe/Lorung

The ISO merged Yamphe (yma) into lrr, renaming that code from Southern Lorung to Southern Yamphu. A paper published in the Australian National University's Papers in South East Asian Linguistics in 1997 describes the situation: "With two dialects, northern and southern, the Lohorong or Lorung language forms part of the Lohorong-Yamphe group. The Yamphu language occupies an intermediate position in its subgroup between Lohorong, Yamphe and southern Lohorong." (Sic!) - -sche (discuss) 01:04, 9 March 2014 (UTC)[reply]

 Done. - -sche (discuss) 21:50, 2 November 2015 (UTC)[reply]
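
Taken together, the six merges in this batch amount to a small lookup table from each retired code to the code that absorbed it. The following is only a minimal Python sketch of that table, built from the codes named in the subsections above. The names `RETIRED_TO_CURRENT` and `resolve_code` are hypothetical and do not describe how Wiktionary's language data is actually stored.

```python
# Retired ISO 639-3 codes discussed in this batch, mapped to the code each
# one was merged into. The table name and helper are illustrative only.
RETIRED_TO_CURRENT = {
    "xia": "acn",  # Xiandao -> Achang
    "pcr": "adx",  # Panang -> Amdo Tibetan
    "sca": "hle",  # Sansu -> Hlersu
    "ppr": "lcq",  # Piru -> Luhu
    "ilw": "gal",  # Talur -> Galoli
    "yma": "lrr",  # Yamphe -> lrr (renamed from Southern Lorung to Southern Yamphu)
}


def resolve_code(code: str) -> str:
    """Return the surviving code for a retired one; pass other codes through."""
    return RETIRED_TO_CURRENT.get(code, code)


if __name__ == "__main__":
    for retired in sorted(RETIRED_TO_CURRENT):
        print(retired, "->", resolve_code(retired))
```

A code that is not in the table is returned unchanged, so the helper can be applied to any language code without first checking whether it was retired.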

Appendix:Orphaned words could use some TLC. Half the information is in hidden comments in the source of the page. The term "Orphaned word" seems not to be used elsewhere, and perhaps it should be called unpaired words or "cranberry morphemes" or "fossilized terms" as it seems to contain all of the above. If it weren't just seven words long I'd suggest creating an appendix for each of the above. In order to clear up confusion and stop them being added again, false examples should be listed as such, rather than be deleted (e.g. where the apparent etymology is different to the actual, or where compounding happened in another language). The centered table format seems out of place on Wiktionary. And the list seems like it could be much longer.

Anyone want to have a go at adding terms to it, renaming it, reformatting it, splitting it up, turning it into a category (or categories), merging with the list on Wikipedia's Unpaired word page, or generally just cleaning it up? Pengo (talk) 05:55, 9 March 2014 (UTC)[reply]

This script displays a notice at the top of a user's page and contributions list showing whether they are blocked. However, MediaWiki already displays this information by default, even showing the block log entry — except when viewing an existing user page. Therefore loading and executing this script is mostly a waste of bandwidth. Can we disable it? Keφr 07:24, 9 March 2014 (UTC)[reply]

How about simplifying it so it only takes up the software's slack? —msh210 (talk) 17:04, 16 March 2014 (UTC)[reply]

A question.

Exactly what is this {{rfc-header|Perfective Counterpart|lang=ru}}? After organizing the verb forms like звать, I really wonder about this --KoreanQuoter (talk) 08:03, 9 March 2014 (UTC)[reply]

The bot (Kassadbot) which enforces consistent formatting applies {{rfc-header}} tags to entries that have nonstandard headers. Which entry did you see this tag in? If there was just a tag, and no corresponding header, you can simply remove the tag. If there was a header ===Perfective Counterpart=== in one of the entries, it needs to be changed to a standard header. I would have to look at the entry before speculating about which header it should be changed to. - -sche (discuss) 18:08, 13 March 2014 (UTC)[reply]
It means that (as in зовём), you made this header: ====Perfective Counterpart====. We do not use that header. зовём should only link back to its infinitive, звать. Under звать it will list its perfective counterpart позвать. So, what {{rfc-header|Perfective Counterpart|lang=ru}} means is that you must remove that section. —Stephen (Talk) 18:24, 13 March 2014 (UTC)[reply]
I have fixed all non-standard sections. @KoreanQuoter please see my edits, e.g. зовём. --Anatoli (обсудить/вклад) 04:20, 31 March 2014 (UTC)[reply]

Links to Wikipedia in reference templates

Hello all. Dan Polansky and I are currently in disagreement about whether to link to Wikipedia from reference templates (the ones that start with R:…). Take {{R:L&S}} as an example: I prefer this version, whereas Dan Polansky prefers this version. There are other differences between those two diffs, but the salient bone of contention is whether or not to link to w:A Latin Dictionary in the template code.

I assert that reference templates should link to the relevant cited authority's Wikipedia article; that way, an explanation for why the source is being cited as an authority is readily available for the sceptical reader on the other side of the link. Dan Polansky maintains that such links are distracting and inessential; the latter because a reader can copy the name of the reference work and paste it to Wikipedia article box thereby finding the relevant article.

You can see the (short) discussion so far at User talk:Dan Polansky#Re linking in reference templates. As far as I can tell, Atelaes and I support such linking (i.e., single links to the cited authority's Wikipedia article only), whereas Dan Polansky opposes such linking. I come here to try to obtain consensus for such linking. What do others think? — I.S.M.E.T.A. 19:39, 9 March 2014 (UTC)[reply]

P.S.: @Atelaes, Dan Polansky I have tried to represent your views faithfully; please post corrections hereto if you feel I have misrepresented either of your positions.

Support linking. If a person doesn’t want to know more about the work being referenced, they can simply not click the link. — Ungoliant (falai) 19:49, 9 March 2014 (UTC)[reply]
I have no general opposition to linking, but linking to the Wikipedia article about a dictionary is not very useful. If it were a citation of a novel or something, that would be a different story and I would support linking, but I'm not sure what exactly the difference is. So I'm undecided. (Having said that, I definitely oppose this version). --WikiTiki89 20:13, 9 March 2014 (UTC)[reply]
Support linking to WP or the home page of the reference website, if it is online and WP doesn't have any article. I try to do this routinely.
We have a number of templates that contain two links, one to a webpage that has substantive information relevant to the headword and one to some page that explains the source in some way, either at WP or a source site page such as its home page or "About us". Having two links increases the possibility for confusion. We seem to have all possible kinds of user preferences and behavior with respect to such links in unknown proportions: not being able to find and follow blue links even when they would help, finding them distracting/following them accidentally, hitting the source-site link rather than the content page link, as well as figuring out which link is relevant to one's needs and using it appropriately.
This discussion makes me wonder whether we should reduce the size of the source link by not having the entire site title be clickable. That would seem to reduce the likelihood of following the source-explanation link when one wanted the content link. I don't see why cut-and-paste should be required for a user to satisfy a question about the source of some information, especially if there is a WP article on the source, as there often is in my experience. DCDuring TALK 20:38, 9 March 2014 (UTC)[reply]
It follows a fortiori from the argument about confusion where there are two links that I agree with Wikitiki that the version of the L&S template he cites, with its multiple links, is unsatisfactory, as it provides even more opportunities for mistakenly following links of minimal relevance to identifying and evaluating the source of the substantive information. DCDuring TALK 22:59, 9 March 2014 (UTC)[reply]
  • Oppose linking to a Wikipedia article from the reference work name. The reference template should focus the follow-link behavior on the sole link, which takes the reader directly to the page where they can find more about the word, rather than the reference work. Wikilinks are typographically inferior to black text, IMHO, so they should only be used where they add real value. Thus I prefer
    I also oppose the extraneous "New York: Harper & Brothers" added by the user in diff, whose edit summary does not indicate other changes than linking. --Dan Polansky (talk) 19:00, 10 March 2014 (UTC)[reply]
  • Support linking to a specific content page and some page that explains the source. DCDuring TALK 18:51, 13 March 2014 (UTC)[reply]

References on the page should follow the same format as references for quotations, as much as possible.

If we format references consistently, then it is less confusing to link to Wikipedia articles for titles or authors (I’d suggest limiting this to one link, to the author only if there is no article about the work). Why not use the visible shortcut to show the nature of the interwiki link?:

We do not want to link to Wikipedia articles about every mentioned city, publisher, etc. Linking every word is distracting. These are of secondary relevance to a citation, and to be found in a Wikipedia article about the work. Michael Z. 2014-03-13 19:18 z

I certainly oppose this much linking. I tend to oppose other linking (e.g. linking of author names and work titles) in reference templates, too, because users may click on the work-title links expecting to get to the cited page in the reference, and then have to backtrack. (On File pages that say "This is a file from the Wikimedia Commons. Information from its description page there is shown below.", I have on an embarrassingly large number of occasions been distracted enough to click on the "Wikimedia Commons" link and then have to backtrack and click the link to the actual file.) Linking author and/or work names when quoting a book under a sense, the way Thomas Jefferson is linked in [[liberticide]], is more acceptable, because I think users are less likely to click the links expecting to get to the citation, since they can already see the citation (quoted on the very next line). - -sche (discuss) 21:59, 13 March 2014 (UTC)[reply]
How about this format for the source link:
—This unsigned comment was added by DCDuring (talk • contribs) at 22:38, 13 March 2014.

I take on board the general opposition to overuse of wikilinking; I apologise, for I hadn't realised how distracting and potentially confusing others find it, because I don't myself find such linking particularly distracting nor at all confusing. Re the proposed alternatives to this version:

  • @DCDuring ("reduc[ing] the size of the source link by not having the entire site title be clickable" and the Rhipidistia/Palaeos format proposed above in your post timestamped: 22:38, 13 March 2014): Unfortunately, I think that your format would be more likely to cause users to follow the incorrect link (or at least, I believe it would if most users are anything like me). I say that because, when I see a blue link with either an external-link icon or a padlock (or whatever) beside it, I correctly interpret it to be an external link, whereas when I see a simple blue link, I interpret that as an internal (i.e., intra-Wikimedia) link (with dark blue for Wiktionary links and light blue for links to other MW projects); that, I expect, is what the software's developers intended. Although there are some links to sources that point to Wikisource, most source links on the English Wiktionary are external links (usually to Google Books or Usenet). The fact about sourcing here and the software developers' intent interact in the mind to produce in me (and, I assume, in others) the assumption that a link with an external-link icon or a padlock (or whatever) beside it is the source of whatever I'm looking at/for. (And it seems as if that assumption is also behind the recommended use of {{usex}}'s ref= parameter.) For me, the format you give would temporarily confuse me, until I'd hovered over the links to guess by their URLs the content of what they're pointing to. (Pace @-sche, your confusion-causing example concerns two plain blue, internal links, and not one internal link with one external link; I suspect you would be far less likely to make the same mistake were you to be faced with the latter combination.)
    • We already use fullurl links abundantly in our discussions when referring to entry revisions or diffs. Per a vote sponsored by DanP some time ago, we treat links to our sister projects under "External links", rather than, say, "See also". That you haven't been reeducated means you must have been following links on our more updated Translingual entries. DCDuring TALK 20:46, 22 March 2014 (UTC)[reply]
  • @Dan Polansky ("I…oppose the extraneous 'New York: Harper & Brothers' added"): Come to think of it, it seems pretty unnecessary to mention the publisher at all; what do you say we remove the "Oxford: Clarendon Press" bit, too?
  • @Mzajac ("format[ting] references consistently" and "Why not use the visible shortcut to show the nature of the interwiki link?"): I agree with you regarding what I take to be your point about consistent formatting of references: that they should conform, as closely as analogously appropriate, to the standard format(s) laid out in WT:", yes? If that is indeed what you mean, why did you put the publication date immediately after the authors' names? For consistency with WT:", the date should be included in parentheses immediately after the publisher's name (see the citation from Treasure Island under WT:"#Between the definitions). As for including the ISBNs, they automatically create blue links, which many in this discussion have opposed, and they are of very marginal usefulness; I suggest we not include them for those reasons. I confess that I dislike "w: A Latin Dictionary", although largely on aesthetic grounds; would something like "w:A Latin Dictionary" be acceptable?

In the light of the discussion so far, I suggest one of the two following formats for, e.g., {{R:L&S|via|via}}:

What do y'all think? Also @Ungoliant MMDCCLXIV, Wikitiki89 — I.S.M.E.T.A. 19:37, 22 March 2014 (UTC)[reply]

  • We really should provide links to information about the sources, so readers can better assess their reliability- even a great work like Lewis & Short has suffered from changes in scientific names and terminology over the past century or so, and there are some references that are great in some areas and downright awful in others.
I'm not happy with either format, though. We should have "(see A Latin Dictionary at Wikipedia)" set off from the rest, not an unlabeled bluelink that might be misinterpreted as a link to the main page for the dictionary at Perseus. If we don't waste space on trivia such as the publisher, there should be plenty of room to spell out the basics- especially since the template is doing all the typing for us. Chuck Entz (talk) 21:29, 22 March 2014 (UTC)[reply]
@Dan Polansky: WT:" prescribes the parenthetic format. I'd be happy for the date to be included after a comma instead.
@Chuck Entz: Hmm. What about this format?:
  • “via” in Lewis & Short’s Latin Dictionary (1879) — For more information on this source, see its Wikipedia article.
 — I.S.M.E.T.A. 14:17, 23 March 2014 (UTC)[reply]

Stop treating Nynorsk and Bokmal as languages separate from Norwegian

Previous discussions: March 2008, July 2008, February 2011, January 2012, August 2012.
On March 20th, Wiktionary:Votes/pl-2014-03/Unified Norwegian will begin so that we can (hopefully) formalize a policy on this oft-discussed subject.

At present, we treat all three of "Norwegian" (code no), "Norwegian Bokmål" (nb) and "Norwegian Nynorsk" (nn) as languages. We have 5800 ==Norwegian== entries, 4800 ==Norwegian Bokmål== entries and 7400 ==Norwegian Nynorsk== entries. I and several others think we should stop treating Bokmål and Nynorsk as languages separate from Norwegian.
As was pointed out in a previous discussion, Nynorsk and Bokmål are two standards of Norwegian, but there are other standards (e.g. Riksmål) and many dialects whose words, because they cannot be labelled Bokmål or Nynorsk, would ironically be the sole users of the plain ==Norwegian== header if we were ever to seriously consider Nynorsk and Bokmål separate languages (but in fact most of the words which currently use the ==Norwegian== header are acceptable in both, or sometimes just one, of the standards).
Bokmål and Nynorsk are mutually intelligible (see e.g. Rubén Chacón-Beltrán's Introduction to Sociolinguistics, page 135, and Joshua Fishman and Ofelia Garcia's Handbook of Language and Ethnic Identity, page 434). They are no more different than US English and Indian English or txtspk or any of the other forms of English we handle very well with context tags rather than separate L2 headers.
In a previous discussion, someone (I don't recall who) made the point that there also exists a degree of mutual intelligibility between Norwegian and Danish. The person who made this point seemed to think it constituted an argument against merging Nynorsk and Bokmål. I dismiss this slippery slope fallacy. If anyone wants to propose merging Norwegian and Danish, they can start a section about that below this one, and the merits of it can be discussed, but the question I ask in this section is: "should we stop treating ==Norwegian Bokmål== and ==Norwegian Nynorsk== as languages separate from ==Norwegian==?" - -sche (discuss) 03:56, 12 March 2014 (UTC)[reply]

Confirming my support for the merge. --Anatoli (обсудить/вклад) 04:29, 12 March 2014 (UTC)[reply]
  • Support. From my understanding, this is similar to treating people-who-write-color and people-who-write-colour as speakers of separate languages. --WikiTiki89 04:49, 12 March 2014 (UTC)[reply]
    I don't know how carefully you chose your wording, but it's particularly apt: it is like treating people who write "i 1877 forlét Brandes København" vs "i 1877 forlot Brandes København" as if they were speakers of separate languages, lol. (Meanwhile, no problems have arisen from our treatment of people who write "nevermind, I realised I have to go pick up Andy; I'll see you later instead" and people who write "nvm. realized i gtg get Andy. cu l8r" as users of a single language...) - -sche (discuss) 02:01, 13 March 2014 (UTC)[reply]
    My wording was a highly condensed form of a rant about how Canadian English would be the same language as British English, but separate from American English. Sometimes I'm careless with my wording, but this wasn't one of those times. --WikiTiki89 06:29, 13 March 2014 (UTC)[reply]
For what it's worth, we've already had this discussion, here, among other places. I strongly suggest people interested in the topic read some. I don't have a strong view myself, as I have next to no knowledge of Norwegian, but my impression is that most Norwegian contributors tend to support the split. Perhaps we can, at the very least, be a bit more civil about it this time around. -Atelaes λάλει ἐμοί 06:19, 12 March 2014 (UTC)[reply]
Bokmål and Nynorsk are different enough to be considered separate languages, because they are also different enough from Swedish and Danish. It's true that Nynorsk is skewed more towards the west of Norway while Bokmål is closer to the urban speech of the east, but that's only kind of relevant, because in reality both standards claim to be written representations of the full range of speech in all of Norway. That is, the aim of both is to be a single unified language for all Norwegians, not merely to represent some part of them. So even if they have a different dialectal base, they are not regional dialects of Norwegian; they're both Norwegian, period. That means IMO that they can't be treated as true languages; languages have phonology, and pronunciation, but Bokmål and Nynorsk don't have anything to do with speech. That in itself must mean that they are not languages, because they are written standards only. The situation is really more like that between ekavian and ijekavian Serbo-Croatian, where one is free to choose whichever. And similar also to Traditional and Simplified Chinese characters, or between Latin and Cyrillic Serbian, where different speakers within the same region might decide to use different forms depending on their preference.
Some Norwegians do claim to "speak Bokmål" or "speak Nynorsk", but what really happens there, as far as I know, is that people simply follow the vocabulary and idioms that are part of one standard or the other. The standards are fairly strict in that regard, they also tell you which words to use! But that could be considered a spelling pronunciation, and I doubt that people do this in everyday life; they'll just speak the local dialect without regard to the Bokmål-Nynorsk division. That of course shows the reality of the situation: as far as everyday speech goes, there's really no such thing as Bokmål or Nynorsk, but there's no "Norwegian" either. There's just varieties as spoken in Norway. Plenty of people will use words that are proscribed by (not included/sanctioned in) either standard. When reading Bokmål out loud, their pronunciation might actually be closer to the Nynorsk spelling, or it might match neither one closely.
So really, when we split these into separate languages, we end up with the confusing situation in which both language forms always have the same range of dialectal pronunciations and must therefore have duplicated, identical pronunciation sections. The split also leaves out any forms that are part of neither standard, which Wiktionary must include as part of its NPOV policy. We can't simply decide that all Norwegians must speak either Bokmål or Nynorsk, and whatever doesn't fit those can't be included. I think that implies that we must by necessity have a language header to account for forms found in neither standard. So that really gives us only two possible ways to handle the situation:
  • Three headers: Bokmål, Nynorsk, and a third language header for whatever words are left out of either one. This would create duplication of all pronunciation information.
  • One header: Norwegian, which encompasses both standards, as well as anything that falls outside them.
To me, the choice is not that hard. —CodeCat 14:55, 12 March 2014 (UTC)[reply]
Yeah, I never really understood the division when context tags can handle this perfectly well. --Æ&Œ (talk) 18:38, 12 March 2014 (UTC)[reply]
  • Support, this presents a clear-cut case for a single language with similar dialects. We might as well split ==English== into ==Deep South== and ==California==. bd2412 T 19:08, 12 March 2014 (UTC)[reply]
  • I too support merging Nynorsk and Bokmål back into Norwegian. They're certainly at least as similar to each other as the various standards of Serbo-Croato-Bosno-Montenegrin are, if not more so. —Aɴɢʀ (talk) 19:07, 12 March 2014 (UTC)[reply]
  • Expanding on my earlier comment: Nynorsk and Bokmål are two of the prescriptivist written standards of Norwegian; they include many, but far from all, Norwegian terms and spellings. Some Norwegian words and spellings fell out of use before Nynorsk and Bokmål came into existence, others are not accepted by either standard but are still used (e.g. in dialects), and others are prescribed by other standards, e.g. Riksmål, the standard used by Norway's largest newspaper, Aftenposten. Our descriptivism and NPOV require that we include all these words.
    If we consider Norwegian to be a language, this is simple to do: we include all the words under a Norwegian header, and use whatever context tags are appropriate: {{cx|dialectal}}, {{cx|Nynorsk}}, {{cx|obsolete}}/{{obsolete spelling of}}, {{cx|Bokmål}}, {{cx|Riksmål}}, etc.
    If, on the other hand, we consider Nynorsk and Bokmål to be separate languages, things get harder:
    - Whenever a term or sense is attested, but we cannot determine if it is acceptable in Nynorsk or Bokmål, we must have a ==Norwegian== section. Readers may expect the ==Norwegian== section to document all the uses the term has in Norwegian, but it may document only the term's obsolete or nonstandard uses.
    - In the many cases that a term/spelling is used in both Nynorsk and Bokmål, we must duplicate content in a ==Norwegian Nynorsk== section and a ==Norwegian Bokmål== section on the same page. If the spelling is also used in Riksmål, we must document this as well... apparently by having a third section on the page, namely a ==Norwegian== section with a {{cx|Riksmål}} tag.
    - If the prescriptivist body which regulates Nynorsk (or Bokmål) deprecates a particular term or spelling, or newly allows a previously unacceptable term or spelling, we must change the language we consider the term to be based on the prescriptivist body's decree. We cannot determine the language of a term we encounter in the wild without referring to the prescriptivist body which governs that "language". This is fundamentally incompatible with our descriptivism and policy of NPOV. We would never strip the ==French== header off an attested French word or spelling just because the French Academy deprecated it. - -sche (discuss) 06:15, 13 March 2014 (UTC)[reply]
    PS, in previous threads it was noted that the ISO has granted Nynorsk and Bokmål their own codes: the ISO has also granted codes to many other lects which are not independent languages, listed here, and to at least one "language" that doesn't even exist, vmf/"Mainfränkisch". - -sche (discuss) 23:28, 13 March 2014 (UTC)[reply]
  • Is it necessarily true that Nynorsk and Bokmål are, by definition, prescriptive standards? I mean, there exist real works that are written using those standards; is it necessarily true that, no matter how many Nynorsk or Bokmål works a word appears in, it's only actually "Nynorsk" or "Bokmål" if the Norwegian Language Council endorses it as such? (I've heard people make similar claims about other languages — e.g., saying something is "not French" if it's not Academically recognized — and usually we're quite happy to ignore such prescriptivist poppycock. What about Nynorsk and Bokmål will forcibly compel us to heed the Council?) —RuakhTALK 00:35, 15 March 2014 (UTC)[reply]
  • We agree that we should not strip the ==French== header off an attested French spelling just because the Académie does not accept it. If, however, we decide to have a marker (say, a header or a tag like {{cx|Académician French}}) indicating a spelling's presence in the Académie's standard, we should not apply it to spellings the Académie does not accept. Such spellings should have only the basic header (==French==) and not an {{cx|Académician French}} tag. Right?
  • In my view, our ==Norwegian== header corresponds to that scenario's ==French==, and we have headers for two specific standards (namely ==Norwegian Bokmål== and ==Norwegian Nynorsk==) that correspond to {{cx|Académician French}}. As with French spellings, I think Norwegian spellings which are not in the Bokmål and Nynorsk standards should have only the basic header (==Norwegian==), not Bokmål or Nynorsk tags or headers.
  • Just as the appearance of a particular word/spelling (say, *défoobaristiqué) in French books which otherwise use Académie-approved spellings does not override the Académie's rejection of défoobaristiqué and make it Académician French, the appearance of a word/spelling (*enfoobarig) in books which otherwise use Nynorsk-approved spellings does not, in my view, make enfoobarig Nynorsk.
  • Now, one might take a different view, and think ==Norwegian Bokmål== and ==Norwegian Nynorsk== are more like {{cx|US}} and {{cx|UK}}, i.e. dialects, than written standards. Under such a view, it would be more intelligible to see a word's presence in a "Nynorsk work" as indicating it to be Nynorsk. However, (1) CodeCat does a good job of explaining why Nynorsk and Bokmål "are written standards only", not true languages (or, I would add, true dialects). And (2) one would still have to refer to the prescriptive body's list of which words are Nynorsk—or how else would one determine whether the other spellings in the work were Nynorsk or Bokmål, in order to infer whether the work, and thus enfoobarig, was in Nynorsk or Bokmål?
  • (And so, because we're using L2 headers to do what should be done with context tags, [in my view] we reach the unusual position of having to change a term's L2 header based on a prescriptivist body's rulings. Changing a context tag based on prescriptivists' rulings is, in contrast, fairly common: if, for example, authorities proscribe a term, and we notice, we often add a {{cx|proscribed}} or {{cx|sometimes|proscribed}} tag and (on our best days) explanatory usage notes.)
  • - -sche (discuss) 19:54, 15 March 2014 (UTC)[reply]
  • I guess I'm just missing the step from "this is a written standard" to "this is wholly owned and operated by a standards body". I mean, U.S. English also has a written standard, distinct from any spoken dialect, but no single entity defines that standard, and we don't find ourselves preternaturally incapable of identifying works written in it. (My point being — if you're correct that Bokmål and Nynorsk are really nothing but prescriptivist figments, and not language varieties that people actually write in, then not only should they not be language headers, but they probably shouldn't even be context tags.) —RuakhTALK 00:24, 16 March 2014 (UTC)[reply]
  • support I knew nothing about Bokmål, Nynorsk, and Norwegian before reading the above. I just want to point out that nb, nn, and no are the only languages on Wiktionary that tangle together in our Category hierarchy. E.g. nb:Sciences and nn:Sciences are listed under no:Sciences. Having written a script to parse the category tree, I can say with some authority that no other languages have this kind of structure on Wiktionary. I've previously wondered why no/nb/nn do this. Thanks all for explaining. It would be nice if Norwegian had the same category structure as every other language. Pengo (talk) 08:19, 13 March 2014 (UTC)[reply]
  • To ensure that they notice and can join this discussion (if they want to), I now ping recently-active users who often participate in discussions of lect/code mergers, and/or indicate proficiency in Norwegian. User:Njardarlogar, User:Teodor605, User:Metaknowledge, User:JorisvS, User:Liliana-60, User:MaEr, User:LA2. Anticipating that some may feel that this issue, having been discussed before, should not be discussed again, I point to the number of users participating in this thread who were not involved in previous threads, the new arguments presented in this thread, the unanimity of the users who have participated so far, and the fact there will be a vote after this thread runs its course, which will hopefully finally result in an actual policy. - -sche (discuss) 21:44, 13 March 2014 (UTC)[reply]
I've asked a question on the vote's talk page about how grammar differences should be handled after a merge. Please comment. --Anatoli (обсудить/вклад) 01:28, 14 March 2014 (UTC)[reply]
  • support. I was just notified of this discussion by - -sche. My native language is Norwegian (Bokmål) and I speak a few other languages. I must say that my personal view is that these two languages ideally should be treated as one. But that is a practical view on it. As most contributors have pointed out, the differences between the two are small. I definitely support the use of {{cx|Nynorsk}} and {{cx|Bokmål}} tags. If you drop that altogether, you would definitely make a mess and create a lot of confusion.
However, this is quite intricate. One would ideally also need a {{cx|Riksmål}} and a {{cx|Nonstandard}} tag. Riksmål is really a subset of Bokmål. And you also need a tag that allows all the permutations of Bokmål, Nynorsk and Riksmål. I.e. (N/B/R), (N/B), (B/R) and theoretically also (N/R). You also need a tag to point out if a word lacks adequate information. In fact, that is how we have decided to do things on the Norwegian project, if not for an altogether different reason. The number of active contributors on either project is small, so instead of duplicating our effort we decided to join forces.
The general sense here in Norway, however, is to keep the two languages apart. The topic is politically quite sensitive, and there are plenty of zealots on either side. Had there been more active users on the Norwegian project I am sure we would have kept the two languages as separate projects. The grammar in the two subsets of the same language varies quite a lot and there are lots and lots of words that are spelled differently. It is a very time-consuming effort to create grammar templates that cater for all needs.
So, to sum it up, I think it is too easy to dismiss the two as one language. However, for practical reasons I think they should be treated as one on non-Norwegian projects. I sincerely hope other Norwegian native speakers will also have their say. I don’t want to be the sole voice of the ones who actually speak this language.
  • Might I also take this opportunity to ask for help on the Norwegian project itself? We are simply too few active contributors to progress in a speedy fashion. We need help with some of the features that bigger projects have implemented long ago. E.g. making translation templates that help a non-proficient user to quickly add a translation and to add that translated word in the Norwegian project itself. And helping out with programming some smart features like words lacking important sections like a grammar section, etc. I love some of the gadgets I have seen elsewhere but lack the skill to implement them. --Teodor (dc) 00:18, 15 March 2014 (UTC)[reply]
  • Note: The fact that the two things are mutually intelligible is far from sufficient for them to be the same language; Czech and Slovak are mutually intelligible. The claim that they are one language needs better support. --Dan Polansky (talk) 08:09, 15 March 2014 (UTC)[reply]
    • I already gave this support. Czech and Slovak represent distinct dialect bases and nobody would claim that either one could cover the whole Czech-Slovak continuum. But with Bokmål and Nynorsk, there is no such geographical "tie": they are both used scattered throughout Norway. It would be as if written Czech and Slovak were both used in both countries, and each person would be free to choose which one, regardless of what dialect they actually spoke, and we'd call them "Knižný Jazyk" and "Nový Československý". —CodeCat 14:20, 15 March 2014 (UTC)[reply]
  • Keep separate. Although a lot of words are the same in Bokmål and Nynorsk, as well as Danish and Swedish, there are differences in the inflections between the two languages, especially the plural inflections; only in very few cases are the inflections for nouns completely the same, such as for snømann, or mass nouns. In some cases the noun gender can differ; a feminine noun in Nynorsk is usually both masculine and feminine in Bokmål, probably a compromise between Riksmål, which doesn't have a feminine gender, and Nynorsk.
    I have gradually been eliminating the Tbot entries, determining whether each one is Bokmål, Nynorsk, or both. Any new entries I make are split between the two languages where appropriate. It should also be pointed out that there are separate Wiktionaries for Bokmål and Nynorsk, so why should the English Wiktionary differ from this? There are also separate Wikipedias for both languages. Donnanz (talk) 19:50, 15 March 2014 (UTC)[reply]
    There is no problem at all in having separate inflection tables for different standards, nor for having different genders for different standards. This is not an obstacle to merging them. --WikiTiki89 19:57, 15 March 2014 (UTC)[reply]
    [edit conflict] I don't understand why separate inflections is an argument for keeping them separate. Look at the inflection tables for Catalan verbs on the Catalan Wiktionary for example (ca:cantar). They contain quite a few different alternative forms, depending on dialect; some are part of a standard (Catalan has multiple standards, like Norwegian), some are not. Our current inflection tables don't cover all of these standards, but they probably should at some point. That is not a reason to separate Catalan into multiple languages of course, so it can't be a reason to do that for Norwegian either. All it means is that we need to include more than one inflection table in some Norwegian entries, but that's hardly a problem. As for gender, I don't see how that's a problem either. Norwegian is not the only language where nouns may have ambiguous gender; Dutch is another example, and there are more. We haven't had any trouble with handling those cases, so again we wouldn't expect trouble in this case. Furthermore, you haven't addressed any of the issues that arise from keeping the languages separate (like the question of nonstandard forms), which are far more disruptive. —CodeCat 20:03, 15 March 2014 (UTC)[reply]
  • Re separate wikis: There are also separate Serbian and Croatian Wiktionaries and Wikipedias, yet en.Wikt found it best to have one Serbo-Croatian language rather than several. Re different inflections: English and German also have words which inflect differently, or (in the case of German) have different gender, in different varieties of the language. Joghurt#German, with its multiple genders and inflection tables, shows how this can be handled. - -sche (discuss) 20:06, 15 March 2014 (UTC)[reply]
  • It is both true and false that there is both a Bokmål and a Nynorsk Wiktionary, as Donnanz claims. The nn project is dead. Check the number of non-bot contributions over the last three years... There was a vote, in 2008 or 2009 I think, to merge them, and all of the nn content was merged into the no project. But that was because there weren't enough active users, especially on the Nynorsk project. I wasn't contributing much at that time so I didn't take part in the discussion. The no project now uses tags to mark nb, nn, Riksmål, and also non-standard. I agree with Njardarlogar when he says there are no dictionaries that treat Nynorsk and Bokmål as one language. If we have the resources to keep them separated I support his view. However, I fear that will just lead to poorer quality, especially when it comes to Nynorsk. The number of active users of Nynorsk is steadily declining, and even in the Norwegian project itself it is hard to find contributors willing to put in the necessary time to check grammar, make grammar templates accommodate Nynorsk inflections etc. Is it very likely that the English project would fare any better? --Teodor (dc) 20:32, 16 March 2014 (UTC)[reply]
  • Strong oppose It would be hypocritical to merge the two versions of Norwegian while keeping Danish, Norwegian (Bokmål + Nynorsk) and Swedish separate (because that merger is not likely to happen in the near future, is it? Being pragmatic here). I have yet to encounter a dictionary that treats Bokmål and Nynorsk as the same language, just as I have yet to encounter a dictionary that treats Danish, Norwegian and Swedish as one language (even though they are; check this list to get an idea).
We don't have separate headers for e.g. British English and American English because they are hardly ever treated as separate languages (there is no good reason why they should be, either). British and American English have the same origin; they share a common core. Nynorsk and Bokmål do not. Bokmål evolved gradually from Danish, while Nynorsk was created from scratch based on rural Norwegian dialects; so the two standards are not as related as one could initially be led to believe (think of convergent evolution in biology). Again, please take a look at the list I just linked to to get an idea of what we are dealing with. It is in no way such that Bokmål + Nynorsk = Norwegian while Danish and Swedish are entirely different beasts; and the list will give you a strong indication of this. --Njardarlogar (talk) 23:28, 15 March 2014 (UTC)[reply]
That still doesn't change the fact that Bokmål and Nynorsk are both standards intended to be used by speakers of all dialects in Norway. That makes them fundamentally different from Danish vs Swedish vs Norwegian. The Swedish standard language is only intended to be used by self-declared Swedish speakers in Sweden, not by Norwegian speakers in Norway. Thus, even though there are quite some differences between Bokmål and Nynorsk, the basic fact is still that they are both used in all of Norway, and there is therefore no strong correlation between location and the standard used. Two neighbours speaking the same local dialect in Trondheim might write in Bokmål and Nynorsk respectively. That, to me, makes all other arguments about their supposed differences, and comparisons with other Scandinavian dialects, irrelevant. They are two standards for the same set of dialects; i.e. those spoken in Norway. Those dialects should therefore be grouped together as "Norwegian".
Aside from that, you have not addressed any of the problems that having three (or more!) headers for each separate standard creates. Just read through this discussion and you'll get an idea. I and others have already argued that having just 2 headers is untenable with respect to NPOV, and 3 or more headers creates far more problems than it solves. No matter how many standards we treat as separate languages, we still eventually end up having to need one extra language name to cover anything that's not part of any standard. The practice of treating language standards as normative for what can or cannot be included in Wiktionary under a particular header also goes completely against all practice in this area so far. The names "Bokmål" and "Nynorsk" are by definition prescriptive. So unless you provide solutions for the problems noted in this discussion, your oppose vote really does not accomplish anything. It basically becomes "yes, we have problems, but what about my principles!". —CodeCat 23:47, 15 March 2014 (UTC)[reply]
I agree that having three headers is a rather horrible solution that could create a bad precedent in terms of headers cluttering up the pages. At the same time, there already are thousands of living languages on this planet, so a header or two extra doesn't move us into a new league when it comes to number of headers.
I don't see why we should have more than 3 headers. There are only 3 separate codes for the Norwegian language, and there are only 2 normally recognised standards of Norwegian, namely Bokmål and Nynorsk. There are two more variants of written Norwegian, and those are Riksmål and Høgnorsk. Riksmål is technically a subset of Bokmål (plus perhaps a few extra words/spelling variants that are hardly ever found in contemporary Norwegian texts. Also, see which language the Riksmål Dictionary is described as being written in in one of the biggest online Norwegian bookstores..). Høgnorsk is very uncommon, and is not a standard. Riksmål can be treated under the Bokmål header while Høgnorsk can be treated under the Nynorsk header.
Yes, we'd solve a problem with two headers. But we'd create new problems with the tags. How exactly would that work? Should every Norwegian meaning be tagged so that the reader is never in doubt whether the editor simply forgot to add the tag or is not familiar enough with Bokmål and Nynorsk to know that this meaning or word is only found in one of the two standards? And what about words that are considered dialectal in one standard and ordinary in another? (Bokmål 2 vs Nynorsk 1; hin is another example, 2 vs 1). Then there's Riksmål and Høgnorsk again. Should one expect an editor to be sufficiently familiar with these two minority variants in order to tag all entries correctly? Should only Riksmål and Høgnorsk words/spellings that are different from Nynorsk and Bokmål be tagged; or would we have to tag all words that can be said to be Riksmål and Høgnorsk?
How relevant is it really that Nynorsk and Bokmål are aimed at Norwegians and not Danes and Swedes? It doesn't sound very descriptive to mention this as an argument. I could write in Swedish and my fellow Norwegians wouldn't have any major issues with understanding what I wrote. Likewise with Danish. Danish used to be the official (de facto or otherwise) language in Norway for centuries, anyway, so it's not a completely theoretical objection. On board Norwegian airliners, if the captain is Danish or Swedish, the captain will give information first in his native Scandinavian variant, then in English. Just like Norwegian pilots give information first in Norwegian, then in English.
Then there's also the important fact that despite that both Bokmål and Nynorsk are supposed to be used by all speakers of all Norwegian dialects, they still have very different origins. Bokmål stems from Danish, while Nynorsk stems from rural Norwegian dialects. So by the argument you are providing, if suddenly one day Swedish started to be popular as a written language in Norway, and some proponents of it meant that all Norwegians should write it, then suddenly Swedish is Norwegian; which linguistically is nonsense.
As for being prescriptive; you have to realise that there is no Norwegian language. There is a Mainland Scandinavian language with 4 official written standards: Bokmål, Danish, Nynorsk and Swedish. It's no more prescriptive to label a word as either Bokmål or Nynorsk than it is to label it as Swedish or Danish.
-sche writes: "We cannot determine the language of a term we encounter in the wild without referring to the prescriptivist body which governs that "language". This is fundamentally incompatible with our descriptivism and policy of NPOV." And indeed, this goes for all of Scandinavian, not just Bokmål vs Nynorsk. --Njardarlogar (talk) 09:51, 16 March 2014 (UTC)[reply]
  • At least Njardarlogar knows what he is talking about, being a native Norwegian. There are two officially recognised languages - Bokmål and Nynorsk; Riksmål isn't, and I don't know anything about Høgnorsk. Although a lot of words are similar, and believe it or not an egg is et egg (Bokmål) or eit egg (Nynorsk), there are also many words which differ fundamentally; compare ukrainer with ukrainar, eier with eigar, and kirke with kyrkje, to quote just a few. I suggest comparing a Nynorsk text with one in Bokmål (Wikipedia is a good source); it is quite obvious that the two languages are sufficiently different to be classed as languages in their own right.

I have been slowly sorting out the mess that exists at present, splitting between Bokmål and Nynorsk where possible, entering missing inflections, replacing inflection tables where they are incomplete or erroneous - I am less than enamoured with inflection tables anyway and prefer a more "in your face" presentation of inflections. But it's a slow job, and I have only tackled nouns so far. But my ideal would be to scrap the "Norwegian" (no) heading entirely, leaving us with just Bokmål and Nynorsk. I would hate to see the work I have done so far undone by a whim. It would be very disheartening, and could lead to my ceasing to contribute. Donnanz (talk) 13:32, 16 March 2014 (UTC)[reply]

Re: kirke and kyrkje, etc. Despite orthographic and grammar differences, Bokmål and Nynorsk can still have the same L2 - "Norwegian". Having {{context|Nynorsk|lang=no}} and {{context|Bokmål|lang=no}} would add them to Category:Norwegian Bokmål language and Category:Norwegian Nynorsk language but to PoS categories without such a distinction. E.g. kyrkje:
==Norwegian==

===Alternative forms===
* {{l|no|kyrkja}}
* {{l|no|kirke}} {{qualifier|Bokmål}}

===Etymology===
From {{etyl|non|nn}} {{term|kirkja|lang=non}}.

===Noun===
{{no-noun-f2|kyrkj}}

#  {{context|Nynorsk|lang=no}} [[church]]

====References====
* {{R:Dokumentasjonsprosjektet|lang=no}}

And kirke:

==Norwegian==
[[Image:Arneberg kirke.jpg|thumb|kirke]]

===Pronunciation===
* {{IPA|/çɪrkə/|lang=no}}
* {{rhymes|ɪrkə|lang=no}}

===Alternative forms===
{{l-nn|kyrkje}} {{qualifier|Nynorsk}}

===Noun===
{{head|no|noun|g=m|g2=f}}

# {{context|Bokmål|lang=no}} a [[church]] (''a house of worship'')

====Inflection====
{{no-noun-infl|nb-class=f1|nb-class2=m1|stem=kirk}}

--Anatoli (обсудить/вклад) 22:39, 16 March 2014 (UTC)[reply]

I don't think we should use context labels to specify the variety. Context labels are for sense-specific things, but the Bokmål-Nynorsk distinction is headword-specific. So it would be better placed on the headword line, like the Norwegian Wiktionary does. —CodeCat 22:13, 17 March 2014 (UTC)[reply]
But that's a wider problem. We already use context labels for that in all other languages that have this issue. --WikiTiki89 22:23, 17 March 2014 (UTC)[reply]
Which others are those? —CodeCat 22:26, 17 March 2014 (UTC)[reply]
English, Russian, German, e.g.: {{context|UK|Ireland|India|Pakistan|lang=en}}, which add to appropriate categories. We could use different headwords to allow difference in grammar but if we are to unify Norwegian, it's better to stick to "no", not "nb" and "nn" naming convention. --Anatoli (обсудить/вклад) 22:35, 17 March 2014 (UTC)[reply]
  • after e/c... Heck, I thought that was the whole point of the discussion about Chinese entries -- replacing multiple lang headings that necessitate lots of duped content with a single lang heading and multiple context tags to clarify ... well, to clarify the context, in this case, which Chinese language/dialect. Perhaps I misunderstood? ‑‑ Eiríkr Útlendi │ Tala við mig 22:41, 17 March 2014 (UTC)[reply]
@CodeCat Not all meanings are shared between the two standards. Even if two words ultimately have the same origin, intermediate steps in their etymology can be different. That's why I brought up convergent evolution earlier: even though Bokmål and Nynorsk may appear very similar on the surface, their inner workings can be pretty different. --Njardarlogar (talk) 22:44, 17 March 2014 (UTC)[reply]
All differences, including etymological can be handled in a unified Norwegian approach. Can they not? --Anatoli (обсудить/вклад) 23:11, 17 March 2014 (UTC)[reply]
Good luck doing it without context tags. And as I already mentioned, there are meanings that are considered dialectal in one standard and completely normal in another. How to handle that? Context tags? Usage notes? --Njardarlogar (talk) 23:18, 17 March 2014 (UTC)[reply]
There is always a way (if there's a will), e.g. {{context|Bokmål|regional Nynorsk|lang=no}} The exact context name can be decided and labels/categories created. No linguistic Bokmål or Nynorsk information should and will be lost. Having one L2 header will just make it easier to create new entries in any Norwegian variety or neutral (not applicable to any). Semantics is the easy part. Perhaps grammar should be more of a concern to you but I think this can be solved as well. --Anatoli (обсудить/вклад) 23:24, 17 March 2014 (UTC)[reply]
Another detail. If Nynorsk and Bokmål words are to be considered variants of each other rather than belonging to separate languages, we'd have to provide usage examples on the wrong entry, which would be a bit weird (e.g. having a usage example of øye on auga or vice versa). --Njardarlogar (talk) 11:52, 18 March 2014 (UTC)[reply]
We'd have to do that? Why? As far as I know, we don't have to do that for other languages, nor are we in the habit of doing it. Cyrillic-script Serbo-Croatian entries have Cyrillic-script usexes, Latin-script entries have Latin-script usexes, don't they? English spellings like grey have quotations and usexes that use grey, while spellings like gray have quotations and usexes that use gray. Sometimes lemma entries have quotations from all forms of a word, e.g. wight includes a quotation that actually uses the obsolete spelling wyght, but users have said in past discussions that they felt that was good because the lemma should represent all the forms. In any case, I don't see how or why our treatment of Norwegian usexes under one ==Norwegian== header would or should differ from our treatment of Serbo-Croatian or English usexes. - -sche (discuss) 17:21, 18 March 2014 (UTC)[reply]
"I don't see how or why our treatment of Norwegian usexes under one ==Norwegian== header would or should differ from our treatment of Serbo-Croatian or English usexes." Because Nynorsk and Bokmål are separate standards with different grammar, vocabulary, sentence structure etc. It's not adequate to give an example in only one of the two standards. And examples need to be attached to meanings. Your example at grey is not (even if it can be deduced which meaning is meant).
Either we duplicate the meanings in some way (such that the words are no longer treated simply as variants), or we keep the examples at the lemma page. Having references on the variant's page to meanings on the lemma page is unstable and does not strike me as a viable alternative.
Another issue is that Bokmål and Nynorsk can have two or more unique variants of a word, in which case it is not obvious where the examples of usage would be found; so there would be a need to duplicate the usage examples at up to at least 8 different entries (as an example, the Nynorsk word fliseleggja has 8 variants: fliseleggja, fliseleggje, fliselegga, fliselegge, flisleggja, flisleggje, flislegga and flislegge) --Njardarlogar (talk) 17:53, 18 March 2014 (UTC)[reply]
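For illustration only: the eight spellings listed above look like the product of three independent two-way choices (flise-/flis-, -leggj-/-legg-, -a/-e). The following is a minimal Python sketch under that assumption. The decomposition is inferred from the list itself, not taken from the Nynorsk standard, and the function name is hypothetical.

```python
from itertools import product

def fliseleggja_variants():
    """Enumerate spellings as first element x verb stem x infinitive ending.

    The three-way split is an assumption read off the list in the
    discussion; it is not an official description of Nynorsk orthography.
    """
    first = ["flise", "flis"]   # with or without -e-
    stem = ["leggj", "legg"]    # with or without -j-
    ending = ["a", "e"]         # infinitive in -a or in -e
    return ["".join(parts) for parts in product(first, stem, ending)]

variants = fliseleggja_variants()
print(len(variants), variants)
```

Each additional optional feature doubles the count, which is why the number of entries that would need duplicated usage examples grows so quickly.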
The vote on this matter has now opened: Wiktionary:Votes/pl-2014-03/Unified Norwegian. - -sche (discuss) 19:48, 20 March 2014 (UTC)[reply]

A new format for Chinese entries (multisyllables)

{{look}} At present, our Chinese (Mandarin mostly) entries are structured like this. This format has a number of drawbacks:

  1. It splits the Chinese heading into multiple headings, resulting in duplication of information, since the written form of Chinese is shared across Chinese varieties. It does not take into account the fact that simp-trad correspondences are shared by all Chinese varieties (hence hanzi-box is duplicated for every variety), the semantics is more than 99% of the time the same across dialects, and the etymology is essentially shared across dialects as well.
  2. It duplicates simp-trad conversions in hanzi-box and every headword template.
  3. It duplicates pronunciation in every headword template and in the Pronunciation section.
  4. It duplicates etymology in the hanzi-box and in the Etymology section.

As a consequence, the Chinese-language presence here has been overwhelmingly "Mandarin", which is the basis of Written Chinese, and this has limited the growth of other varieties (cf. nouns - Mandarin 20467, Cantonese 317, Wu 10). Hence a change in the format is much needed. I have created User:Wyang/歷史 and some new templates (see below). What do people think of this format? The code for an entry in such format would be (excluding the temporary userspace markers since the templates do not exist yet):

==Chinese==
{{wikipedia|lang=zh}}
===Etymology===
{{zh-forms|s=历史|to go through,<br>to experience|history,<br>records}}

First attested in ''{{w|Sanguozhi}}'', meaning "records of past events".

===Pronunciation===
{{zh-pron
|m=lìshǐ
|c=lik6 si2
|mn=le̍k-sú
|w=5liq sr
|ma=y|ca=y|wa=y
}}

===Noun===
{{zh-noun|hsk=b}}

# {{cx|obsolete|lang=zh}} [[record]]s of past events; [[historical]] records
# [[history]], [[past]]
# past [[experience]]s of a person, the history of a person
# ({{zh-l|歷史學}}) [[historiography]], the [[study]] of history

===See also===
* Synonyms: {{zh-l|過去|past}}, {{zh-l|以往|past}}
* Antonyms: {{zh-l|將來|future}}, {{zh-l|未來|future}}
* Derived terms: {{zh-l|歷史學|historiography}}, {{zh-l|歷史劇|historical play}}, {{zh-l|歷史觀|historical view}}, {{zh-l|歷史性|historic; of historic significance}}
* {{Sinoxenic-word|歷史|s=歴史|れきし|rekishi|역사|yeoksa|lịch sử}}

which I think is much more succinct than if all these varieties are created in separate headings.

Thanks for any feedback and input on this in advance. Wyang (talk) 08:37, 12 March 2014 (UTC)[reply]

Personally, I like the idea. --WikiTiki89 08:43, 12 March 2014 (UTC)[reply]
I like it too. Since, as you, @Wyang mentioned, the presence here has been overwhelmingly "Mandarin", there is little concern about missing user examples in Cantonese, Min Nan, etc (such as examples of vernacular written Cantonese or Min Nan) but it's still possible. I suggest using {{cx|Cantonese|lang=zh}} whenever we have a specific variety entry, which should add to a corresponding subcategory.
Categorising: I think specific terms, not used in standard Chinese (Mandarin), could be added to existing categories, such as Category:Cantonese nouns, which should in turn belong to Category:Chinese nouns.
This is a big change, so it will most likely require a vote; get ready for opposition. Note that we just had a vote allowing Cantonese Jyutping, which was a bit controversial, IMHO - not clear if only monosyllabic or polysyllabic entries are allowed. --Anatoli (обсудить/вклад) 10:37, 12 March 2014 (UTC)[reply]
Re: your example entry: User:Wyang/歷史:
Every foreign script entry has transliteration in the header, apart from Lao and Burmese, which have separate transliteration tables. Since Hanyu pinyin is the standard transliteration system for Mandarin, in my opinion, each PoS header should also repeat pinyin, like this: 歷史 (lìshǐ) (even if they are multiple). I'm OK to move simp./trad. differences to the Hanzi section.
Entries should be sorted by numbered pinyin (as agreed previously), so 歷史 should be sorted by li4shi3 and appear under letter "L", not under character 歷 or radical 止. The simplified equivalent 历史 should be sorted the same way, as the current entry 历史 (lìshǐ). We now have the functionality to convert toned pinyin to numbered pinyin, please consider adding this functionality for sorting.
Toned pinyin was specifically allowed by a vote and they are useful to locate Hanzi entries. Pinyin entries can also be generated in an accelerated method from entries. This functionality should also be available in the long run, IMO. --Anatoli (обсудить/вклад) 10:50, 12 March 2014 (UTC)[reply]
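As a rough illustration of the sort key suggested above (歷史 sorted as li4shi3), here is a minimal Python sketch of converting toned pinyin to numbered pinyin. It is not the conversion code used on the site. The function names are hypothetical, and it assumes the syllables are already separated by spaces, since splitting unsegmented pinyin such as lìshǐ into syllables is a separate problem.

```python
import unicodedata

# Combining diacritics that mark the four Mandarin tones in Hanyu Pinyin.
TONE_MARKS = {
    "\u0304": "1",  # macron
    "\u0301": "2",  # acute
    "\u030c": "3",  # caron
    "\u0300": "4",  # grave
}

def numbered_syllable(syllable):
    """Convert one toned syllable (e.g. 'shǐ') to numbered form ('shi3')."""
    tone = ""
    kept = []
    # NFD splits each accented vowel into base letter + combining marks.
    for ch in unicodedata.normalize("NFD", syllable):
        if ch in TONE_MARKS:
            tone = TONE_MARKS[ch]
        else:
            kept.append(ch)  # also keeps the diaeresis of ü
    # Recompose what is left and append the tone digit (none if neutral).
    return unicodedata.normalize("NFC", "".join(kept)) + tone

def pinyin_sort_key(toned, sep=" "):
    """Build a sort key from syllables separated by `sep` (an assumption)."""
    return "".join(numbered_syllable(s) for s in toned.split(sep))

print(pinyin_sort_key("lì shǐ"))
```

Neutral-tone syllables get no digit in this sketch; whether they should be written with 5 or 0 in a real sort key would need to be decided separately.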
Thanks.
1. We can add additional parameters to the pronunciation template, since string parsing is now achievable. We could, for example, use
{{zh-pron
|m=lìshǐ
|c=lik6 si2
|mn=le̍k-sú
|w=5liq sr
|ma=y|ca=y|wa=y
|cat=n,v,a,history!*,insects
}}
, which would add the page to the following categories: 1) "Chinese nouns", "Chinese verbs", "Chinese adjectives", "zh:History|*", "zh:Insects"; or 2) "Mandarin nouns", "Cantonese nouns", ... depending on what everyone prefers. Each of these categories would be sorted by the respective sort key, eg. Pinyin for "Category:Mandarin nouns". Regardless, one can always call variety-specific categories, using codes like
{{cx|Cantonese|Wu|lang=zh}}
in the definition.
2. Regarding headword templates - I think we can generalise it to something like Template:zh-pos, since Chinese is uninflecting and there is no point in using a different base template for each PoS. I don't think the absence of a transliteration in the template is a problem. The requirement of transliterations is to guide pronunciation. Here the Pronunciation section has made romanisations and IPA pronunciations in various varieties sufficiently clear to readers, and I think we can use the precedent of Burmese entries (which also involves multiple transliterations) in this case.
3. Regarding Pinyin: Pinyin in the pronunciation template can be made clickable. And when accelerated creation is enabled in the preferences, one can click on the uncreated Pinyin in trad-form entries to create it. Trad-to-simp conversion can be performed with hardly any errors. Wyang (talk) 11:31, 12 March 2014 (UTC)[reply]
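To make the cat= idea from point 1 concrete, here is a minimal Python sketch of how such a value could be expanded into category names, following option 1) above. It is only an illustration, not the template's code. The function name, the abbreviation table, and the reading of "!" as introducing a sort key are assumptions inferred from the single example given.

```python
# Part-of-speech abbreviations seen in the example value (an assumed table).
POS_ABBREVIATIONS = {"n": "nouns", "v": "verbs", "a": "adjectives"}

def expand_cat(value, lang_name="Chinese", lang_code="zh"):
    """Expand e.g. 'n,v,a,history!*,insects' into a list of category names."""
    categories = []
    for item in value.split(","):
        # Text after '!' is treated as a sort key (assumption).
        name, bang, sort_key = item.strip().partition("!")
        if name in POS_ABBREVIATIONS:
            category = f"{lang_name} {POS_ABBREVIATIONS[name]}"
        else:
            # Anything else is treated as a topic category, e.g. zh:History.
            category = f"{lang_code}:{name.capitalize()}"
        if bang:
            category += "|" + sort_key
        categories.append(category)
    return categories

print(expand_cat("n,v,a,history!*,insects"))
```

Option 2) would differ only in the names produced, emitting one category per variety (for example "Mandarin nouns", "Cantonese nouns") instead of a single "Chinese nouns".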
Re: "There is no point in using a different base template for each PoS." But that's how all languages are structured here, including non-inflected ones. It allows categorisation and other things, e.g. for nouns, you can add optional measure words (classifiers). If they are simple, it just makes it easy when recreating each headword for each PoS.
The suggested pronunciation section is quite big and may be even overwhelming to some users (I don't mean it's bad). What if more dialects are added? Having a simple pinyin looks much neater and that's what users see in published dictionaries. Just standard pinyin in brackets, next to Hanzi, no other schema. Please consider. Burmese has various standards but I'd prefer that they've selected at least one to use in the headword, same with Lao. Just my opinion but that's also the established practice with current Mandarin entries. (Note that Burmese and Lao, currently a bit neglected, have been maintained by one user each - Angr and Widsith but we have a bunch of editors with Chinese now). --Anatoli (обсудить/вклад) 11:51, 12 March 2014 (UTC)[reply]
Further feeback: See also, Synonyms, etc. should follow the standard format. Undecided about "Sinoxenic descendants" yet but we usually use "Derived terms". --Anatoli (обсудить/вклад) 12:00, 12 March 2014 (UTC)[reply]
Actually, I was wrong about Burmese and Lao. I've just checked. They do have transliteration. Not sure why I thought so, sorry. Thus, each non-Roman script entry has transliteration in the header (if correctly formatted). --Anatoli (обсудить/вклад) 12:10, 12 March 2014 (UTC)[reply]
Sure, I still mean using
{{zh-noun|mw=根|hsk=b}}
in the headword, but make templates like Template:zh-noun call Template:zh-pos or some module, instead of making them call Template:head separately like Template:cmn-noun etc. currently do. -pos templates also exist for some other languages, and in fact I got this idea from Template:ja-pos. (Others: ko-pos, tt-pos, oj-pos)
If we convert existing entries to this format, the vast majority will just have one variety in the Pronunciation section, i.e. Pinyin. The example I created was more of an extreme case - I doubt that section will be heavily populated by readings. Four (Mandarin, Cantonese, Min Nan, Wu) probably represents the limit. For other varieties, there is simply a lack of comprehensive dictionaries (not dialectal word dictionaries) like those dedicated to these four.
Regarding "see also" content - Those are not part of the change. The preference for clustered "see also" terms is more of a personal habit - I find it a little easier to locate content when various things are grouped by relatedness.
Burmese headword templates don't have transliterations (လောက). The transliterations are handled by Template:my-roman, which is kind of similar to what the proposed template "Template:zh-pron" would do. Wyang (talk) 12:17, 12 March 2014 (UTC)[reply]
Parameterised Template:zh-pos is not a bad idea. I didn't mean it must be only Template:zh-noun, etc.
Burmese and Lao entries are inconsistent, apparently. Some do and some don't have transliterations. My preference as a user and as an editor is to have pinyin in the header, even if pronunciation sections are smaller.
If Chinese entries lose "rs" value, it would be a great change. Showing trad./simp. only Hanzi box is also a great change. No need to do numbered pinyin is also good. --Anatoli (обсудить/вклад) 12:37, 12 March 2014 (UTC)[reply]
OK, thanks, I see your point about the header. I agree with the rest. Wyang (talk) 22:44, 12 March 2014 (UTC)[reply]
Moreover, I'd like to introduce Zhuyin into the header, automatically converted from Pinyin, e.g. Pinyin: lìshǐ, Zhuyin: ㄌㄧˋ ㄕˇ. Pity you don't like the idea but Japanese and Korean entries do have transliterations, so does the overwhelming majority of other non-Roman languages.
I think the pronunciation section could and should look simpler (and smaller) and topolects should be added vertically, not horizontally, similar to how US/UK English sections are organised, e.g. man. That way, it won't matter how many regional pronunciations are added. I'm sure there is research into many dialects. They may lack transliterations or sound recordings but IPA could be optionally obtained and added to the bottom. --Anatoli (обсудить/вклад) 23:12, 12 March 2014 (UTC)[reply]
I have added Zhuyin. I am not sure about the vertical arrangement. This is what it would look like with the tables now. To me it looks not as aesthetically pleasing as the horizontal one, since the four stacking tables are limited in width. Do you mean using the default list format of Wikimedia and getting rid of the tables? But again, the arrangement at man#Pronunciation to me looks like a mess. I prefer a side-by-side tabular format.
The Shanghainese one was already running out of romanisations a bit - it is using "WT romanisation", something I created. IPA is even harder for people to become familiar with. I doubt there will be people wishing to add other pronunciations. Wyang (talk) 00:01, 13 March 2014 (UTC)[reply]
Thanks! Although it would be great if the header had 歷史 Pinyin: lìshǐ, Zhuyin: ㄌㄧˋ ㄕˇ :) Maybe yes, doing without a table, something like this but with proper bolding, linking, etc:
Mandarin (Standard Chinese, Beijing)
Pinyin: lìshǐ
Zhuyin: ㄌㄧˋ ㄕˇ
IPA (key) /li⁵¹ ʂʐ̩²¹⁴⁻²¹⁽⁴⁾/
Cantonese (Standard Cantonese, Guangzhou)
...
Min Nan (Taiwanese)
...
--Anatoli (обсудить/вклад) 00:32, 13 March 2014 (UTC)[reply]
I see three differences between the proposed format and our current format.
  1. Some templates are changed. For the most part, the changes look like improvements, but proposed/new zh-pron needs more work: the proposed horizontal arrangement is so wide that smartphones and computers with small screens will have trouble with it, but the proposed vertical arrangement pushes the actual content (definitions) several screens down. (And if additional dialects are added, it will become even more unwieldy.) A large part of its bulk comes from its tabular format. Reworking it to use our usual bulleted-list format would seem to solve these problems.
  2. The proposal lumps Synonyms, Antonyms, Derived terms, Descendants etc into the ===See also=== section. I agree with Anatoli that we should not do this. There is no reason for our treatment of Chinese antonyms to differ from our treatment of Latin ones. The proposed format would become particularly untenable whenever a word had a large number of synonyms, antonyms, and/or derived terms, like some of the words for tea probably do.
  3. The proposal groups the various Chinese languages under one header. This is the bit that may be most controversial. As spoken, the varieties are often not mutually intelligible... yet (as has been noted) they are written the same way, and Wiktionary is a written dictionary.
- -sche (discuss) 00:51, 13 March 2014 (UTC)[reply]
Thanks. Wyang has responded to your second point, it's not part of a change, so, it should probably be removed from the example page to avoid confusion. I like your bulleted example. Most entries won't have such a big variety, since we don't have editors and knowledge of dialects. The existing ones should be imported.
On the main issue of merging all topolects into one L2 "Chinese" there have been surprisingly few comments. I'm sure there will be resistance, if there is a vote. Yes, all written forms can be accommodated into one Chinese section, despite them being not mutually intelligible (when spoken out loud!). Perhaps THIS problem should be solved first. There are other important improvements. If we ignore the pronunciation section (partially automated), entries will become easier to create. The main difficulty and source of errors for beginners was the "rs" (radical sort) value, which stopped even native speakers from adding contents. --Anatoli (обсудить/вклад) 01:25, 13 March 2014 (UTC)[reply]
Thanks. Format of the pronunciation template isn't an issue; it can be easily changed when all pages link to it, whether it be a table or a list. To me the list form is not as good-looking, but it is much easier to change to. I am not sure what was meant by the unresolved problem with the merger - like said above, variety-specific labels can be applied when necessary. Looking at Category:Cantonese nouns and Category:Min Nan nouns in traditional script, most currently-existing entries in these varieties are not dialectal words. There isn't a huge number of currently-existing entries for other varieties (something less than 2000) compared with Mandarin (around 35000), and these can be manually examined one-by-one. Wyang (talk) 04:17, 13 March 2014 (UTC)[reply]
The issue is not only technical but political. I know that most existing non-Mandarin entries only differ in pronunciation. The existing written dialectal forms not included here are small in number, even if they may be frequently used. I meant that in the past, after a seeming agreement in the BP, there was a vehement opposition to certain changes. Some editors are currently away or ignoring this page, or this change may create other issues later. If we have a successful vote, then there's no worry. Apart from Pinyin/Zhuyin missing in the header (which I consider important and it matches the majority of our entries in non-Roman script) and minor formatting (please consider -sche's suggested format or similar) I personally have no issue with your proposal. Calling on other Chinese-aware editors: @Tooironic, @Jamesjiao, @Kc_kennylau. --Anatoli (обсудить/вклад) 05:08, 13 March 2014 (UTC)[reply]
(Cantonese native here) @Atitarev Thanks for notifying me. I support this proposal. By the way, this is a matter of merging different dialects of Chinese, so I think that WT:RFM should be notified for this matter. {{context}} should also be modified so that it can show dialects, as I am aware that 告白 means a declaration of love in other dialects but advertisement (especially on television) in Cantonese. I support this proposal because I often see lack of definition in Cantonese entries while the definition is complete, precise and concise in their Mandarin section, and the definition in Cantonese is often the same as the definition in Mandarin. Still have to look out for dialectal differences, though. As a non-native speaker of English, I apologize for any misunderstanding that I have made. Here is what I mean for dialectal difference:
==Chinese==
[pronunciation, etymology and rest of the stuff here]

===Noun===
{{zh-noun|[stuff goes here]}}

# {{context|Mandarin|Min Nan|[other dialects]|lang=zh}} a [[declaration]] of love, a [[confession]] of one's feelings towards someone
# {{context|Cantonese|lang=zh}} an advertisement broadcast on television

===Verb===
{{zh-verb|[stuff goes here]}}

# {{context|Mandarin|Min Nan|[other dialects]|lang=zh}} to [[declare]] love, to [[confess]] one's feelings towards someone
--kc_kennylau (talk) 09:03, 13 March 2014 (UTC)[reply]
Thanks. I agree with you. BTW, apart from Cantonese, 告白 also means "advertisement" in some Mandarin dialects: eg. Changli, Hebei (Jilu Mandarin) - /kau55-43 pai13/; Ganyu, Jiangsu (Zhongyuan Mandarin) - /kau51-31 pei55/, and possibly in Taiwanese: [2]. Also I think the sense of "declaration of love" is used in Cantonese too (google:告白嘅). So the page can go something like
===Noun===
{{zh-noun|xxx}}

# [[announcement]], [[public]] [[announcement]]
# [[expression]] of one's thoughts; especially, [[declaration]] of love, [[confession]] of one's feelings towards someone
# {{cx|Cantonese|dialectal Mandarin|Min Nan|lang=zh}} [[advertisement]], [[ad]]

===Verb===
{{zh-verb|xxx}}

# to [[pronounce]]; to [[express]] oneself; especially, to [[declare]] love, to [[confess]] one's feelings towards someone

Wyang (talk) 12:18, 13 March 2014 (UTC)[reply]

This is a huge change. How will the existing entries be dealt with? What about categories? We haven't yet resolved the issue with [*Category* in simplified/traditional script] categories. Don't get me wrong. I like where this is going. I am fully capable of creating entries in Wu, which is my native dialect, but it has been the amount of duplication that I have to deal with that puts me off creating any entries for it, so this change would be a big step forward. It will need to be carefully managed however. JamesjiaoTC 01:55, 14 March 2014 (UTC)[reply]
Yes it is a huge change, which is why it should not be made lightly. We have made such radical changes before, such as with merging Serbo-Croatian, which took several years before the last entries were merged, but nevertheless it worked out in the end. --WikiTiki89 02:16, 14 March 2014 (UTC)[reply]
Yes, the main thing is to have an agreement about the change. The problem with duplications and lack of dialectal templates makes it harder, not easier to add contents for, say Wu. It's looking very positive but I think it's desirable to have a vote. It's a pity @A-cai is no longer active, he is a Min Nan native speaker and creator of many original Chinese templates. His view on dialects was opposite to Wyang's but he had a lot of valid points we should consider before merging. Can I throw in some counterarguments? What do we do with Dungan (Cyrillic), romanised Min Nan and Hui dialects (in Arabic script), current and future? --Anatoli (обсудить/вклад) 02:33, 14 March 2014 (UTC)[reply]
We could have entries such as Cyrillic/Arabic script form of ... that also contain all the lexical and dialectal information that is unique to the script. --WikiTiki89 02:41, 14 March 2014 (UTC)[reply]
Thanks. That sounds good. Any other opinions? I think these varieties could all go under Chinese L2 header as well. Do we need to change nesting for Mandarin translations? "Chinese:\Mandarin:" to just "Chinese:"? The dialects, if they get separate translation should continue to be nested, IMO. Please consider child#Translations or water#Translations (identical Sinitic translations without transliteration should be removed, IMHO, such as Gan: , Wu: ). --Anatoli (обсудить/вклад) 02:51, 14 March 2014 (UTC)[reply]
I think we should continue nesting, but only when the dialect differs in written form from the standard/common form. The standard/common form should be listed on the top level (i.e. after "Chinese:"). --WikiTiki89 02:59, 14 March 2014 (UTC)[reply]
That's what I meant about nesting but I think transliterated translations could be allowed (no strong opinion on this but users may complain about missing Cantonese, Min Nan translations, if they provide pronunciations). Instead of (@Beijing#Translations):
* Chinese:
*: Cantonese: {{t|yue|北京|tr=bak1 ging1}}
*: Mandarin: {{t+|cmn|北京|tr=Běijīng|sc=Hani}}
*: Min Nan: {{t+|nan|北京|tr=Pak-kiaⁿ|sc=Hans}}
We could have:
* Chinese: {{t+|zh|北京|tr=Běijīng|sc=Hani}}
*: Cantonese: {{t|yue|北京|tr=bak1 ging1}}
*: Min Nan: {{t+|nan|北京|tr=Pak-kiaⁿ|sc=Hans}}
With a language code "zh", not "cmn". --Anatoli (обсудить/вклад) 03:24, 14 March 2014 (UTC)[reply]
  • I'm going to sit out of this one. Although I'm one of the regular Mandarin editors here, my technical knowledge is close to zero, so I wouldn't be able to contribute much. However I do support the lumping of all Chinese languages under Chinese in theory. As the sandbox page currently stands, I don't like how the pronunciation boxes are so massive and keep pushing out to the right, it causes the whole page of my browser to elongate. There must be some way to condense this. I agree also that there should be separate headers for See also, Synonyms, etc. just like all the other languages' entries. Anyway, you guys are right - this is a huge proposal for change. Wyang has already contributed so much, I just hope that the transition can be relatively smooth - there are many other issues that need to be worked out first. ---> Tooironic (talk) 11:18, 14 March 2014 (UTC)[reply]
I like the unformatted list but in a collapsible format which would collapse/open the entire pronunciation section with one click. --Panda10 (talk) 17:47, 14 March 2014 (UTC)[reply]
Do you mean something like the format that is used on pecan? - -sche (discuss) 21:13, 14 March 2014 (UTC)[reply]
Yes, the pecan-style collapsing is fine. Otherwise, the pronunciation section will take up a lot of vertical space and the users will have to scroll a lot. --Panda10 (talk) 21:54, 14 March 2014 (UTC)[reply]
I like the unformatted list. It matches the format used in other entries in other languages. I have no strong feelings for or against collapsing it under a pecan-style {{rel-top}} the way Panda seems to propose. If one insists upon a tabular format, I think this is the best one (much less bulky than the other tables, while still showing the IPA and audio files, which the collapsed table hides). - -sche (discuss) 21:13, 14 March 2014 (UTC)[reply]
I see, thank you both. Other opinions? Wyang (talk) 03:52, 15 March 2014 (UTC)[reply]
Good efforts, Wyang. I like the unformatted list as well. Perhaps, it should be collapsible but only if there are more than two(?) topolects. The overwhelming majority of entries just have Mandarin, so it should be visible (expanded) by default, in my opinion. My second choice is "Collapsed blocks". A similar style is also used in Wikipedia when there are multiple languages involved. What are the opinions about transliterations in the header? --Anatoli (обсудить/вклад) 04:20, 15 March 2014 (UTC)[reply]
Thanks. How about this if there are more than two(?) topolects, and Unformatted list if there are less than three? Wyang (talk) 21:04, 15 March 2014 (UTC)[reply]
Looks great. I would add Zhuyin after comma for Mandarin. Pinyin is no longer linked. It's okay with me but this doesn't allow for accelerated Pinyin entries. --Anatoli (обсудить/вклад) 23:13, 15 March 2014 (UTC)[reply]
Pinyin is now linked. I would probably not worry about Zhuyin, since it is now unofficial in Taiwan. Wyang (talk) 23:12, 16 March 2014 (UTC)[reply]
Thank you. The role of Pinyin and Zhuyin is a bit different in Taiwan. Fortunately, standard Pinyin is now standard in Taiwan and used for Romanisation of Chinese characters as well but Zhuyin is used in education, to teach pronunciation at elementary schools and in dictionaries. The fact that children's books in Taiwan are published with Zhuyin, not Pinyin, makes it more important for foreigners as well, wishing to study the "Taiwanese way" or using Taiwanese resources. For educating foreigners both Zhuyin and Pinyin are used now. --Anatoli (обсудить/вклад) 23:39, 16 March 2014 (UTC)[reply]
OK... Zhuyin added now (User:Wyang/歷史). Wyang (talk) 02:26, 17 March 2014 (UTC)[reply]
I like how the pronunciation section is looking (unformatted, collapsible). I do have a question regarding the new etymology/hanzibox section. How would you deal with entries that have two (or possibly more) alternative traditional forms, such as 僥幸? JamesjiaoTC 21:01, 17 March 2014 (UTC)[reply]
@Jamesjiao That's {{zh-hanzi-box}}, which should be reused, IMHO, and which can handle alternatives (I would use a Chinese comma instead of the word "or"). I also don't think the etymology of individual characters (above characters) is a sustainable feature (User:Wyang/歷史). --Anatoli (обсудить/вклад) 22:51, 17 March 2014 (UTC)[reply]
How about User:Wyang/僥幸? Wyang (talk) 04:30, 18 March 2014 (UTC)[reply]
May I request instant expand/collapse rather than the super slow sliding one? --WikiTiki89 21:48, 17 March 2014 (UTC)[reply]
@Wikitiki89 Which one is which? --Anatoli (обсудить/вклад) 22:51, 17 March 2014 (UTC)[reply]
I'm talking about the expand/collapse in the pronunciation box. Currently (at least for me), it slides in and out (it's not too slow, I was exaggerating above), but I think it should just expand and collapse instantly like our other expand/collapse boxes (translation tables, inflection tables, etc.). --WikiTiki89 04:12, 18 March 2014 (UTC)[reply]
I don't know whether it is possible to change it to a navbox-based template without significantly altering the appearance (to make it look like the one in pecan)... I seem to like this collapsible layout better, despite the slower speed. If people prefer that layout, I can do that change. Wyang (talk) 04:30, 18 March 2014 (UTC)[reply]
There has to be some way to change the expand/collapse effect without affecting the layout. --WikiTiki89 04:51, 18 March 2014 (UTC)[reply]
Agreed, though not to my knowledge... Anyway, someone familiar with collapsible tables can change the appearance of the template later on. I would like to hear more feedback on the idea and potential effects of the changes. Wyang (talk) 03:37, 19 March 2014 (UTC)[reply]
Are you asking about homophones? You seem to be able to generate a list of homophones to add to the template (by default - no homophones). Perhaps you could allow templates to have parameters for homophones? My concern is that your suggested handling of homophones seems too complicated for an average editor with few technical skills. --Anatoli (обсудить/вклад) 03:42, 19 March 2014 (UTC)[reply]
Just asking about feedback on the topic of this discussion. (WRT homophones: I think keeping the information centralised is probably best. Using a homophone parameter in templates wouldn't be as efficient (eg. one would have to update 14 pages if one wants to add another homophone of 意義). It is not technically difficult to edit the information. Just edit the page Template:Pinyin-IPA/hom/yìyì which is in "Templates used on this page" in the edit page. This discussion belongs at User talk:Wyang#Putting a homophone field in the pronunciation header template) Wyang (talk) 04:00, 19 March 2014 (UTC)[reply]
We should invite @Liliana-60 to ask for permission to allow "zh" language code, ==Chinese== to be recreated and review what else needs to be done for conversion, as there seems to be a general agreement about merging all Chinese varieties. Dungan, romanised Min Nan, Xiao'erjing, nested translations will need to be addressed as well. --Anatoli (обсудить/вклад) 04:10, 19 March 2014 (UTC)[reply]

Can people please comment on the next steps in making it happen (unified Chinese, zh, ==Chinese==, all said above) and express opposition, if any? Some help would be appreciated. --Anatoli (обсудить/вклад) 00:29, 21 March 2014 (UTC)[reply]

I have just created a template for the vote: Wiktionary:Votes/pl-2014-04/Unified Chinese for your convenience. @Wyang, @Jamesjiao, @Tooironic, @Hahahaha哈, @Kc kennylau, @Bumm13. --Anatoli (обсудить/вклад) 05:10, 21 March 2014 (UTC)[reply]
As I'm not a native speaker of any Chinese variety, it's not a big deal to me if the varieties (technically they're not dialects as the varieties such as Cantonese, Min Nan, etc. have their own local dialects) get merged under a "Chinese" heading. That said, it's going to have to be done right and is going to be a huge task. Bumm13 (talk) 05:52, 21 March 2014 (UTC)[reply]
One thing I think is important is that we preserve parameters for specific romanization methods (Pinyin and Wade-Giles for Mandarin, Jyutping and Yale for Cantonese, etc.) used in the {{t+|abc}} templates or whatever we would end up using as that helps provide clear information about the readings used for our CJKV word entries. Bumm13 (talk) 06:05, 21 March 2014 (UTC)[reply]
Thanks Anatoli. With translation tables, the only change would be: one can add a translation directly after the * Chinese field after the change, without having to specify a variety. The varieties nested underneath still exist, with appropriate romanisations, even though the characters might be identical. For character entries, the romanisations for varieties will be put into one template in the pronunciation section (the same as the one in User:Wyang/歷史), with different readings separated by commas. Automatic conversion should be utilised to maximum extent. There are now a number of scripts available for various transliteration or pronunciation conversions, including Pinyin-IPA (and other Pinyin analyses, eg. tone markers to tone numbers, adding syllable spacings), Pinyin-Zhuyin, Zhuyin-Pinyin, Jyutping-IPA, wuu-IPA and a fairly crude Min Nan pronunciation template (Module:nan-pron). I just wrote a script for converting Jyutping to Yale for Cantonese (see Template:Jyutping-IPA), and will be working on Min Nan tomorrow or later. Wyang (talk) 13:24, 22 March 2014 (UTC)[reply]
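As an illustration of one of the conversions mentioned above (tone markers to tone numbers), here is a minimal Python sketch. It is not the code of any of the templates or modules named in this discussion; the function names are hypothetical, it assumes the Pinyin syllables are already separated by spaces, and it assumes the neutral tone is written as 5.

```python
import unicodedata

# Combining diacritics used for the four Mandarin tones in Pinyin.
TONE_MARKS = {
    "\u0304": "1",  # macron
    "\u0301": "2",  # acute accent
    "\u030c": "3",  # caron
    "\u0300": "4",  # grave accent
}


def syllable_to_numbered(syllable):
    """Convert one tone-marked syllable (e.g. 'shǐ') to numbered form ('shi3')."""
    # Decompose so that each tone mark becomes a separate combining character.
    decomposed = unicodedata.normalize("NFD", syllable)
    tone = "5"  # assumption: unmarked syllables are treated as neutral tone
    letters = []
    for ch in decomposed:
        if ch in TONE_MARKS:
            tone = TONE_MARKS[ch]
        else:
            letters.append(ch)
    # Recompose so that the diaeresis of 'ü' is kept attached to its letter.
    return unicodedata.normalize("NFC", "".join(letters)) + tone


def pinyin_to_numbered(text):
    """Convert space-separated tone-marked Pinyin to numbered Pinyin."""
    return " ".join(syllable_to_numbered(s) for s in text.split())
```

Splitting unspaced Pinyin such as lìshǐ into syllables is a separate step that this sketch does not attempt.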
I have finished the work on the Min Nan pronunciation template Module:nan-pron; please see Template:Min Nan-pron for examples. All subcomponents of {{zh-pron}} are now done. Wyang (talk) 06:18, 25 March 2014 (UTC)[reply]

OK. Thanks for all the comments above. It appears no one is in opposition to this proposal, and that our active editors of Chinese (Mandarin, Cantonese, Wu, etc.) entries are in overwhelming support for the move. The pronunciation template {{zh-pron}} is now ready as well. The vote Wiktionary:Votes/pl-2014-04/Unified Chinese will be open in two days - please express your opinion there. Thanks all. Wyang (talk) 01:52, 27 March 2014 (UTC)[reply]

The vote Wiktionary:Votes/pl-2014-04/Unified Chinese has started. --Anatoli (обсудить/вклад) 22:33, 30 March 2014 (UTC)[reply]

Thanks admins

Thanks to all the admins who didn't block me for violating the bot policy. Obviously, I've been running a bot using my own username for the last month or so. However, now "all the work" is done, so I'm essentially retiring from botting in Asturian. As is my trademark, I'll leave with a tiny batch of vandalism, and will see you all soon under a new username. WF --Back on the list (talk) 18:52, 13 March 2014 (UTC)[reply]

I have deleted all the fuckmyasses, fuckmyassáremos pages which were added. - -sche (discuss) 19:30, 13 March 2014 (UTC)[reply]
Good thing we didn't let him convince us to make him an admin again. --WikiTiki89 19:52, 13 March 2014 (UTC)[reply]

Proposed optional changes to Terms of Use amendment

Hello all, in response to some community comments in the discussion on the amendment to the Terms of Use on undisclosed paid editing, we have prepared two optional changes. Please read about these optional changes on Meta wiki and share your comments. If you can (and this is a non-English project), please translate this announcement. Thanks! Slaporte (WMF) 21:55, 13 March 2014 (UTC)[reply]

Linked terms in black

Why are some terms linked to from entries appearing in black and others in blue? See Xylopia for which the first species is black and linkable and the others blue and linkable for me. DCDuring TALK 00:04, 15 March 2014 (UTC)[reply]

I don't see it. Do you notice this problem anywhere else? DTLHS (talk) 00:14, 15 March 2014 (UTC)[reply]
And what CSS class is around the link? DTLHS (talk) 00:15, 15 March 2014 (UTC)[reply]
I notice it in many places, but this one has: class="mediawiki ltr sitedir-ltr ns-0 ns-subject page-Xylopia_aethiopica skin-vector action-view vector-animateLayout" and a following element, with normal appearance has class="extiw". Another item with the black-seeming font is <a class="stub". DCDuring TALK 01:15, 15 March 2014 (UTC)[reply]

{{audio}} to categorize entries?

According to this Template_talk:audio#Categorisation this has been brought up (or at least was meant to?). I think terms with audio links by language categories are very important. Right now all of the members in them are by "hard coding" (actually typing out the category in the entry's body) which I think is very unwieldy.

Lua (among other things) provides the functionality to compare the first x chars of a string to a set of values (correct me if I'm wrong). What would be the chances of using the language codes used in the names of all of the pronunciation files to define a category "X terms with audio links" for that entry? E.g., de-Amerika.ogg --> de --> Category:German terms with audio links. Neitrāls vārds (talk) 05:11, 15 March 2014 (UTC)[reply]
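The prefix idea above can be sketched as follows. This is only an illustration in Python rather than Lua; the function name and the small code-to-name table are hypothetical stand-ins for whatever language data the template would actually consult, and it assumes file names follow the "code-word.ogg" pattern of the example.

```python
# Hypothetical lookup table; a real implementation would use the site's language data.
LANGUAGE_NAMES = {"de": "German", "lv": "Latvian", "ru": "Russian"}


def audio_category(filename):
    """Return the audio category for a file name like 'de-Amerika.ogg', or None."""
    # Everything before the first hyphen is taken to be the language code.
    prefix, sep, _rest = filename.partition("-")
    if not sep:
        return None  # no hyphen, so there is no prefix to inspect
    name = LANGUAGE_NAMES.get(prefix.lower())
    if name is None:
        return None  # prefix is not a known language code
    return f"Category:{name} terms with audio links"
```

A file whose name does not start with a recognised code simply gets no category, which is one reason an explicit parameter may be more dependable than guessing from the file name.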

We could also just add a lang= parameter to the template. That's how we do it with the others too. —CodeCat 14:21, 15 March 2014 (UTC)[reply]
Could you add that parameter? ...and then a bot could add the lang= parameter to existing audio templates based on the L2 header they are under? Realistically how complicated would it be to have a bot do this, could it be hooked up to some existing bot? I have some 800 files waiting on commons (I am not looking forward to actually having to add those, lol) but if I get around to doing that I could provisionally start specifying a lang=lv parameter (although it's not there yet.) Neitrāls vārds (talk) 04:01, 16 March 2014 (UTC)[reply]
It would be trivial to do if the template were to add entries lacking the parameter to Category:Language code missing/audio. But I'd prefer it if others gave their assent before changing the template. —CodeCat 04:07, 16 March 2014 (UTC)[reply]
Editors who are creating/adding pronunciation files would probably be the ones most interested in this proposal. User:Panda10 is one editor who I've noticed has been working on pronunciations. Perhaps there are more. Neitrāls vārds (talk) 21:24, 16 March 2014 (UTC)[reply]
I second the request of Neitrāls vārds and support adding a lang= parameter to the template. --Anatoli (обсудить/вклад) 22:19, 16 March 2014 (UTC)[reply]
I've also been thinking lately that adding lang= to the template is a good idea. @Neitrāls vārds Re "Editors who are creating/adding pronunciation files would probably be the ones most interested in this proposal": I disagree. I think that editors who would have been recording and adding audio if it were easier to do are the ones who benefit the most. --WikiTiki89 22:40, 16 March 2014 (UTC)[reply]
Thanks for your input. @Wikitiki, it's actually rather uncomplicated, imo, when I had the last go at it it came out to 600+ files (reading out to commons upload) in less than 6 hrs (ofc that might be overdoing it a little 'cause I started to seriously second-guess every file "what's up with the weird intonation, is this tone right," etc., etc.) Neitrāls vārds (talk) 22:50, 16 March 2014 (UTC)[reply]
Yeah, but for me, I feel no strong need to upload audio. There are times that I just want to add audio to one or two words, but I don't want to go through the trouble of downloading an audio recorder that supports OGG or whatever. It's more of a psychological barrier than a real one, and I expect the same is true for many editors. --WikiTiki89 00:22, 17 March 2014 (UTC)[reply]
This project to develop a gadget to allow people to record and upload files directly from Wiktionary pages could help with that. - -sche (discuss) 00:55, 17 March 2014 (UTC)[reply]
Hahaha, I thought this was that discussion! That's why I was saying this. I just realized that this is actually about audio templates. --WikiTiki89 02:48, 17 March 2014 (UTC)[reply]
I've added some code to add entries to Category:Language code missing/audio if the lang= parameter is not present. That would currently include all entries that use {{audio}}. The parameter itself doesn't do anything yet, I only made the change to give the category some time to fill up.
I did notice something else about the template though. It checks for the presence of the first parameter (the sound file name), and only actually displays anything if that is present. But the first parameter doesn't default to it being empty, so {{audio}} would display something, while {{audio|}} would display nothing. This seems rather strange, so I wonder if that could be removed? Really, the first parameter should always be required, so leaving it empty should be an error; it shouldn't just cause the template to show nothing silently. —CodeCat 01:09, 17 March 2014 (UTC)[reply]
The {{audio-IPA}} template already has the lang= parameter and it's functional. I checked a few entries in Category:Terms with audio links by language and they all used the {audio-IPA} instead of {audio}. So adding lang= to {audio} makes sense. While the 'terms with audio' category appears useful, my preference really is the index with immediate audio links, so users do not have to go to each entry to listen to the audio. As an example, see Index:Hungarian/d. Click the blue arrows and it will play the sound file. Unfortunately, the index is no longer refreshed. --Panda10 (talk) 13:06, 17 March 2014 (UTC)[reply]
I'm OK with the idea of adding lang= to audio and making it obligatory. But what about all the articles with audio files without the lang= parameter? Is someone going to run a bot on these to add it? (I'm thinking specifically of Latvian entries: would someone here be willing to run a bot adding lang=lv to the {{audio}} template in Latvian entries, in case it is not there yet?) --Pereru (talk) 00:21, 18 March 2014 (UTC)[reply]
Yes, that should be very easy to do by bot, since the language can be determined from the section. --WikiTiki89 04:13, 18 March 2014 (UTC)[reply]

So, could we add the param itself to the template? My take on it would be something like this (not sure about the invoke part though): {{#if: {{{lang|}}} | [[Category:{{#invoke:languages/templates|lookup|{{{lang}}}|names}} terms with audio links]] | [[Category:Language code missing/audio]] }}

And then one could start making any red categories there are (which would probably pop up right to the top of Special:WantedCategories) as that would be done manually either way, I think. Neitrāls vārds (talk) 17:36, 25 March 2014 (UTC)[reply]

Ok, I've made the change. —CodeCat 20:24, 26 March 2014 (UTC)[reply]
Would you be able to analyse the template usage by the prefix and insert "lang="? It would be much less time-consuming to remove the wrong ones than to add this parameter on thousands of entries. --Anatoli (обсудить/вклад) 21:37, 26 March 2014 (UTC)
MewBot is already running on this. It's using the cleanup category I added earlier, and it determines what code to use based on the language section that the template appears in. So if it sees {{audio}} in an L2 section called "Russian", it converts that to "ru" and then adds "lang=ru". —CodeCat 21:51, 26 March 2014 (UTC)
I see, thanks. --Anatoli (обсудить/вклад) 21:56, 26 March 2014 (UTC)[reply]
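The bot pass described above (read the L2 language header, look up its code, add lang= where it is missing) can be sketched roughly as follows. This is only an illustration and not MewBot's actual code: the function name, the regular expressions, and the three-entry LANGUAGE_CODES table are assumptions made for the example.

```python
import re

# Hypothetical, deliberately tiny name-to-code table (an assumption for this
# sketch); a real bot would consult the full language data.
LANGUAGE_CODES = {"English": "en", "Russian": "ru", "Latvian": "lv"}

# Matches level-2 headers such as "==Russian==" but not "===Pronunciation===".
L2_HEADER = re.compile(r"^==\s*([^=].*?)\s*==\s*$")
# Matches simple {{audio|...}} calls that contain no nested templates.
AUDIO_CALL = re.compile(r"\{\{audio\|([^{}]*)\}\}")


def add_lang_to_audio(wikitext):
    """Add lang=<code> to each {{audio}} call that lacks it,
    based on the L2 language section the call appears in."""
    current_code = None
    out = []
    for line in wikitext.split("\n"):
        header = L2_HEADER.match(line)
        if header:
            # Unknown language names give None, so their calls are left alone.
            current_code = LANGUAGE_CODES.get(header.group(1))
        elif current_code:
            def fix(match, code=current_code):
                params = match.group(1).split("|")
                if any(p.strip().startswith("lang=") for p in params):
                    return match.group(0)  # already tagged, keep as is
                return "{{audio|" + match.group(1) + "|lang=" + code + "}}"
            line = AUDIO_CALL.sub(fix, line)
        out.append(line)
    return "\n".join(out)
```

Calls that sit under a language header missing from the table are left untouched, so they would stay in the cleanup category for manual review.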

Recent edits to US location entries

I've just spent close to an hour systematically undoing unhelpful edits User:Pass a Method made to entries on US states, cities, etc. These included listing North America and Western Hemisphere as holonyms. Why stop there? Why not pan out for the widest angle possible and include Earth, Solar System, Orion Arm, Milky Way, Local Group, Universe? And how is adding such detailed geographical information to entries beneficial from a linguistic perspective? -Cloudcuckoolander (talk) 22:59, 15 March 2014 (UTC)[reply]

Thank you very much. I undid some of them, but didn't have time to go through all of them. I was going to bring it up here, but we were already discussing it at the WT:Information desk#homonyms. Anyway, you seem to be the only person who actually did anything about it. --WikiTiki89 23:30, 15 March 2014 (UTC)[reply]
That's an invalid comparison. I myself was unaware of the existence of some of these holonyms until a few years ago. Therefore I consider it to be educational, since it offers a varied vocabulary not only for others but for myself too; hence I'm disappointed with the reverts. Plus they deal with the ambiguity that is inherent in the language of some place names. You will notice that I usually ignore holonyms in matters unrelated to geography. This is because only in geography have such holonyms become of critical contemporary importance, with intergovernmental organizations, multiregional international organizations, and supraorganizations being organized on the basis of "detailed geographical information". I consider my addition to be akin to adding "Balkans" to the Bulgaria page. Nonetheless, I am willing to compromise and be less broad, with a sense of proportion, next time, since that seems to be the main objection. Pass a Method (talk) 00:00, 16 March 2014 (UTC)
Words like Usonia are very rare in the real world. Most English speakers don't even know that such a word exists. Teaching people that Usonia is holonym of, say, New York is not education, but misinformation. --WikiTiki89 00:12, 16 March 2014 (UTC)[reply]
Wiktionary is a dictionary, not a popularity contest. The standard method for rarer words is to tag it with the word "rare". Pass a Method (talk) 00:28, 16 March 2014 (UTC)[reply]
And did you tag it with the word rare? Essentially the bigger problem is that you are adding these instead of adding words such as United States, which are much more useful and much more common. The question of whether we even need holonyms on these pages is also controversial, as this is already covered by our system of categories. --WikiTiki89 00:33, 16 March 2014 (UTC)[reply]
I was planning to. Pass a Method (talk) 00:47, 16 March 2014 (UTC)[reply]
Even if we agreed that rare words should be listed on pages like these, then you still need to add the common words first, not plan to later. But I think very few people would agree that Usonia belongs on a page like New York. --WikiTiki89 01:02, 16 March 2014 (UTC)[reply]
You may think everyone should know terms like Usonia, and they should. The way to accomplish that is by including them in the synonym sections for US, USA, America, etc. Wiktionary is a descriptive dictionary, so it follows usage. Your favored terms are simply not used in connection with the entries you added them to in any significant amount.
There are far too many -nyms applicable to most entries for us to include all of them, so there's some level of selection involved. However much you may wish to promote knowledge of those obscure words, there are also others that may want just as much to include similar references to Al-Qaeda and terrorist in every entry remotely connected with Islam. Allowing one person's personal taste to prevail over the whole of Wiktionary is a recipe for divisiveness, partisan ugliness and edit-warring when the inevitable backlash sets in. Neutral point of view is one of our fundamental principles for this very reason.
Before you start to claim that what you're trying to push is neutral, you should consider the difference between not taking sides and trying to equalize sides. If the reality is that the vast majority of usage is biased or even bigoted, we have to describe that reality as it is, not give extra weight to minority usage that opposes the biases. Chuck Entz (talk) 01:15, 16 March 2014 (UTC)[reply]


See, this is why I generally do not go on edit‐sprees unless somebody preapproves it. There was an Ossetian entry where I changed the scripts to avoid script‐mixing, but I only did one, then asked about it in the Beer Parlour before doing any more. Turns out that their script is supposed to be Slavic with a Latin letter.

Personally, I’d rather he or she be asked to revert his or her own modifications if people didn’t agree with them. Getting pissed off at somebody over work that was never obligatory for you is considerably unfair. --Æ&Œ (talk) 01:17, 16 March 2014 (UTC)[reply]

@Pass a Method — Wiktionary is a dictionary. It's not our place to try to function as a sort of text-based atlas. There's an argument for listing "United States" as a hypernym of each of the U. S. states. But listing "Western Hemisphere" and "North America" is excessive, and opens the door to the listing of even larger, farther-removed regions to which, say, Vermont belongs (e.g. Orion Arm). It's also redundant, because the definition of, say, Vermont should be written in such a way as to communicate the fact that it's located in the United States, and the entry of the United States to communicate that it's in North America, and so forth. -Cloudcuckoolander (talk) 01:56, 16 March 2014 (UTC)[reply]

Liaison in French

How would we express that "vous" would be pronounced /vuz/ when followed by a word beginning with a vowel or a mute h? Here are some of my suggestions:

  1. IPA(key): /vu(z)/
  2. IPA(key): /vu/, (in liaison) IPA(key): /vuz/
  3. IPA(key): /vu/, (before words beginning with a vowel or a mute h) IPA(key): /vuz/

--kc_kennylau (talk) 10:49, 16 March 2014 (UTC)[reply]

All three of those variants completely ignore the fact that the /z/ is pronounced in the first syllable of the next word. For example, vous avez is pronounced /vu.za.ve/. --WikiTiki89 12:47, 16 March 2014 (UTC)[reply]
@Wikitiki89 Please make your suggestion. --kc_kennylau (talk) 14:43, 16 March 2014 (UTC)[reply]
French liaisons follow rules: all words beginning with a vowel sound can make a liaison with the previous word if it ends with a consonant. It is something that should be learned in general, not detailed in every word, except for the words that begin with an h, which may go either way (des héros = /de e.ʁo/, not /de.z‿e.ʁo/; but des habitudes /de.z‿a.bi.tyd/, not /de a.bi.tyd/), and other exceptions. Dakdada (talk) 14:59, 16 March 2014 (UTC)
But vous does not end with a consonant, it's /vu/ as shown above. From a phonological point of view, the /z/ is unpredictable and just needs to be memorised. French grammar actually turns out to be vastly different from what we're accustomed to, if you take the pronunciation as the starting point. (Most nouns turn out to have no plural inflection anymore, for example; and for adjectives the feminine is actually the base form.) —CodeCat 15:17, 16 March 2014 (UTC)[reply]
I have always wondered if liaison in French would exist to the same extent if it weren't for the written language (i.e. spelling pronunciations). We do know that colloquial French has less liaison than formal French, and that Quebec French has less liaison than French French (but Quebecois are no less literate than the French). --WikiTiki89 18:21, 16 March 2014 (UTC)[reply]
Similar question for (American) English: but is pronounced /bət/ (let's say), but /bəɾ/ before a vowel or /h/. We don't mention that difference in the entry. Should we? Certainly not with slashes. Maybe with brackets, maybe not. But I don't know enough about French to intelligently comment on the analogue there.​—msh210 (talk) 17:12, 16 March 2014 (UTC)[reply]
The change that you mention is predictable. If you know the pronunciation of the word in isolation, then you can predict the lenition of the final consonant depending on the next word. But for French that doesn't work; you have to learn the liaison consonant separately for every word. You can't predict /vu.z/ from /vu/. —CodeCat 17:16, 16 March 2014 (UTC)[reply]
A better comparison would be a language with final devoicing, for which we indicate the devoicing in our transcriptions, but devoicing does not happen when followed by a word that starts with a vowel (let's ignore the case that the following word starts with a voiced consonant, because that could potentially voice previously unvoiced consonants). --WikiTiki89 18:21, 16 March 2014 (UTC)[reply]
That too, but I should note that not all languages with devoicing work that way. In Dutch for example it's actually morpheme-final devoicing, and it happens even to the first part of a compound. badwater (bath water) is [ˈbɑtˌʋaːtər] in normal speech. You rarely hear it pronounced with [d], but that depends on dialect and also implicitly reflects a kind of "blurring" of the morpheme boundary, and therefore of the syllable boundary. It's also relatively uncommon with stops, a bit more common with fricatives. —CodeCat 18:34, 16 March 2014 (UTC)[reply]
You can also consider the inflected forms. I'm sure baden is pronounced with a /d/, right? But that is also easy to account for as a voiced inflection paradigm and an unvoiced inflection paradigm. In standard Russian, prepositions and some particles are the only words that regularly retain their voicing before words that start with a vowel; for example над (nad), even though we transcribe it as /nat/ is often actually pronounced /nad/ even in isolation, as attested in the audio sample in our entry. --WikiTiki89 18:46, 16 March 2014 (UTC)[reply]
(In reply to CodeCat, 17:16, 16 March 2014 (UTC).) Is it predictable from the spelling, though, and therefore from the entry?​—msh210 (talk) 05:22, 17 March 2014 (UTC)[reply]
Usually but not always. For example, et (and) never causes liaison. While es (are) causes a /t/ rather than a /z/. --WikiTiki89 05:33, 17 March 2014 (UTC)[reply]
(in reply to msh210 17:12, 16 March 2014 (UTC)) This is a feature exhibited by some accents, in a given context /t/ would be realised as [t], [ɾ] or [ʔ] depending on the accent. But this is not phonetically relevant, as well as predictable. Besides, the liaison may be mandatory, forbidden or optional in a given context. Whether an optional liaison is made depends on the speaker, the level of speech –the more formal, the more liaisons are likely to be made–, and the speaker’s mood; it can even, along with elision and prosody, change the meaning, making the statement solemn rather than anodyne. So it’s way more complex than having allophones. – Pylade (talk) 11:28, 27 March 2014 (UTC)[reply]

{{look}}

Since most of the time liaison is predictable from the spelling, I don't see a compelling need to regularly indicate it in the pronunciation section. For exceptional cases, we can use usage notes. --WikiTiki89 10:14, 19 March 2014 (UTC)[reply]
Does this speak your mind? --kc_kennylau (talk) 10:45, 19 March 2014 (UTC)[reply]
Yes. --WikiTiki89 11:53, 19 March 2014 (UTC)[reply]
As Dakdada said, rules regarding liaisons are complex, so the third option is unacceptable. The second option could do, with /vu.z/ to render the fact that the last consonant is pronounced in the first syllable of the next word. But I would rather agree with Wikitiki89 and just have a note for exceptions. – Pylade (talk) 22:11, 26 March 2014 (UTC)[reply]
If we list only exceptions, then the majority of entries won't have this information in IPA form. It can be predicted from spelling, maybe, but that's only if you know how to pronounce French words to begin with. So it becomes a bit of a circular argument at that point. I think we should include liaison in our pronunciation sections in some form. —CodeCat 22:18, 26 March 2014 (UTC)[reply]

Request for Comments: c: link prefix for Wikimedia Commons

There is a cross-wiki discussion in progress as to whether c: should be enabled globally as an interwiki prefix for links to the Wikimedia Commons. As your wiki has several pages or redirects whose titles begin with "C:", they will need to be renamed if this proposal gains consensus. Please take a moment to participate in the discussion. Thank you.

What this means: Our entry at c:a will instead automatically take people to commons:a once software support for the prefix is implemented. We have a couple of suggested options, like moving to a fullwidth rather than a halfwidth colon, or moving to Appendix:Unsupported titles. TeleComNasSprVen (talk) 06:09, 17 March 2014 (UTC)[reply]
Moving to a subpage of Unsupported titles seems like the best idea. (I thought entries with colons in their titles were already unsupported titles.) - -sche (discuss) 22:56, 17 March 2014 (UTC)[reply]
On further thought maybe this should have been posted to Wiktionary:Requests for moves, mergers and splits instead. TeleComNasSprVen (talk) 16:40, 18 March 2014 (UTC)[reply]
I think it was good to post it here. It's an invitation to participate in a discussion that affects not just our wiki but other wikis, and which, if it results in a consensus to use "c:" as a shortcut to commons, will force us to move "c:a". - -sche (discuss) 19:02, 18 March 2014 (UTC)[reply]
Appendix:Unsupported titles#Unsupported prefix is what we've been doing with these thus far, including for Swedish terms (as c:a is). I see no reason to change, nor, indeed, reason we should object to the interwiki prefix c: on preexisting-enwikt-entry grounds.​—msh210 (talk) 03:14, 18 March 2014 (UTC)[reply]
The point of the shorter prefix is that it is easy to type.
The c: prefix could be used as a shortcut for our Citations: namespace. There isn’t much demand for it, but it would be nice to promote it by making it as easy to use as possible. But if it is to be a global prefix, we may as well be compatible with everyone else. Michael Z. 2014-03-31 02:50 z
But if there's no demand for it, why would we? It seems like the trouble we'll have of fixing it is more than the benefit we'll ever have from the shortcut... —CodeCat 02:55, 31 March 2014 (UTC)[reply]
The Request for Comment ended; there was judged to be consensus that "c:" should be made a shortcut for "commons:". The devs can be expected to implement that consensus. It would be foolish to start using "c:" for something else locally, since such uses will break once the devs implement "c:" → "commons:". - -sche (discuss) 03:51, 31 March 2014 (UTC)[reply]

Redirects in Wikisaurus

I would like to start using redirects in Wikisaurus because, the way that it is now, I have to search through synonyms of the word I'm looking for to find the page with synonyms of the word I'm looking for. For example, if I want synonyms for ceiling, it is quite possible that a page for ceiling does not exist, and the synonyms that I'm looking for are on the roof page. However, I cannot do this at present (without messing up the system) because if (using the previous example) I make ceiling a redirect that sends people to roof, then on the roof page the word ceiling has "[WS]" next to it, which links to the Wikisaurus page for ceiling. (Wikisaurus does this automatically.) But then the link to the ceiling page is completely useless and misleading because it redirects to roof, which is the page that the link is on.

I don't know what would be the best course of action here, but I think something should be done about this if possible. Any ideas? Thanks. —TeragR disc./con. 03:15, 18 March 2014 (UTC)[reply]

You made redirects at Wikisaurus:soda and Wikisaurus:pop to Wikisaurus:soft drink. These were really bad: pop more often than not refers to the music genre, while soda may refer to something quite unsuitable for drinking. Wikisaurus pages should have unambiguous titles (or at least titles with few closely related senses). A good thing to do would be to put a link to Wikisaurus:soft drink at soda, for example. Keφr 18:45, 18 March 2014 (UTC)[reply]
Those do not need to be red links, though. We could very well have the equivalent of a disambiguation page at Wikisaurus:soda giving readers the option of going to soda, sodium bicarbonate, Wikisaurus:soft drink, or w:soda. bd2412 T 17:29, 20 March 2014 (UTC)[reply]
I think we have it already — at [[soda]]. Keφr 17:59, 20 March 2014 (UTC)[reply]
A reader could conceivably type Wikisaurus:soda, because they are looking for a synonym of some meaning of "soda". bd2412 T 21:17, 20 March 2014 (UTC)[reply]
A reader could also conceivably type Help:What are the synonyms of soda?. We don't have to account for every situation. --WikiTiki89 21:40, 20 March 2014 (UTC)[reply]
You're right, we can't account for everything, but "Wikisaurus:soda" is a completely reasonable thing for a user to type (no matter which sense they're using), and we should have some way of at least pointing them in the right direction. —TeragR disc./con. 20:25, 18 April 2014 (UTC)[reply]

{{look}}

  • I oppose adding Wikisaurus-namespace-redirects to Wikisaurus, such as Wikisaurus:soda to Wikisaurus:soft drink. Wikisaurus should not have a disambiguation page for every mainspace entry; the mainspace entries themselves should link to Wikisaurus entries from their Synonyms sections. For instance, cat#Synonyms links to WS:cat and WS:man. Other than that, each Wikisaurus page contains a search box that only searches in Wikisaurus namespace, so typing "soda" there should take you to relevant Wikisaurus pages. --Dan Polansky (talk) 07:31, 19 April 2014 (UTC)[reply]
There is a search box on Wikisaurus pages, but I find it almost useless because Wikisaurus rarely has a page for what I'm searching for, and even if it does, the page is rarely titled exactly what I searched, which means that it redirects to the Wiktionary page for what I searched (if it has one).
Unless the synonym sections on the Wiktionary pages are so big that we feel that we have to put them on completely separate pages, why does Wikisaurus exist? Why not just have the synonym sections? Or we could add a new part to every entry that appears on a separate page (similar to the citations page) that has a list of synonyms, antonyms, etc. If we aren't going to do either of these, our only other decent option (as far as I can see) is to make a disambiguation page for every word that has more than one meaning.—TeragR disc./con. 02:26, 20 April 2014 (UTC)[reply]
You hit the nail on the head there. It's there only for synonym sections that are too big. --WikiTiki89 02:32, 20 April 2014 (UTC)[reply]
If that's true, why do we have a search feature for Wikisaurus? The regular Wiktionary search is still there, and if there are any synonyms on the page which have pages of their own, there will be links to those pages next to the words. The search is quite useless as it is, and it is quite deceiving if we only have Wikisaurus pages for select words. —TeragR disc./con. 03:47, 20 April 2014 (UTC)[reply]

Suffix category name conflict

The Hungarian -ika suffix can form both diminutive nouns and can be the end of regular (non-diminutive) nouns. I'd like to separate these two groups but the standard category name for both is Category:Hungarian nouns suffixed with -ika. What would be a good naming convention to resolve this with two categories? --Panda10 (talk) 20:58, 19 March 2014 (UTC)[reply]

I’ve run across the same problem in other languages. I propose that in such cases we add a short gloss to the category name, as in Category:Hungarian words suffixed with -ika (diminutive), that they be subcategories of the glossless category, and that the alt1= parameter be used with {{suffix}} to suppress the gloss. — Ungoliant (falai) 23:51, 19 March 2014 (UTC)
But how do we prevent entries from being added to the "main" category? —CodeCat 00:26, 20 March 2014 (UTC)[reply]
They will have to be fixed and the user who added it told about the new system. — Ungoliant (falai) 00:45, 20 March 2014 (UTC)[reply]
Better if there's a system built in to the template so people can't just make up their own names (any system that can be abused will be abused). DTLHS (talk) 00:52, 20 March 2014 (UTC)[reply]
Separating categories is very important. Right now the derived terms in -իկ and many other entries misleadingly list all the suffixed terms for both suffixes. I propose to separate the categories by adding a superscript number, so Category:Latin words suffixed with -ceps¹, Category:Latin words suffixed with -ceps², etc. The number can be added by {{suffix}} with a special parameter. Short glosses instead of numbers are not a good idea, as in some languages there are unproductive suffixes of unknown function which you can't describe by a gloss, e. g. -որ (-or). --Vahag (talk) 07:53, 20 March 2014 (UTC)[reply]
Superscript numbers are too vague, they don't really mean anything. —CodeCat 13:51, 20 March 2014 (UTC)[reply]
Some other categories/suffixes which have the same issue are mentioned in the Tea Room. They are: Latin -ceps, English -er (although we may not want to split it), and English -est. - -sche (discuss) 00:34, 20 March 2014 (UTC)[reply]
Cf. user:msh210/Sandbox#adjectives ending in 'ly'.​—msh210 (talk) 15:34, 21 March 2014 (UTC)[reply]
How about distinguishing "ending in" from "suffixed with"? Sometimes a common ending is not an independent suffix. We could add a type=ending or altcat=ending parameter to the {{suffix}} template to indicate that a "Hungarian nouns ending in -ika" category is needed instead of "Hungarian nouns suffixed with -ika". I'm not saying this will solve every situation in question, but it would be helpful in many cases. --Panda10 (talk) 14:35, 20 March 2014 (UTC)
Categories for words "ending in" would be just statistical, rather than etymological. So we'd need a whole new category tree for those, along the lines of "terms spelled with (character)" categories. —CodeCat 15:37, 20 March 2014 (UTC)[reply]
Ok, then let's return to the idea of adding a short gloss to the category name:
  • There would be the standard category Hungarian nouns suffixed with -ika as today.
  • There would be an altcat1 (and altcat2 if needed) parameter in {{suffix}} to allow glosses.
  • The allowable glosses would be stored in a list somewhere that the template would check.
  • If the user provides altcat1=diminutive and if diminutive exists on the allowable gloss list, then the entry would be placed in the Hungarian nouns suffixed with -ika (diminutive) category but not in the regular category.
  • If the user provides a gloss that is not on the allowable gloss list, the word would go to the regular category.
  • The regular category might have to be monitored for incorrect entries, but I don't see this as an issue. --Panda10 (talk) 18:01, 20 March 2014 (UTC)[reply]
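The selection rule in the list above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name, the ALLOWED_GLOSSES table and its contents are hypothetical, and the real logic would live in the {{suffix}} template or its module rather than in Python.

```python
from typing import Optional

# Hypothetical allow-list keyed by suffix (an assumption for this sketch);
# the proposal only says the glosses "would be stored in a list somewhere".
ALLOWED_GLOSSES = {"-ika": {"diminutive"}}


def suffix_category(language: str, pos: str, suffix: str,
                    gloss: Optional[str] = None) -> str:
    """Return the category name for a suffixed term.

    A gloss on the allow-list sends the entry to the glossed category only;
    a missing or unlisted gloss falls back to the regular category."""
    base = f"Category:{language} {pos} suffixed with {suffix}"
    if gloss and gloss in ALLOWED_GLOSSES.get(suffix, set()):
        return f"{base} ({gloss})"
    return base
```

With this rule a misspelled or invented gloss never creates a new category; the entry simply lands in the regular category, which is the one the proposal suggests monitoring.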
Perhaps we can set up something analogous to sensids for the categories to attach to. This ID would be appended to the category name, as in "Category:Hungarian nouns suffixed with -ika ([whatever the id is])", and would be supplied as an optional parameter to the suffix template. Contributors would know which one to use by checking the suffix entry.
This would be especially useful if it could be set up so the suffix template's module could check whether the suffix entry had at least one disambiguation id for the language in question and flag uses of the template that had one or more ids available but didn't use them.
I don't know the technicalities well enough to even guess how it could be done but maybe someone else here will. Chuck Entz (talk) 08:25, 21 March 2014 (UTC)[reply]

Template:only in with links to deleted ISO-639 appendices

Some ISO-639 language codes were listed first at rfv, then rfd (see talk:jv). It was finally decided to convert them to "only in" links to the appendices for ISO-639 codes. Now that the appendices have been deleted, we have a bunch of entries with basically nothing but explanations of why there's no entry and a referral to a non-existent appendix (see Special:WhatLinksHere/Appendix:ISO_639-1 language codes and Special:WhatLinksHere/Appendix:ISO_639-3 language codes).

Should we:

  1. delete the entries?
  2. change them to link to the WP disambiguation page for the entry name?
  3. change them to link to the appropriate WP list of ISO 639 codes?
  4. change them to link to WT:LL?
  5. do something else?

If we decide to delete the entries, there's a related issue re: {{also}} in some of these entries and whether to remove the entry names from the same template in other entries. Chuck Entz (talk) 20:59, 21 March 2014 (UTC)[reply]

If we keep them, option 3 seems best. —CodeCat 21:21, 21 March 2014 (UTC)[reply]
We should restore the two appendices. They were never properly deleted. It was just CodeCat's decision which was not voted and was far from our consensus. --kc_kennylau (talk) 22:30, 21 March 2014 (UTC)[reply]
She proposed deletion, in the appropriate forum, and waited almost eight months for feedback. During that time, exactly one other contributor commented on the proposal, and that contributor expressed agreement. In what sense is that "not voted" or "far from our consensus"? What more do you want her to have done? —RuakhTALK 23:03, 21 March 2014 (UTC)[reply]
Why would one delete something relatively harmless with so many inbound links without having made arrangements for replacement of the links? Why ever delete based solely on one's own opinion? There probably continues to be a steady stream of such "tidyings" to this day at WT:RFDO. DCDuring TALK 23:13, 21 March 2014 (UTC)
WT:RFDO exists because it's not good to delete things just based on one's own opinion. And that's why I submitted it there. If nobody actually submits their opinions after I ask them for it (except for that one person), then how is that my fault? —CodeCat 01:34, 22 March 2014 (UTC)[reply]
Whatever omissions or misnomers our appendix may have had, I thought our protocol for deletions of linked-to pages would be to first correct the links, the equivalent of depopulating a category. If that isn't our practice it should be, as our current practice makes the perfect the enemy of the better-than-nothing. (The now-deleted appendix was superior to redlinks in, I'd guess, more than 98% of cases.) DCDuring TALK 20:24, 22 March 2014 (UTC)
As noted in the deletion discussion, our appendices only duplicated content that Wikipedia (and the ISO) already had, and our appendices were woefully out-of-date compared to Wikipedia's. Deleting them was the right thing to do. Changing pages which link to the appendices so that they link to Wikipedia's appendices instead is the best solution, but as a stopgap until all the relevant pages can be updated, we could make our appendices redirect softly to Wikipedia's. - -sche (discuss) 02:23, 22 March 2014 (UTC)[reply]
Special:WhatLinksHere/Appendix:ISO_639-1 language codes, Special:WhatLinksHere/Appendix:ISO_639-2 language codes, Special:WhatLinksHere/Appendix:ISO_639-3 language codes and Special:WhatLinksHere/Appendix:ISO_639-5 language codes have now been orphaned, except from a few old talk pages and sandboxes. - -sche (discuss) 06:47, 28 March 2014 (UTC)[reply]

Throughout their entire history of editing, this editor has demonstrated what can be described as either notoriously bad judgement or POV pushing. Highlights:

The "good" edits are negligible, and almost always intertwined with some form of the above. Interacting with the user seems to be a waste of time: they are neither improving nor taking hints to leave. I propose an indefinite block. Keφr 08:52, 22 March 2014 (UTC)[reply]

I think we should give him a sort of official warning, clearly stating that if he does not start using better judgment with his editing then he will be blocked. And the block should not be immediately indefinite; we should start off by blocking him for a month, then a year, and only then a permanent block. --WikiTiki89 08:59, 22 March 2014 (UTC)
I doubt it will help. If he wants to push a POV, he will just shift to new, more stealthy ways of doing it; if he is incompetent, he is probably also incapable of recognising (a lack of) competence in himself and others; telling him to "start using better judgment" will fall on deaf ears as he will not understand what we want from him. Month-long blocks have already been handed, they did not help much either. Keφr 09:21, 22 March 2014 (UTC)[reply]
If it falls on deaf ears, then we will block him; that's the whole concept of a warning. If month-long blocks have already been tried, then I guess we can skip right to a year-long block. --WikiTiki89 09:37, 22 March 2014 (UTC)[reply]
I have taken heed of the warning. Pass a Method (talk) 09:51, 22 March 2014 (UTC)[reply]
I would say they're more willing to change than you give them credit for. The main problem has been that they tend to honor the letter rather than the spirit of the objections, and tend to slip back into their old ways after a while. They also tend to shift into other types of entries and make analogous, but different types of errors.
This reminds me a lot of Gtroy/Acdcrocks/Luciferwildcat. In both cases they identified real problems in lack of coverage for controversial and/or unpopular subjects, but both exercised poor lexicographic judgment and a sort of counter-prescriptive wishful thinking: terms should exist or have the meanings they thought they should because it made sense in their way of thinking, and the only reason they weren't attested was censorship or bias in the coverage of available sources.
The sad part is that a clear-headed search for religious and political biases in our entries was and is a good idea, and we needed and still need better coverage of terms that go against the biases of our contributor base. I just wish they had approached it with more common sense so that they wouldn't have wasted so much of their truly prodigious effort on bad edits.
For all the time wasted in rfv and rfd of stuff that should never have been added, and all the checking needed to weed out the pov and the phoniness they've added to so many entries, they have made some real contributions in increasing our coverage of religious terms outside of mainstream Judaism and Christianity. Chuck Entz (talk) 20:39, 22 March 2014 (UTC)[reply]

Accents in Greek transliteration

I've always wondered why accents are not included when transliterating Greek. We include them for Cyrillic, so why not for Greek? I think we should have them for Greek too. —CodeCat 18:48, 22 March 2014 (UTC)[reply]

If you mean Modern Greek, then AFAIK accents are included. Or do you mean Ancient Greek? — I.S.M.E.T.A. 19:10, 22 March 2014 (UTC)[reply]
Ancient Greek: cf. άνθρωπος (ánthropos) vs. ἄνθρωπος (ánthrōpos). I thought we discussed it a while back and decided against it because it would be so complicated in some cases, e.g. having to transliterate ῇ as ẹ̄̂. —Aɴɢʀ (talk) 19:27, 22 March 2014 (UTC)[reply]
My main concern in the past was the difficulty of producing combinations of macrons and accents. Since we have a module to do the heavy lifting now, I think it's worth reconsidering. Chuck Entz (talk) 19:32, 22 March 2014 (UTC)[reply]
As I'm sure everyone is tired of hearing, I support a simple, intelligible transliteration for Ancient Greek (and all other languages that have native script support), preferring to demonstrate nuanced detail in the actual script and in the IPA of the pronunciation sections. -Atelaes λάλει ἐμοί 19:46, 22 March 2014 (UTC)[reply]
I think the suggestion user:Gilgamesh gave at Wiktionary talk:About Ancient Greek#Accents in transliteration is pretty workable. He gives these examples: η=ē ή=ḗ ὴ=ḕ ῆ=ê ῃ=ēi ῄ=ēí ῂ=ēì ῇ=ēî, ηι/ηϊ=ēï. The circumflex accent is indicated with ^ but this implies vowel length, so the macron is omitted. Iota subscripts are simply treated as diphthongs, which is what they were originally. —CodeCat 21:01, 27 March 2014 (UTC)[reply]
We should have Ancient Greek accents. --Anatoli (обсудить/вклад) 22:33, 27 March 2014 (UTC)[reply]
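
For illustration only, the scheme quoted above from Gilgamesh could be sketched for the single letter η roughly as follows. This is not the code of any existing Wiktionary module; the function name `translit_eta` is hypothetical, breathing marks are simply ignored, and the two-letter case ηϊ is not handled.

```python
import unicodedata

# Combining marks that appear when polytonic Greek is decomposed (NFD).
ACUTE = "\u0301"             # oxia / tonos
GRAVE = "\u0300"             # varia
GREEK_CIRCUMFLEX = "\u0342"  # perispomeni
IOTA_SUBSCRIPT = "\u0345"    # ypogegrammeni

MACRON = "\u0304"
LATIN_CIRCUMFLEX = "\u0302"


def translit_eta(ch):
    """Transliterate one accented form of eta (hypothetical helper)."""
    decomposed = unicodedata.normalize("NFD", ch)
    if not decomposed or decomposed[0] != "\u03b7":  # Greek small eta
        raise ValueError("this sketch only handles lowercase eta")
    marks = set(decomposed[1:])

    accent = ""
    if ACUTE in marks:
        accent = ACUTE
    elif GRAVE in marks:
        accent = GRAVE
    elif GREEK_CIRCUMFLEX in marks:
        accent = LATIN_CIRCUMFLEX

    if IOTA_SUBSCRIPT in marks:
        # Iota subscript is treated as a diphthong; the accent sits on the i.
        out = "e" + MACRON + "i" + accent
    elif accent == LATIN_CIRCUMFLEX:
        # The circumflex already implies length, so the macron is omitted.
        out = "e" + accent
    else:
        out = "e" + MACRON + accent
    return unicodedata.normalize("NFC", out)
```

Decomposing first means the precomposed polytonic characters do not each need their own table entry; the eight single-letter examples in the comment above all fall out of the three branches.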

Anusvara[edit]

It would be great if someone helped with the anusvara (nasalisation symbol) problem in all Indic languages. It affects the pronunciation/transliteration of the vowel and m/n consonants, depending on what follows. For lack of a better way, it's currently always transliterated "ṁ", but it should be ã (or another vowel with nasalisation), vowel + "ṅ", vowel + "m" or vowel + "n". The affected modules include Module:te-translit, Module:si-translit, Module:hi-translit, etc. The native symbols in any of these modules don't matter; it's the quality of the following consonant, or the lack of one, that matters. @CodeCat, could you help? The nasalisation and its transliteration are described at Wiktionary:Hindi_transliteration#Nasalisation, but I can describe more and make test cases if you agree. We just need to fix one module; the rule is applicable to all. --Anatoli (обсудить/вклад) 22:33, 27 March 2014 (UTC)[reply]
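
As a rough sketch of the kind of rule being asked for, here is a post-processing pass over an already romanised string. It is written in Python for illustration and is not taken from any of the modules named above. The consonant classes are an assumption: only the three outcomes mentioned in the request (ṅ, m, n) are covered, everything else falls back to nasalising the preceding vowel, and the authoritative rules are the ones on the linked Hindi transliteration page. The name `resolve_anusvara` is hypothetical.

```python
import unicodedata

ANUSVARA = "\u1e41"      # the placeholder "ṁ" the modules currently emit
COMBINING_TILDE = "\u0303"

# Assumed mapping from the following consonant to the nasal to write.
NASAL_BEFORE = {}
for consonant in "kg":
    NASAL_BEFORE[consonant] = "\u1e45"   # velar: ṅ
for consonant in "pbm":
    NASAL_BEFORE[consonant] = "m"        # labial: m
for consonant in "tdn":
    NASAL_BEFORE[consonant] = "n"        # dental: n


def resolve_anusvara(translit):
    """Replace each placeholder anusvara according to what follows it."""
    text = unicodedata.normalize("NFC", translit)
    out = []
    for i, ch in enumerate(text):
        if ch != ANUSVARA:
            out.append(ch)
            continue
        following = text[i + 1] if i + 1 < len(text) else ""
        nasal = NASAL_BEFORE.get(following)
        if nasal is not None:
            out.append(nasal)
        elif out:
            # No listed consonant follows: nasalise the preceding vowel.
            out.append(COMBINING_TILDE)
        else:
            out.append(ch)
    return unicodedata.normalize("NFC", "".join(out))
```

Because the pass looks only at the romanised output, the same function would in principle serve every script, which matches the point that the native symbols do not matter. A real module would need a fuller consonant table (palatals, retroflexes, aspirates) and the test cases offered above.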

As w:Gerund indicates, the term is applied to a wide variety of verb-like words that don't necessarily have much in common otherwise. In English, Dutch and German, it's synonymous with "verbal noun", while in certain Romance languages it indicates an adverb. Because the term "gerund" is so ambiguous, I don't know if it's a good idea to have this category. It would be better to put the entries into other categories, with a name that is more descriptive for the words in question. —CodeCat 21:10, 22 March 2014 (UTC)[reply]

You might be right. I was excited to add the boiler (that I didn't know existed) to Category:Livonian gerunds but now I have noun category at the bottom. Livonian gerunds are inessives of infinitive and don't decline for anything else (being able to take the inessive is nominal-like behavior but they should be able to take on other cases too if they were nominals but if they would they wouldn't be gerunds anymore.) Tangentially Latvian -ot are also strictly verbal/adverbial (as in not nominal). Neitrāls vārds (talk) 14:02, 25 March 2014 (UTC)[reply]

Various votes have started or ended[edit]

- -sche (discuss) 20:33, 23 March 2014 (UTC)[reply]

Easter Competition 2014[edit]

This is to announce the forthcoming Easter/Spring competition, which will be open to all users, and will take the form of a crossword.

SemperBlotto (talk) 19:26, 24 March 2014 (UTC)[reply]

As always, I will offer all technical support. --kc_kennylau (talk) 23:15, 24 March 2014 (UTC)[reply]

Finnish accusative[edit]

There has been some disagreement between me and User:Hekaheka over the Finnish accusative case. In the past, our templates always showed at least two forms in the accusative singular box. One that had a form identical to the genitive singular, and one that had a form identical to the nominative singular. When I converted the templates to Lua, I removed the second one, because in my view this isn't an accusative case form at all, and the two aren't just interchangeable, they have separate uses. I wonder why the form that looks like the nominative is called an accusative at all. It's used primarily with imperatives, which have no explicit subject. So it's not a real accusative form in my opinion; it's just the nominative used in the role of imperative direct object. It's not uncommon for languages to use different cases in limited syntactic roles, than they would otherwise. In Slavic languages for example, negative verbs take an object in the genitive, but that doesn't mean that the genitive form should be included under "accusative". Likewise, languages like Icelandic or Latin may have a dative object, but that doesn't make the dative form accusative.

Hekaheka has argued that grammars and standards bodies have stated that there is no accusative case at all, and only a genitive. To me, this position is completely untenable, and this becomes clear when you look at the singular and plural forms of the nominative, accusative and genitive together:

  • nominative: talo, talot
  • accusative: talon, talot
  • genitive: talon, talojen

This clearly shows that there is no single case that aligns perfectly with accusative function, other than the accusative itself. If you eliminate the accusative as a case, then you end up with convoluted rules like "objects use the genitive in singular but nominative in plural", which doesn't help anyone.

It should also be noted that the identity between accusative and genitive forms is a historical accident, and is a result of sound change rather than a real functional identity between the two cases. In Proto-Uralic, the accusative had the ending -m, while for the genitive it was -n. In the prehistory of Proto-Finnic, final -m became -n, causing the two cases to become identical in form. So historically, there definitely was an accusative case, and the modern Finnish accusative singular -n is a direct descendant of that; it has nothing to do with the genitive. (The accusative plural is identical to the nominative; I don't know whether that's also a historical accident, or something else, but that's not relevant here.)

Since Hekaheka and I just keep arguing on this, I'm asking for wider feedback. —CodeCat 20:09, 24 March 2014 (UTC)[reply]

Hekaheka has said that there are two mainstream theories: one, which you summarize and criticize above, is that there is no accusative in modern Finnish, except perhaps in certain pronouns. The other is that there are two accusative forms in modern Finnish. Historically, Wiktionary accepted the latter view. If Hekaheka is right that these are the only two mainstream theories, I think we should continue to follow one of them — presumably the one we've been following — because I don't think the arguments of one person who doesn't speak Finnish are more convincing than the views of Finnish linguists. - -sche (discuss) 20:43, 24 March 2014 (UTC)[reply]
But should we blindly follow those arguments, or should we evaluate it for ourselves? It seems that you're doing the former; you're not evaluating my arguments based on my background, which is bad linguistics for sure. —CodeCat 21:09, 24 March 2014 (UTC)[reply]
  • Another 2p from someone who doesn't know squat about Finnish, so take this for what you will.
I've been taught that, if a language has grammatical cases, and a certain set of case endings is used for a distinct grammatical role, then the grammatical case for that grammatical role exists, even if the case endings it uses are shared with other case endings.
Case in point (pun intended): German. Broadly speaking, the feminine singular nominative matches the feminine singular accusative matches the plural nominative matches the plural accusative. But we don't say that the plural nominative or accusative cases don't exist. Similarly, the feminine singular dative matches the masculine singular nominative. But we don't say that the masculine singular nominative doesn't exist. Etc., etc.
If the Finnish language has endings that function as a grammatical accusative case, i.e. a set of endings that are consistently applied when a noun is used as the grammatical object, then it follows that Finnish has a grammatical accusative case, even if those endings happen to be shared with other grammatical cases.
I think part of the confusion might stem from how the Finnish authors define “case”. I don't read Finnish, but following the thread here and on CodeCat's Talk page, it sounds to my ears as though the Finnish authors in the "accusative doesn't exist" camp have taken the view that “case” == “distinct endings”. Meanwhile, those authors in the "accusative does exist" camp have taken the view that “case” == “grammatical role”. My bias is towards the latter.
Either way, any in-depth treatment of Finnish grammar (perhaps WT:About Finnish) should at least mention that there are both interpretations, describe them briefly, and explain which interpretation Wiktionary entries adhere to. ‑‑ Eiríkr Útlendi │ Tala við mig 21:59, 24 March 2014 (UTC)[reply]
(E/C) I see this situation as being very similar to Russian. But even if you (like me) consider that Russian masculine nouns do not have an accusative case, which is instead supplemented by the nominative or genitive depending on animacy, it is not wrong to call the supplemented form an accusative. Also, history is completely irrelevant, otherwise Modern English has cases as well, only all the endings except for the genitive have merged. --WikiTiki89 22:07, 24 March 2014 (UTC)[reply]
  • Um, I was always taught that English does have cases. Where and how to use whom, etc. only makes sense from a case perspective. ‑‑ Eiríkr Útlendi │ Tala við mig 22:20, 24 March 2014 (UTC)[reply]
    English has a relic of cases in the personal pronouns, but English nouns certainly cannot be considered to have cases. --WikiTiki89 22:25, 24 March 2014 (UTC)[reply]
    In Finnish, like IE, adjectives agree with nouns in case, and that agreement is what is generally considered to be evidence for case-ness in Finnish. Some grammars cite a "prolative" case for Finnish, or several others, but they don't show adjective agreement, so they are not considered true cases. As for English, it has a two-way case distinction in the pronouns, but not in the nouns (I don't consider the possessives a case; they were adjectives historically). So English has cases, but as a closed set and they are not productive. —CodeCat 22:47, 24 March 2014 (UTC)[reply]
    I thought that the possessives are descended from the Old English genitive case ending -es. --WikiTiki89 22:51, 24 March 2014 (UTC)[reply]
    Of nouns, yes. But the pronoun possessives were adjectives (or determiners) and agreed in case and number with the noun they modified, like they still do in German and in the Romance and Slavic languages. At least the first- and second-person ones did, but in German the third-person possessives now also inflect, so maybe that happened sometime in English history too before cases disappeared altogether. —CodeCat 22:55, 24 March 2014 (UTC)[reply]
    Well maybe I wasn't clear, but I meant to refer specifically to nouns. --WikiTiki89 23:41, 24 March 2014 (UTC)[reply]

The point of view that I'm promoting is this (copied from en-Wikipedia article on accusative):

According to traditional Finnish grammars, the accusative is the case of a total object, while the case of a partial object is the partitive. The accusative is identical either to the nominative or the genitive, except for personal pronouns and the personal interrogative pronoun kuka/ken, which have a special accusative form ending in -t. For example, the accusative form of hän (he/she) is hänet, and the accusative form of kuka (or ken) is kenet.

As there are two singular accusative forms, our declension table should show both. If I have not misunderstood, CodeCat argues that only the genitive-looking form is a "true" accusative and therefore only it should be shown. --Hekaheka (talk) 22:22, 24 March 2014 (UTC)[reply]

Hekaheka also showed me this, which I found to be much more convincing evidence:
I know I'm walking on thin ice, but the thinking may depend on the fact that in case of personal pronouns (the only true accusative forms that everyone seems to agree on) the accusative is used as equivalent to both nominative and genitive accusatives. At a slave market one might say ostan orjan/ osta orja but if one uses hän instead of orja, it becomes ostan hänet/ osta hänet. The grammatical case must be the same both if the object is a noun or if it is a pronoun - ergo, there's a nominative accusative form. In the end, the existence of nominative-accusative is at least partially a question of convention, but this is anyway the convention that is widely agreed upon.
(my reply, copied from my talk page):
Ok, that is an argument that does make some sense to me at least. The fact that the nominative of a noun becomes the accusative of a pronoun shows that there is a functional connection between the two.
But then, if we include them both under "accusative" then people may think that they're equivalent and interchangeable. We do the same with alternative genitive and partitive plural forms, after all. So I propose changing the table a bit, so that the "accusative" line becomes two rows high, and have two sub-rows each showing the two possible types of accusative. What should we call those sub-rows? Is the imperative the only case where the second accusative (the one like the nominative) is used, or are there others?
CodeCat 22:42, 24 March 2014 (UTC)[reply]
  • I strongly feel that we should somehow differentiate the two forms. I do not know in what contexts each of them are used, but we can come up with names for those contexts and call them "X accusative" and "Y accusative". --WikiTiki89 22:46, 24 March 2014 (UTC)[reply]
As a native speaker of Finnish, I agree with everything Hekaheka has said on the subject. Thousands of Finnish declension tables are erratic now. Either put back the nominative in the accusative singular box, or remove all accusative boxes. User Jyril who created the old declension tables is quite competent too. CodeCat is welcome to invent new rules for Finnish grammar in the discussion rooms, but we cannot base the declension tables on them. --Makaokalani (talk) 11:59, 27 March 2014 (UTC)
It would take several pages to explain the rules when to use nominative or genitive type. Native speakers go by ear, and foreigners never seem to learn them completely. You can call them "nominative type" and "genitive type" if that makes you happy.--Makaokalani (talk) 12:14, 27 March 2014 (UTC)[reply]
There don't seem to be any agreed-upon names for them. One paper I found on the subject labelled them the "n-accusative", "t-accusative" and "zero accusative". Since no words distinguish the t-accusative from any other type, we can ignore that and just label it "accusative" for those words. So should our tables name them "n-accusative" and "zero accusative"? —CodeCat 14:32, 27 March 2014 (UTC)[reply]
I think a description of their function is better than a description of how they're formed. --WikiTiki89 16:35, 27 March 2014 (UTC)[reply]
But what would concisely describe their function? The n-accusative is the "default", but the zero accusative isn't used in situations that form a clear pattern. It's used:
  • as the object of imperatives
  • as the object of passives
  • in some infinitive constructions
and probably in some other situations as well; a native Finnish speaker could probably elaborate. —CodeCat 17:48, 27 March 2014 (UTC)[reply]
You cannot explain their function in one word. Any person who knows basic Finnish will know that object forms are tricky, and won't look for advice in a declension table. Just call them something. Hekaheka suggests nominative-accusative and genitive-accusative. Zero-accusative and n-accusative are fine with me too. Anything to stop this argument and finally fix the declension tables. --Makaokalani (talk) 16:33, 28 March 2014 (UTC)[reply]

I agree with Makaokalani, let's move into action. CodeCat, forget what I wrote about use of nominative-accusative on your discussion page and check Accusative case#Finnish instead. I erred with some infinitive and participle forms, because normally I don't need to think about them. Sorry for that. Anyway, we must change the current way of displaying Finnish accusatives, as it does not conform with any mainstream linguistic theories. I would be totally happy with the system we had, showing genitive-accusative and nominative-accusative forms in singular and nominative-accusative in plural without any further explanations. Those who don't understand why it's that way, may study Finnish grammar from available sources in the net and elsewhere or post a question on the Feedback page. After all, this is not the only unexplained thing in the table. I bet most users unfamiliar with Finnish will have a problem with understanding the concept behind "instructive", just to name one. If further labeling is deemed necessary, I have given one proposal in the discussion we had on CodeCat's discussion page. Another option is to link the word "Accusative" in the declension table to the page akkusatiivi, which contains some discussion on accusative in Finnish and a link to Wikipedia. --Hekaheka (talk) 20:39, 14 April 2014 (UTC)[reply]

I've modified the table now. —CodeCat 21:48, 14 April 2014 (UTC)[reply]
Good, thank you. But I still have a complaint about the table headers. The terms "zero?" and "normal" are not an even pair (or even a pair, if you like). The term zero-accusative refers to zero ending and its counterpart is n-accusative in which "n" is not short for "normal" but a reference to the "-n" -ending. Second, this discussion is the first time I ever hear the terms zero-accusative and n-accusative. Nominatiiviakkusatiivi (nominative-accusative) and genetiiviakkusatiivi (genitive-accusative) on the other hand seem to be used by many writers and they are also used in both English and Finnish Wikipedia. In order not to confuse the users of various Wiki sources it would probably be recommendable to use consistent terminology across the Wiki projects, even if it's not perfect. Therefore, change "zero?" to "nom." or "nom.acc." and "normal" to "gen." or "gen.acc.", please. --Hekaheka (talk) 23:13, 14 April 2014 (UTC)[reply]
I'd prefer to avoid those terms as their names are not the clearest. "Genitive accusative" sounds like it's an accusative that is used together with a genitive or something like that. The terms "0-accusative" and "n-accusative" are used in the paper I mentioned, The Finnish accusative: Long-distance case assignment under agreement by Anne Vainikka and Pauli Brattico (view). I just switched it with "normal" because it is the normal default accusative case form, and because having just "zero" and "n" looks strange and doesn't say much. With the name "normal" it's clearer to readers that in most circumstances that's the accusative you should use. —CodeCat 23:25, 14 April 2014 (UTC)[reply]
Would it be clear? The nominative accusative could also be called "normal" because accusative plural is always identical with the nominative. And partitive is also used for objects of verbs, more often than the genitive accusative. The combination "zero" (for form) and "normal" (for imagined function) sounds illogical and a bit amateurish to me. Why not use the old terminology? It would make the Wiktionary sound more professional.--Makaokalani (talk) 10:09, 17 April 2014 (UTC)[reply]
I don't think "genitive accusative" sounds professional at all. Why else would the paper I linked to use different terms? —CodeCat 17:24, 17 April 2014 (UTC)[reply]

French open vowels[edit]

French has two open vowel phonemes, /a/ and /ɑ/, but many speakers make little or no difference between them, even though there exist minimal pairs. Therefore, whether the underlying phoneme is /a/ or /ɑ/, the pronunciation is often written down with only /a/ representing both phonemes; and when one encounters /a/, one can never be sure that it is indeed an /a/.

To solve this issue, I think we should consider using /a/ where the underlying phoneme is not known or usage varies, and /æ/ where the underlying phoneme is /a/ indeed. That policy would not break any previously written entry as /a/ is confusing anyway.

What do you think of this proposition? – Pylade (talk) 20:59, 26 March 2014 (UTC)[reply]

I'm sorry, but I think that's a really bad idea. Even if we could do that consistently, it would "solve this issue" only for those readers who knew that that's what we were doing. For everyone else, it would just be more confusing or misleading (or else look simply erroneous). —RuakhTALK 06:41, 31 March 2014 (UTC)[reply]
That’s why this has to be a policy on which we collegially agree. Surely then we should link updated entries to a page describing the policy for French. But I can’t see how things could get more confusing than they are now, whereas /æ/, not being traditionally used for French, would only require a look at the policy to make sense. – Pylade (talk) 20:42, 31 March 2014 (UTC)[reply]
If I, well acquainted with Wiktionary, came across a pronunciation with æ (and surely the same goes for someone unacquainted with it), I'd assume it was pronounced æ, not that it's pronounced some other way and that I need to check a help page to see how. (I'm not talking about those fluent in French. They'd probably notice something amiss. But most enwikt users aren't fluent in French.) —msh210 (talk) 23:05, 31 March 2014 (UTC)[reply]
The funny thing is this phoneme is actually pronounced closer to [æ] than [a]. – Pylade (talk) 14:16, 1 April 2014 (UTC)[reply]
Of course, that's true of both of them . . . —RuakhTALK 02:38, 2 April 2014 (UTC)[reply]

CheckUsers on Wiktionary[edit]

Hello. Wiktionary has been getting quite a few spambots recently, most of which are blocked by some lovely abusefilters. However, it would be very useful to be able to get CheckUser data for those accounts, which could be used to prevent spam elsewhere. All of the current CheckUsers here are inactive, so would it be possible to elect some new ones, or modify the local checkuser policy to allow stewards to check locally in cross-wiki cases, such as anti-spam? Regards, Ajraddatz (talk) 04:36, 27 March 2014 (UTC)[reply]

Yeah ... we really do need to de-checkuser the inactive checkusers, and elect new checkusers. And possibly also allow stewards to check locally in spam cases, although if we do the first two things, that will be less important. - -sche (discuss) 05:38, 27 March 2014 (UTC)[reply]
What exactly is a CheckUser (other than a Dan Polansky)? --WikiTiki89 05:46, 27 March 2014 (UTC)[reply]
You can see m:CheckUser policy for the global policy, or Wiktionary:CheckUsers for the local page. --Rschen7754 06:01, 27 March 2014 (UTC)[reply]
To expand on my earlier comment (aware that this is somewhat tangential)... allowing inactive users to retain admin rights, as en.Wikt often does, is one thing; it can even be useful, in that it allows the users to immediately resume blocking vandals and deleting vandalism if they do return for short spurts of activity. Allowing inactive users to retain checkuser rights is different. Checkusers are in a powerful position to invade other users' privacy.
I stumbled across an old thread some time ago in which someone had proposed requiring that people undergo checkuser checking before being granted things like adminship. An admin had replied that he would be wary of that, since he actually had to change accounts because someone had stalked him under his previous wiki account and in real life.
Accordingly, checkusers should be people who are very active, because they should be people who are trusted by the current community (not just the community of users who elected them and then themselves became inactive). They shouldn't be just minimally active (en.WP requires one edit per year to retain admin rights, I think), they should be so active that the current community knows them; otherwise, the community isn't in a position to evaluate their trustworthiness. Also, checkusers should be people who are very active so that they can respond to situations as they arise. It's not as useful for an inactive checkuser to return seven months from now and tell us who was running the spambots.
Also, the Meta Checkuser policy states: "On any wiki, there must be at least two users with CheckUser status, or none at all. This is so that they can mutually control and confirm their actions. In the case where only one CheckUser is left on a wiki (when the only other one retires, or is removed), the community must appoint a new CheckUser immediately (so that the number of CheckUsers is at least two)." [...] "Any user account with CheckUser status that is inactive for more than a year will have their CheckUser access removed."
Looking at Special:ListUsers/checkuser, we are violating the spirit of that, if not the letter:
  1. Versageek has made 8 edits since 2009, half of them in 2010. I don't know them, and I'd hazard a guess that a sizeable number of our current editors (who joined after 2009) don't know them, either. They may be a great person, but I don't know that.
  2. Rodasmith made 5 edits last year, and 2 in 2012.
  3. Connel MacKenzie made 4 edits last year, 4 edits in 2012 (all to react to a motion that they be desysopped for inactivity by accusing one of the makers of the motion of bad faith), and just one edit in 2011 (which is why there was a motion to desysop them for inactivity).
  4. TheDaveRoss was reasonably active last year.
I think we should de-checkuser the first three, and then (per the Meta policy) either de-checkuser Dave or else appoint a new checkuser. Who wants to draft the de-checkuser votes?
Incidentally, I thought the folks at meta had a vote and were going to automatically desysop any of our admins (and presumably also checkusers, etc) who were inactive for more than 2 years, on the basis that we were failing to do so ourselves. Was I mistaken? - -sche (discuss) 07:02, 27 March 2014 (UTC)[reply]
The policy is at m:Admin activity review; while we stewards will post the notices, if there is local consensus to leave the admin rights, we will not desysop them. --Rschen7754 07:14, 27 March 2014 (UTC)[reply]
Also, as stewards we are willing to look at the CheckUser log and give general statistics on how often the CheckUsers are using the tool, since you cannot view it yourselves, if you would find this information helpful. --Rschen7754 17:19, 27 March 2014 (UTC)[reply]
I suppose that would be helpful, yes. :) - -sche (discuss) 03:20, 31 March 2014 (UTC)[reply]

<-I pop by here periodically to see if anyone is looking for me, but most of my activity these days is on en.wiki. I used to do xwiki spambot checking here, but since I haven't been on IRC much I don't get the requests anymore. I have no objection to giving up my en.wikt checkuser bit if the community would prefer to elect checkusers who are more active on the project. --Versageek 04:26, 28 March 2014 (UTC)[reply]

  • Voting on changing our local checkusers will take a while. Is there any objection to letting stewards use the checkuser tool on this wiki for anti-spam work in the meantime, i.e. until such time as we have more active checkusers? Or would we have to have a vote in order to allow that, which wouldn't actually take any less time than electing new checkusers? lol. (Alternatively, @Ajraddatz/Rschen, did you take advantage of Versageek's appearance to have him do the checkusering you needed done?) - -sche (discuss) 03:20, 31 March 2014 (UTC)[reply]
I only notified Versageek about the discussion on enwiki, since they seemed to be active there. --Rschen7754 19:39, 31 March 2014 (UTC)[reply]

CheckUsers on Wiktionary — Statistics[edit]

  1. Rodasmith last used the tool on 8 October 2010.
  2. Connel MacKenzie last used the tool on 20 May 2008.
  3. Versageek last used the tool on 20 December 2013.
  4. TheDaveRoss last used the tool on 21 October 2013.
  5. There were more checks in 2008-2010, by a large factor.
  6. Stewards have run emergency checks at times in 2012 and early 2013, but well before I became a steward, so I am unaware of the circumstances.
  7. There have been no checks run in 2014.
  8. Below are the month-by-month statistics in 2012 and 2013. Generally, fulfilling a request involves multiple log entries, which are what are counted below.
Month Versageek TheDaveRoss Stewards
January 2012 11 3
February 2012 3
March 2012
April 2012 2
May 2012 2
June 2012
July 2012 5
August 2012
September 2012
October 2012 1
November 2012
December 2012
January 2013 3
February 2013
March 2013 1 3
April 2013 2
May 2013 11
June 2013
July 2013 19
August 2013
September 2013
October 2013 2
November 2013
December 2013 2

Rschen7754 20:23, 31 March 2014 (UTC)[reply]

This is very informative, thank you! It seems two of the checkusers have been semi-active, while two have been totally inactive with the tool for years. I have drafted a vote to remove the two inactive users' checkuser bits: WT:Votes/cu-2014-04/De-checkusering inactive checkusers. Votes on electing new, fully active checkusers can come later. In the meantime, @Ajraddatz / @Rschen, if you still need checkuser data on the spambots that prompted this thread, ping Versageek again. - -sche (discuss) 08:14, 2 April 2014 (UTC)[reply]
Will do, thanks. Ajraddatz (talk) 01:41, 3 April 2014 (UTC)[reply]

User:Wikitiki89, User:-sche, User:Ruakh: is this thing going to move eventually or do we wait until December (when Colonel Connel's checkuser flag will lapse)? Keφr 16:31, 1 August 2014 (UTC)[reply]

Well, I initially postponed the vote so that we could discuss what level of community support would be required for a user to retain rights vs lose them. Some users were of the opinion that rights (like blocks) are granted to people that the community thinks should have them, and if there is no longer a (vote-passage-sized, e.g. ~2/3) majority of the community which thinks the users should have the rights/be blocked, the rights/blocks should be rescinded. (This philosophy prevailed in a vote on unblocking a certain user, but debates over user rights have been different.) Other users were of the opinion that once a user was given rights, they should retain them unless a sufficiently large (vote-passage-sized) majority favoured removing them. Like many discussions here, that discussion petered out without a clear conclusion. I would suggest restarting it (probably in a new thread) before restarting the vote. - -sche (discuss) 18:31, 1 August 2014 (UTC)[reply]
FWIW, although I support having such a discussion, I do not plan to drive it. So, Kephir, don't wait on my account. —RuakhTALK 04:57, 2 August 2014 (UTC)[reply]
I am super late to this party, but I honestly think that all CU work could be done by Stewards alone at this point. The majority of all checks done on en.wikt has always been related to a single user, and that person hasn't been as big a problem lately. There was only one other case where the tool was put to effective use in my recollection. Just my pair of pennies. - [The]DaveRoss 20:16, 16 October 2014 (UTC)[reply]
User:Wikitiki89, User:-sche, User:Ruakh, User:TheDaveRoss: FWIW, Rodasmith was completely inactive for 1 year so I removed their CU rights per the global policy a while back. Connel has edited in September, so the same would not apply for him until September 2015. Also note that if 2 CUs resign or are otherwise removed somehow, CU would go back to the stewards, since there must be at least 2 CUs or none at all. Or you could elect more CUs, but it doesn't seem like there's a huge need at this point considering that there isn't a huge backlog of requests. --Rschen7754 18:45, 16 November 2014 (UTC)[reply]

Fix Greek nouns' inflection line[edit]

As a result of the last change to Template:el-noun, the second parameter now corresponds to the plural, which used to be the third one. We need to remove the empty old second parameter and make the plural visible again. If there is no objection, I'm going to run my bot with a fix like this:
(ur'\{\{el-noun\|(.*)\|\|(.*)\}\}', u'{{el-noun|\\1|\\2}}')
I've already made some tests. Is it OK? Does anyone see any potential problem? --flyax (talk) 13:52, 29 March 2014 (UTC)[reply]

It's ok, I was already going to do this. —CodeCat 14:00, 29 March 2014 (UTC)[reply]
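
For anyone adapting this later, here is a minimal sketch of the same substitution in Python 3 using only the standard `re` module, outside of any bot framework. The function name `fix_el_noun` and the sample strings are made up for illustration. As an assumption on my part, the pattern is narrowed from `(.*)` so that a match cannot run across two templates on the same line; whether that is needed depends on the actual entries.

```python
import re

# Matches {{el-noun|FIRST||REST}}: an empty second parameter followed by the plural.
# FIRST may not contain "|", "{" or "}"; REST may contain further "|" parameters
# but no braces, so a match always stays inside one template call.
EL_NOUN_EMPTY_SECOND = re.compile(r'\{\{el-noun\|([^|{}]*)\|\|([^{}]*)\}\}')


def fix_el_noun(wikitext: str) -> str:
    """Drop the empty second parameter so the plural moves into its place."""
    return EL_NOUN_EMPTY_SECOND.sub(r'{{el-noun|\1|\2}}', wikitext)


if __name__ == "__main__":
    print(fix_el_noun("{{el-noun|m||άνθρωποι}}"))
```

Calls that have no empty second parameter are left alone, and so are calls with a nested template inside a parameter, which would need to be handled separately.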

A Different Way To Look At Chinese?

I've been doing some thinking about Chinese, trying to sort out the implications of the different ways we've treated the Chinese lects. I can't say that I have a clear solution, but I have the inklings of what seems to me like a more rational conceptual framework. It may be incompatible with our current infrastructure, but here it is:

Let's assume that Chinese is a strictly written language, originally based on an earlier spoken lect, but that has developed since on its own independent of the spoken language(s). In this respect, it reminds me a lot of sign languages such as American Sign Language.

The "dialects" are independent, but strictly spoken languages which are translated into dialects of written Chinese when they are written. That makes written Chinese a sort of lingua franca that's used to communicate between speakers, though the example of the w:Code Talkers comes to mind, as well.

Following from that, Chinese would be considered a language, but so would Cantonese, Hakka, Mandarin, Min Nan, Wu, Xiang, etc (and/or some of their subdivisions and/or fellow members of their dialect groups, as well).

Some aspects would be complementary in distribution: pronunciation would be strictly for the spoken lects and orthography for the written one. Morphology and syntax, on the other hand, are partly tied in with the writing, but have dimensions that the writing simply doesn't address. Etymology of the spoken lects is quite different from that of the written, but there again, intertwined with it. Lexicon is also quite distinct, but there are regional terms in the writing and I'm sure the writing contributes to the spoken lects as well.

I'm not sure if this is developed enough to provide anything useful, but I thought I would present it and see what others think.

Comments? Chuck Entz (talk) 20:21, 29 March 2014 (UTC)[reply]

NB: A 'classical chinese' (WMF lang code zh-classical) written dialect dates from at least the 9th century ce, while an updated 'vernacular chinese' (WMF lang code zh) written dialect dates from at least the 20th century ce. The former is understood across a broader base of languages, the latter is understood by a larger population group. - Amgine/ t·e 17:24, 30 March 2014 (UTC)[reply]
Most modern written Chinese is basically the same across all Chinese varieties, e.g. standard written Cantonese only differs from standard Chinese (Mandarin) in style and word choices, pronunciation (when read out loud) and use of traditional Chinese (e.g. in Hong Kong) (the latter is not really a difference, since Mandarin is also written in traditional characters in Taiwan and Cantonese is written in simplified characters in Guangdong). Vernacular forms of dialects, when written down, differ more from standard Chinese, as there are very common words which are not used in Mandarin or have a different meaning. The specific words are very low in number, even if they are very common, everyday words. Some spoken dialects simply lack a written form, or the written form is not common and attestable.
Vote here for the Unified Chinese approach: Wiktionary:Votes/pl-2014-04/Unified Chinese, which will make it possible to boost non-Mandarin content. --Anatoli (обсудить/вклад) 00:57, 31 March 2014 (UTC)[reply]

Whinge: AbuseFilter #24

This isn't suggesting a change or anything, just a wee venting.

Rule 24 seems, to me, to violate WT:NOT #7. Which is mostly annoying to me because a collaborator on a current project was unable to leave me in-line notes. - Amgine/ t·e 17:17, 30 March 2014 (UTC)[reply]

A lot of people have complained about this in the past. But I don't see how it violates WT:NOT #7, since admins can edit these pages and will delete any that violate this rule. --WikiTiki89 17:46, 30 March 2014 (UTC)[reply]
Hosting a page only a specific user (or an admin) may edit seems to me to be webhosting, and not wiki. Just a difference in interpretation. - Amgine/ t·e 17:59, 30 March 2014 (UTC)[reply]
I think it really depends on the content. If it's just vanity, then I agree. If it's lists of pages they want to edit or test versions of templates, then it's not just web hosting. The reason for this filter is that user pages seem to be a big target for personalized vandalism. So I think we can reduce the restriction to non-autoconfirmed users. --WikiTiki89 18:06, 30 March 2014 (UTC)[reply]
Like several other abuse filters, this one would benefit from the ability to check if users are in a certain group (e.g. autoconfirmed users) on any wiki, not just on this one. - -sche (discuss) 19:29, 30 March 2014 (UTC)[reply]
IMO, if this filter is applied not only to [[user:Username]] but also to [[user:Username/subpage]], then it should log but not prevent the edit, so as to allow, as Amgine says, collaboration on projects, which user subpages have often been used for.​—msh210 (talk) 23:09, 31 March 2014 (UTC)[reply]

Wikimania

I just noticed that Wikimania (London, 6-10 August 2014) accepts submissions only until tonight (2014-03-31).

Do any of you plan to attend, and do you think a Wiktionary session is possible (it would have to be proposed quickly)?

I believe the scholarships offered by the Wikimedia Foundation are closed, but it may be possible to ask your local chapter for help if you want to attend.

Otherwise, do you think it would be a good idea to plan a Wiktionary conference, "just for us", all languages together? Dakdada (talk) 14:42, 31 March 2014 (UTC)[reply]

I have no opinions regarding Wikimania as I cannot attend. However, I would love for Wiktionary to have an all language project meeting, and would like a discussion there regarding harmonizing interfaces and content across the project. - Amgine/ t·e 15:57, 31 March 2014 (UTC)[reply]

Automatizing {{rhymes}} with Lua, merging it into {{IPA}}, and reforms in pronunciation section

Automatizing {{rhymes}} looks feasible. The template that takes IPA pronunciation can generate the title of the corresponding rhymes page (at least for English, I don't know about rules of rhymes in other languages). Any opinions/objections about automatizing it?

On the other hand, the structure of pronunciation sections of Wiktionary looks quite stupid and needs some reform:

Note how things such as the word "Audio" and the accent labels ("UK", "US") and the symbols (ɔːtə(ɹ), ɒtə(ɹ)) are being repeated. Maybe it should be something like:

Also: /ˈwɔːtə(ɹ)/ (rhymes)

(Template:IPA is supposed to automatically create the wikilink)

Any opinions/objections about this change for linking to rhymes namespace? --Z 16:36, 31 March 2014 (UTC)[reply]

How would you automate it? —CodeCat 16:40, 31 March 2014 (UTC)[reply]
By finding the stress mark, skipping consonant symbols, and checking if the corresponding page for the remaining strings exists. --Z 16:49, 31 March 2014 (UTC)[reply]
What about languages that don't form rhymes in that way? —CodeCat 16:51, 31 March 2014 (UTC)[reply]
We can start from English, and add rhymes manually for languages with different rules until they are automatized, too. --Z 17:26, 31 March 2014 (UTC)[reply]
If the audios are moved to the same lines as the transcriptions, I prefer a tabular layout, otherwise they look like they are floating around.
As for automatising rhymes, it is a good idea, but I don’t think it should be enabled by default in the {{IPA}} template (not even for just English). — Ungoliant (falai) 16:54, 31 March 2014 (UTC)[reply]
Could you elaborate on the automatising part (for English)? --Z 17:35, 31 March 2014 (UTC)[reply]
I don’t think that the template {{IPA}} should automatically “rhymify” its parameters once the module is implemented. Either a new template should be created for rhymified IPA transcriptions, or a new parameter should be added to {{IPA}} to enable rhymification (i.e.: {{IPA|/ˈfuː/|rhyme=yes}}). — Ungoliant (falai) 17:45, 31 March 2014 (UTC)[reply]
Or {{IPA|/ˈfuː/|rhyme=en}}, to remain open. Dakdada (talk) 17:47, 31 March 2014 (UTC)[reply]
I see, but I meant why do you think it should not automatically rhymify? --Z 17:49, 31 March 2014 (UTC)[reply]
Maybe it should eventually, but I think it should first be made non-automatic so that we can test it. After several months of that, I think we could consider making it automatic. --WikiTiki89 18:03, 31 March 2014 (UTC)[reply]
It will rhymify IPAs whose author never intended for use as a rhyme appendix index, which is a problem because rhyme page names are much more standardised than IPA transcriptions. — Ungoliant (falai) 18:08, 31 March 2014 (UTC)[reply]
The main problem is that the English rhymes pages are currently normalized for one regional variant, so that there will be duplicate rhyme pages for different regional realizations of the same phoneme. This highlights another issue: do we want to create a rhyme page for every variant that anyone ever puts in an IPA template? Some people mark things like aspirated and unreleased stops, or represent an artificial-sounding spelling pronunciation, while others just don't understand IPA and make mistakes. Automation means we have to fix or delete a rhyme page every time we correct the IPA. Chuck Entz (talk) 18:30, 31 March 2014 (UTC)[reply]
OK, automatically linking to rhymes looks problematic. We can still merge {{rhymes}} into {{IPA}}: the input would be {{IPA|/ˈwɑtɚ/|rhymes=ɒtə(ɹ)}}, {{IPA|/ˈwɑtɚ/|rhymes=yes}} (semi-automatic), or {{IPA|/ˈw[[-ɒtə(ɹ)|ɑtɚ]]/}} (the template can make the page title using the tools that are already developed in Module:links) --Z 19:31, 31 March 2014 (UTC)[reply]
(Re Chuck Entz 18:30, 31 March 2014 (UTC).) We could do it only for IPA marked with /…/ and not for that marked with […]. (I don't know that it's a good idea anyway, though.)​—msh210 (talk) 23:12, 31 March 2014 (UTC)[reply]
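
To make the approach described in this thread concrete (find the stress mark, skip consonant symbols, check whether a page exists for what remains), here is a rough sketch in Python rather than Lua. Everything named here is hypothetical: `IPA_VOWELS` is a deliberately incomplete vowel list, `rhyme_from_ipa` and `existing_rhyme` are invented names, and the set of existing rhymes stands in for a real page-existence check.

```python
from typing import Optional, Set

# Rough, incomplete set of IPA vowel symbols (an assumption for this sketch).
IPA_VOWELS = set("aeiouyæɑɒɐɔəɚɛɜɝɪʊʌøœɨʉɯɤ")


def rhyme_from_ipa(ipa: str) -> str:
    """Return the part of a transcription from the stressed vowel onwards."""
    body = ipa.strip("/[]")        # drop the surrounding /…/ or […]
    stress = body.rfind("ˈ")       # last primary stress mark, -1 if there is none
    tail = body[stress + 1:]       # with no stress mark this is the whole string
    for i, symbol in enumerate(tail):
        if symbol in IPA_VOWELS:   # skip onset consonants up to the first vowel
            return tail[i:]
    return ""


def existing_rhyme(ipa: str, known_rhymes: Set[str]) -> Optional[str]:
    """Return the rhyme only if a rhymes page is already known for it."""
    rhyme = rhyme_from_ipa(ipa)
    return rhyme if rhyme in known_rhymes else None


if __name__ == "__main__":
    print(rhyme_from_ipa("/ˈwɔːtə(ɹ)/"))
```

With only ɔːtə(ɹ) and ɒtə(ɹ) in the known set, the US transcription /ˈwɑtɚ/ yields ɑtɚ and finds no page, which is the normalization problem raised above.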

Über-template with tabular output for pronunciation section

This is inspired by opinions from Vahagn Petrosyan and Ungoliant MMDCCLXIV.

We can create a single, Lua-powered template for the pronunciation section, which would generate a table similar to the one below. This template would be able to automatically generate IPA pronunciation for certain languages (for now, it is only ready for Armenian, Georgian, Polish, Standard Chinese (Mandarin), Ukrainian and to some extent for Persian and Ancient Greek), possibly rhymes (discussed above), and hyphenation (for American English, etc.) wherever possible. (These are all technically feasible.)

Both output and input of the proposed template are more readable and the output doesn't have duplications that can be seen in the current format. It also takes and shows information more precisely (in the current format of our pronunciation sections, it is often unclear which, say, audio, or which homophone corresponds to which IPA pronunciation when there's more than one phonemic pronunciation for a given accent, which is the case for "US" here):


output
 | IPA | enPR | Audio | Hyphenation | Homophones
Australia | /ˈwoːtə(ɹ)/ [ˈwoːɾə(ɹ)] | | | |
UK | /ˈwɔːtə(ɹ)/ | | En-uk-water.ogg | wa‧ter | whatever!
US | /ˈwɔtɚ/ [ˈwɔɾɚ] | wôtər | en-us-water.ogg | wa‧ter |
 | /ˈwɑtɚ/ [ˈwɑɾɚ] | wŏtər | | |


or:

 | IPA | enPR | Audio | Hyphenation | Homophones
Australia | /ˈwoːtə(ɹ)/ [ˈwoːɾə(ɹ)] | | | |
UK | /ˈwɔːtə(ɹ)/ | | En-uk-water.ogg | wa‧ter | whatever!
US | /ˈwɔtɚ/ [ˈwɔɾɚ] | wôtər | en-us-water.ogg | wa‧ter |
 | /ˈwɑtɚ/ [ˈwɑɾɚ] | wŏtər | | |


input would be:
{{pronunciation|lang=en
|a1= AU
|AU-IPA= /ˈwoːtə(ɹ)/ [ˈwoːɾə(ɹ)]

|a2= UK
|UK-IPA= /ˈwɔːtə(ɹ)/
|UK-audio= En-uk-water.ogg
|UK-hyphen= wa-ter
|UK-homo= whatever!

|a3= US
|US-IPA= /ˈwɔtɚ/ [ˈwɔɾɚ]
|US-enPR= wôtər
|US-audio= en-us-water.ogg
|US-IPA2= /ˈwɑtɚ/ [ˈwɑɾɚ]
|US-enPR2= wŏtər
}}




Its equivalent in our current format is:

output
input
* {{a|Australia}} {{IPA|/ˈwoːtə(ɹ)/|[ˈwoːɾə(ɹ)]|lang=en}}
* {{a|UK}} {{IPA|/ˈwɔːtə(ɹ)/|lang=en}}
* {{a|US}} {{IPA|/ˈwɔtɚ/|/ˈwɑtɚ/|[ˈwɔɾɚ]|[ˈwɑɾɚ]|lang=en}}, {{enPR|wôtər|wŏtər}}
* {{audio|En-uk-water.ogg|Audio (UK)|lang=en}}
* {{audio|en-us-water.ogg|Audio (US)|lang=en}}
* {{hyphenation|wa|ter|lang=en}} (UK, US)
* {{rhymes|ɔːtə(ɹ)|ɒtə(ɹ)|lang=en}}

--Z 20:54, 31 March 2014 (UTC)[reply]

Whole-heartedly support. This is so much nicer than what we have! There are a few things to work out, though. Some of our languages use a diaphonemic approach to IPA. Dutch is one example; there is a single diaphonemic transcription in IPA for all "standard" Dutch varieties, and then differences are noted below that. —CodeCat 21:01, 31 March 2014 (UTC)[reply]
I can only think of two solutions, listed here. The 2nd one looks better, but harder to implement. --Z 14:52, 1 April 2014 (UTC)[reply]
Support with both hands and feet. --Vahag (talk) 21:05, 31 March 2014 (UTC)[reply]
Support. But can the homophones parameter be something other than homo? — Ungoliant (falai) 21:09, 31 March 2014 (UTC)[reply]
How about just hom? - -sche (discuss) 00:38, 1 April 2014 (UTC)[reply]
Support Provided there's ongoing support. --Anatoli (обсудить/вклад) 22:28, 31 March 2014 (UTC)[reply]
Would we really need the a1 and similar parameters? Can't Lua infer them from the other parameters' names?​—msh210 (talk) 23:21, 31 March 2014 (UTC)[reply]
Can we please move pronunciation below the definitions, then? As it is, people have long mentioned that they can't find our definitions. Putting this above the definitions would hide them even more. (Of course, putting pronunciation below the definitions requires a vote, whereas this doesn't.) Oppose this template, even though I like the idea and will support its use if the section is moved down.​—msh210 (talk) 23:21, 31 March 2014 (UTC)[reply]
Indeed, that's my ultimate goal: to highlight important information by getting rid of or marginalizing the not-so-important parts (X-SAMPA, rhymes, probably hyphenation). By packing data in a table, it will be much easier for newcomers' eyes to skip the information. We can also make the table expandable, like translations. --Z 14:27, 1 April 2014 (UTC)[reply]
That's a very good point (that, even though the box catches the eye more, it's also easier to skip over and know where to continue looking).​—msh210 (talk) 23:43, 2 April 2014 (UTC)[reply]
  • Support the general idea, but I don't think linking the pronunciation straight to the rhymes is a good idea, for a few reasons. First, because it's not at all clear to the reader that it should link to rhymes, as it doesn't say "rhymes" anywhere in the box. Second, because this eliminates the possibility of adding helpful tooltips indicating what sounds each symbol is for. (I think this would be quite helpful, as many readers don't know IPA. I set up something like this for Wikipedia at w:Module:IPAc, but it never got used.) Third, because this way rhymes can't be shown at all if the full pronunciation details aren't available. In many cases, a user will be able to add that one word rhymes with another word just by using the rhymes adding tool, even if they don't know IPA and thus wouldn't be able to add the full IPA pronunciation. --Yair rand (talk) 20:28, 1 April 2014 (UTC)[reply]
    (Imported Module:IPAc. --Yair rand (talk) 02:08, 3 April 2014 (UTC))[reply]
Umm....I don't know if this is at all relevant, but I just created a new Ancient Greek pronunciation template, {{grc-pron}}, which can be seen in action at εὐδοκέω (eudokéō). If this template is to become the community standard, then I imagine the results from grc-pron could be piped into it. I also assume that the bits not relevant to a language, like rhymes aren't relevant to Ancient Greek, for example, could be turned off. Finally, I haven't finished the JS for this yet, but I was planning on making my new template collapse into one line by default, showing a general gist of the word's phonological evolution, which would be expandable to the current five lines of detail. I would strongly prefer to keep the ability to show the general user a fairly compact, simple pronunciation scheme, while allowing the more serious phonophile more detail. I feel fairly confident that most users don't give a rat shit about most of this info, and assaulting their above-the-line load-screen with it is unacceptable. Again, I don't know if this is relevant, and I do apologize if my grc blinders have caused me to insert myself somewhere I don't belong. -Atelaes λάλει ἐμοί 08:13, 3 April 2014 (UTC)[reply]

Implementation

There are a few things that need to be sorted out, as the parameters given above are a nice ideal, but there are some practical problems.

  • Hyphenation is not actually a pronunciation thing, so it probably doesn't belong in the pronunciation section, let alone in this table. It's also redundant to list it for each pronunciation as it's going to be pronunciation-independent.
  • The suggested way of specifying parameters doesn't say anything about the order in which the columns should be shown. While you can specify parameters in a different order on calling the template, this information is lost in the translation to Lua or the template. So the module would have no way of knowing that IPA goes before audio, unless we hard-code it into the module itself. That, of course, means that we'd have to maintain a list of valid columns, and reject any that are not recognised. It's not ideal.
  • Consider also the changes that are in progress with Chinese entries. They also use a table structure but it expands. See for example User:Wyang/告白. It may not be the best thing to have one format for Chinese and another for all other languages. So we may want to consider a way to accommodate both.
  • What should go in the first column when there is only one pronunciation?

CodeCat 21:46, 31 March 2014 (UTC)[reply]

IMO the column names and their order should be hard-coded. There are only a few, and keeping the order consistent will be good for those who view multiple pages, with no drawback I can think of.​—msh210 (talk) 23:21, 31 March 2014 (UTC)[reply]
American hyphenation is based on pronunciation. British hyphenation is based on etymology. We could put the American hyphenations in the pronunciation section, and the British hyphenations under etymology (tongue-in-cheek). —Stephen (Talk) 04:10, 2 April 2014 (UTC)[reply]
I've written a module, Module:User:CodeCat/pronunciation, which can be seen (temporarily) on User:CodeCat/sandbox. It's a quick draft, but it does work more or less. The data is hard coded for now, I want to make a separate module (named /templates) which handles calling from templates. That way, the current module is focused only on Lua, and doesn't need to concern itself with details of how it's invoked. —CodeCat 23:02, 31 March 2014 (UTC)[reply]
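
A minimal sketch of the hard-coded column order suggested above, written in Python rather than Lua purely for illustration. The column keys (`IPA`, `enPR`, `audio`, `hyphen`, `homo`) follow the parameter suffixes in the proposal; the function name `build_table` and the list-of-rows output are assumptions, and this is not how Module:User:CodeCat/pronunciation actually works.

```python
# Fixed display order; the keys mirror the suffixes used in the proposed template.
COLUMNS = ["IPA", "enPR", "audio", "hyphen", "homo"]


def build_table(rows):
    """Turn [(accent, {column: value}), ...] into a list of table rows.

    Cells are always emitted in COLUMNS order, whatever order they were given in.
    """
    table = [[""] + COLUMNS]
    for accent, cells in rows:
        # Reject anything that is not in the maintained list of valid columns.
        unknown = sorted(set(cells) - set(COLUMNS))
        if unknown:
            raise ValueError("unrecognised column(s): " + ", ".join(unknown))
        table.append([accent] + [cells.get(column, "") for column in COLUMNS])
    # With no accent labels at all, the first column carries no information.
    if all(accent == "" for accent, _ in rows):
        table = [row[1:] for row in table]
    return table


if __name__ == "__main__":
    for row in build_table([("UK", {"audio": "En-uk-water.ogg", "IPA": "/ˈwɔːtə(ɹ)/"})]):
        print(row)
```

The trade-off is the one already noted: the order is consistent on every page, but a new kind of column needs a change to the list before anyone can use it.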

It looks good. The plain version is better, because the table lines and tone are just visual distraction. It would be more readable if every header and body cell were left-aligned.

If there is only one line and no accent, can the first column be omitted?

I agree that hyphenation doesn’t really belong here. There are, apparently, different American and British hyphenation rules, and probably rule sets according to specific style guides. Michael Z. 2014-03-31 23:15 z

If we're putting homophones in the template we should decide what it looks like if you have a lot- vertical list? expandable? DTLHS (talk) 23:49, 31 March 2014 (UTC)[reply]

Like many languages, English has a very large number of dialects. At present, people can add information on these dialects' pronunciations of words rather simply: they just copy a line of the current format and change the {{a}}, expanding
* {{a|GenAm}} {{IPA|/maɪt/|lang=en}}
* {{a|Southern US}} {{IPA|/mɑːt/|lang=en}}
to e.g.
* {{a|GenAm}} {{IPA|/maɪt/|lang=en}}
* {{a|Southern US}} {{IPA|/mɑːt/|lang=en}}
* {{a|Old Virginia accent}} {{IPA|/mæːt/|lang=en}}
and {{a}} is flexible enough to handle this. Your proposed format seems to use dialects' names in the names of parameters ("UK-IPA=", "UK-audio=", etc). Does this mean we would have to anticipate and code into the template/module, in advance, every dialect of every language that the template covered, or people would not be able to add those dialects' pronunciations of words? - -sche (discuss) 01:29, 1 April 2014 (UTC)[reply]
Lua is capable of iterating over all parameters. So it's able to "find" parameters whose name is not known in advance. It does require that the parameters follow some predictable pattern so that the module knows which part of the name it should interpret as a dialect name. —CodeCat 01:55, 1 April 2014 (UTC)[reply]
Yes. But there is also a nice solution to avoid problems such as following restricted patterns and long names for parameters: (for 2nd row)
|2= US
|2-IPA= ...
|2-audio= ...
this way, we'd be able to put more than one dialect/variety in a single row: by passing them to a numbered parameter, separated by comma or something: |2=UK, US. We can also define subvarieties which would be placed in the second column, by passing them to |n-n= where n's are numbers, which is especially useful in Chinese entries.
Regarding columns, and whether they should be hard-coded, I think specific columns should be defined with a default order, but we should probably be allowed to override it by defining new columns and their order using special parameters, like |columnn=.
If the |n= parameters are not specified (usually when we have only one row), the first column should be omitted.
I think hyphenation belongs somewhere in the PoS section, near the headword line where we show the word, maybe floating on the right side, like this.
@Wyang you may be interested in this discussion. --Z 14:09, 1 April 2014 (UTC)[reply]
I support the idea of using a single template to cover all the English pronunciation information. In the case of English, it might be useful to pre-define some commonly used dialectal names, such as 'UK', 'RP', 'US', 'GenAm', 'Canada', 'Australia', 'Ireland', 'New Zealand' and 'South Africa', so that one can use
{{pronunciation|lang=en
|UK-IPA= ...
}}
without having to identify the variety in a separate parameter. The case of Chinese is a little different from English, as the internal divisions of Chinese are well-defined from centuries of Chinese dialectology (Mandarin, Cantonese, Wu, etc.), with each division having a prestige dialect (usually the dialect of the largest city), whereas English accents are less well-defined. Regional Chinese accents are associated with locations (List of varieties of Chinese, eg. Beijing, Guangzhou), while English accents tend to be characterised by region/area (eg. Southern US). Hence, I would prefer if {{zh-pron}} has parameters 'm=PINYIN', 'c=JYUTPING', 'w=ROMANISATION' for prestige dialects, and 'm-LOCATION_A=IPA', 'm-LOCATION_B=IPA' for other varieties. In contrast, the English template is probably better without a set of predefined divisions (as the example above |1= ... |2= ...), but with some predefined commonly used parameters, as said above. As for rhymes, I think they should be kept separate from the IPA template. Wyang (talk) 03:54, 2 April 2014 (UTC)[reply]
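
The two naming schemes discussed in this thread (dialect-prefixed names such as `UK-IPA`, and numbered rows such as `|2=UK, US` with `|2-IPA=`) can both be handled by iterating over whatever parameters were passed. The sketch below is in Python rather than Lua and is only an illustration: `group_parameters`, the regular expression, and the returned structure are all hypothetical, and it assumes that dialect names contain no hyphen and that repeated values arrive in the order they were written.

```python
import re

# ROW-COLUMN or ROW-COLUMNn, e.g. "UK-IPA", "US-enPR2", "2-audio".
PARAM_NAME = re.compile(r"^(?P<row>[^-]+)-(?P<column>[A-Za-z]+)(?P<n>\d*)$")


def group_parameters(params):
    """Group template parameters into rows keyed by dialect name or row number."""
    rows = {}
    for name, value in params.items():
        match = PARAM_NAME.match(name)
        if match is None:
            continue  # e.g. "lang", "a1", or a bare row label such as "2"
        row = rows.setdefault(match["row"], {"labels": [match["row"]], "cells": {}})
        # IPA, IPA2, ... all end up in the same column, one value after another.
        row["cells"].setdefault(match["column"], []).append(value.strip())
    # Numbered rows take their labels from the bare parameter, e.g. |2=UK, US
    for key, row in rows.items():
        if key.isdigit() and key in params:
            row["labels"] = [label.strip() for label in params[key].split(",")]
    return rows


if __name__ == "__main__":
    print(group_parameters({"lang": "en", "2": "UK, US", "2-IPA": "/ˈwɔːtə(ɹ)/"}))
```

Nothing here needs a list of dialects in advance, which addresses the concern about less common accents; pre-defined names such as 'UK' or 'GenAm' would simply be rows whose label needs no separate parameter.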

Changes to the default site typography coming soon

This week, the typography on Wikimedia sites will be updated for all readers and editors who use the default "Vector" skin. This change will involve new serif fonts for some headings, small tweaks to body content fonts, text size, text color, and spacing between elements. The schedule is:

  • April 1st: non-Wikipedia projects will see this change live
  • April 3rd: Wikipedias will see this change live

This change is very similar to the "Typography Update" Beta Feature that has been available on Wikimedia projects since November 2013. After several rounds of testing and with feedback from the community, this Beta Feature will be disabled and successful aspects enabled in the default site appearance. Users who are logged in may still choose to use another skin, or alter their personal CSS, if they prefer a different appearance. Local common CSS styles will also apply as normal, for issues with local styles and scripts that impact all users.

For more information:

-- Steven Walling (Product Manager) on behalf of the Wikimedia Foundation's User Experience Design team

Well, this is different... —CodeCat 18:41, 1 April 2014 (UTC)[reply]
What timezone is WMF? East Coast US? I want to know when April 1st is over because this looks hideous. Neitrāls vārds (talk) 19:28, 1 April 2014 (UTC)[reply]
Are you telling me this isn't just a subtly maddening April Fools' joke? Ultimateria (talk) 20:38, 1 April 2014 (UTC)[reply]
Neitrāls vārds, we're spread across timezones. :) The public deployment calendar has times in both UTC and Pacific, where the office is. And nope, Ultimateria, not a joke. If anyone is curious about how to go back to the old version for themselves, or why we did this, the FAQ and other materials have way more detail than I could provide here. Steven (WMF) (talk) 21:46, 3 April 2014 (UTC)[reply]