Wiktionary:Beer parlour/2019/December

Transitive / répondre

The entry says "transitive with à" and "transitive with de", and there's a template for this, which is why I'm raising it here. Our definitions of "transitive" and "transitive verb", which I think are the correct ones, say "taking a direct object". So neither of these is transitive to me. With "à" I still get it a little, because in French you have a distinction between "à" that is always prepositional and "à" that can be replaced by dative pronouns. But with "de"? 90.186.72.105 12:38, 3 December 2019 (UTC)

Not answering your question (I'll come back to it later), but see indirect transitive / indirectly transitive and prepositional transitive / prepositionally transitive. Canonicalization (talk) 17:32, 3 December 2019 (UTC)[reply]

Phrasal verbs with specific subjects

What part of speech are phrases in which the phrase is fundamentally a verb, but with a fixed subject, and how should they be lemmatized? The particular case in my mind is Finnish hymy hyytyy, which is the verb hyytyä, but where hymy is always the subject of that verb; let's say for now that it means something like "smile will stop" (even though that is not the literal meaning here, just a very rough translation).

It's used with a meaning fairly similar to laugh on the other side of one's face; the "subject" for that English equivalent would be expressed as a genitive/possessive construction for hymy, whether as a possessive suffix or a genitive before the word, so to say something like "he will laugh on the other side of his face", you would say "his smile will stop" ("hänen hymy(nsä) hyytyy"). The subject for the verb is still hymy, and it has to be, since it is part of the phrase. (To further complicate this, Finnish verbs are lemmatized with the first infinitive, but you cannot use infinitive forms with verb phrases that have a subject, like this). — sur jec tion ⟨?⟩ 17:24, 3 December 2019 (UTC)

There is a somewhat similar issue for the idiom of the tail wagging the dog. We have an entry for the noun tail wagging the dog, but you can also say “the tail has wagged the dog”, “the tail wagged the dog”, “the tail did not wag the dog”, “the tail will wag the dog”, “the dog that was wagged by the tail”, and other variations. Then there is the phrase the rubber hits the road, where you can also use an infinitive, “let the rubber hit the road”, or a simple past, “when the rubber hit the road”. This does not resolve the issue, but only shows it is more general; I bet that there are scores of other similarly flexible idioms. Using the last example as a paradigm, you could simply classify hymy hyytyy as a phrase and add this as a related term to the entry for hymy. --Lambiam 21:41, 3 December 2019 (UTC)[reply]

Right, I thought there were no English parallels, but it turns out there are: I also found one's blood runs cold. I'll just use the one you proposed for the lemma. Thank you! — sur jec tion ⟨?⟩ 08:05, 4 December 2019 (UTC)[reply]

Even "there is" is a parallel, and one that we had great difficulty deciding on a proper lemma for. —Mahāgaja · talk 09:54, 4 December 2019 (UTC)

Inconsistent use of colons in entry titles

There are three different colon symbols which are used in entry titles:

Character	Unicode block	Ideal use	Mainspace	Reconstruction
: (`U+003A`)	Basic Latin	Punctuation in Swedish/Finnish/Danish	38 entries [1]	3 entries [2]
ː (`U+02D0`)	Spacing Modifier Letters	IPA symbol for vowel length	66 entries [3]	133 entries [4]
꞉ (`U+A789`)	Latin Extended-D	Tone letter for some languages in Africa/Papua New Guinea	78 entries [5]	7 entries [6]

Note: The number of search results retrieved by the server tends to be inconsistent.

Currently, up to 108 Oroqen lemmas [7] (Oroqen is a Tungusic language written with IPA notation) are using ꞉ (U+A789, tone letter) instead of ː (U+02D0, IPA vowel length).

The Wiktionary entry at ꞉ claims that U+A789 (Latin Extended-D) is used in many languages’ orthographies to denote a long vowel. However, this contradicts the recommendation by Unicode [8] (search for A789):

used as a tone letter in some orthographies
Budu (Congo), Sabaot (Kenya), and several Papua New Guinea languages

Is the current definition at ꞉ (U+A789) added in Feb 2012 incorrect?

Shall we standardize the usage of colons by moving the page title to the correct symbol (after checking whether the colon is used for vowel length or as a tone letter)?

Also, do we need redirects from : (U+003A, standard colon) to the entries containing one of those two uncommon colons in the title, or shall we just delete the old pages containing the standard colon? KevinUp (talk) 10:12, 6 December 2019 (UTC)[reply]
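To make the check proposed above concrete, here is a minimal Python sketch that reports which of the three colon-like characters from the table occur in a title, and that swaps the tone letter for the IPA length mark. The helper names `colon_kinds` and `tone_colon_to_length_mark` are hypothetical and exist only for this illustration; whether a given title should actually be moved still needs the manual check described above.

```python
import unicodedata

# The three colon-like characters from the table above, with their intended use.
COLON_LIKE = {
    "\u003A": "punctuation",
    "\u02D0": "IPA vowel length",
    "\uA789": "tone letter",
}


def colon_kinds(title):
    """List (code point, Unicode name, intended use) for each colon-like character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch), COLON_LIKE[ch])
        for ch in title
        if ch in COLON_LIKE
    ]


def tone_colon_to_length_mark(title):
    """Replace U+A789 with U+02D0 (hypothetical helper; only valid where the
    colon really marks vowel length, as suggested for the Oroqen lemmas)."""
    return title.replace("\uA789", "\u02D0")


if __name__ == "__main__":
    # "a" followed by U+A789 is a made-up string, not a real entry title.
    sample = "a\uA789"
    print(colon_kinds(sample))
    print(colon_kinds(tone_colon_to_length_mark(sample)))
```

The Unicode names come from Python's bundled `unicodedata` database, so the report does not depend on how a font happens to render the three characters.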

Thoughts about the latest community wishlist

Hi everyone, you may be interested in participating in this discussion about the community wishlist and how to improve the process. Do not hesitate to give your opinion; the more we know about the small communities, the better we can build something representative. Pamputt (talk) 17:41, 6 December 2019 (UTC)

Alternative scripts

I am thinking of having this feature in the entries of such languages as Latin and Old English. What if we had an 'alternative scripts' box (as we have for Sanskrit) in which the alternative orthographies for a word can be shown? In the case of Latin, Roman square capitals would be employed to show the alternative orthography, along with the other ancient conventions (e.g., writing V for U, engraved dots for separating the words in a phrase entry). Likewise, in the case of Old English, Anglo-Saxon runes would be employed to show the alternative orthography. This is meant for educational purposes, just as we show alternative scripts in our Sanskrit and Pali entries. I hope people will accept this proposal. —Lbdñk (talk) 18:33, 6 December 2019 (UTC)

I would like to know what @Mnemosientje, Mahagaja, Leasnam think of this. —Lbdñk (talk) 17:07, 7 December 2019 (UTC)

For Latin it seems unnecessary to me; equus and EQVVS aren't really different scripts after all, just different interpretations of the same script. For Old English, we should only include Runic spellings that are actually attested, because the Runic spellings are not automatically predictable from the Latin spellings. —Mahāgaja · talk 18:57, 7 December 2019 (UTC)[reply]

👎. If anyone wants that, they can just do span[lang="la"], span[lang="la"] a { text-transform: uppercase } in their common.css. --{{victar|talk}} 19:07, 7 December 2019 (UTC)[reply]

Or that combined with JavaScript to replace U with V and J with I, and Ī with ꟾ, and other macrons with acutes to represent apices. An interesting idea actually. See User:Erutuon/scripts/epigraphicLatin.js. — Eru·tuon 19:12, 7 December 2019 (UTC)[reply]
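For illustration, the substitutions listed above can be sketched in Python. This is not the linked user script, only an analogue written from the description in this comment; the function name `epigraphic` is hypothetical, and the handling of combining marks is an assumption about how one might implement it.

```python
import unicodedata

MACRON = "\u0304"   # combining macron
ACUTE = "\u0301"    # combining acute, standing in for the apex
I_LONGA = "\uA7FE"  # LATIN EPIGRAPHIC LETTER I LONGA


def epigraphic(text):
    """Uppercase Latin text, then apply U -> V, J -> I, long I -> I longa,
    and macron -> acute on every other long vowel."""
    out = []
    # Decompose so that a long vowel becomes base letter + combining macron.
    for ch in unicodedata.normalize("NFD", text.upper()):
        if ch == MACRON:
            if out and out[-1] == "I":
                out[-1] = I_LONGA   # long I gets its own letter
            else:
                out.append(ACUTE)   # other long vowels get an apex
        elif ch == "U":
            out.append("V")
        elif ch == "J":
            out.append("I")
        else:
            out.append(ch)
    # Recompose where a precomposed character exists (e.g. A + acute).
    return unicodedata.normalize("NFC", "".join(out))


if __name__ == "__main__":
    print(epigraphic("equus"))
    print(epigraphic("māter"))
```

Because the change happens on the rendered text only, it has the same limitation as the CSS approach: the stored entry title and wikitext stay as they are.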

LOL, @Erutuon, you would put that together. --{{victar|talk}} 19:34, 7 December 2019 (UTC)[reply]

How do you search that? I recall having quite some trouble with thieuish once, even knowing the orthography, and having this in plain text that the search could find would have been helpful.--Prosfilaes (talk) 04:41, 11 December 2019 (UTC)

I don't know. The JavaScript won't affect the search engine. To find thievish when searching "thieuish", "thieuish" has to be somewhere in the HTML output by the parser, before JavaScript has acted on it and modified it. So having "thieuish" in the alternative forms section of thievish would of course work. Or perhaps the search engine can be taught to understand obsolete English spelling systems, in which u is used in place of v. — Eru·tuon 06:03, 11 December 2019 (UTC)[reply]
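As a rough sketch of what teaching a search index about the older u/v convention could look like, the Python below folds both spellings to one lookup key. It says nothing about how the actual search engine works; `spelling_key` and `build_index` are hypothetical names, and folding j to i as well is an added assumption.

```python
def spelling_key(word):
    """Fold v to u and j to i so that older and modern spellings share a key."""
    return word.lower().replace("v", "u").replace("j", "i")


def build_index(titles):
    """Map each folded key to the entry titles that produce it."""
    index = {}
    for title in titles:
        index.setdefault(spelling_key(title), []).append(title)
    return index


if __name__ == "__main__":
    index = build_index(["thievish"])
    # A query in the older spelling lands on the modern entry.
    print(index.get(spelling_key("thieuish")))
```

The trade-off is that unrelated words differing only in u/v or i/j would collide under the same key, so a real index would probably treat this as a fallback rather than the primary match.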

I kind of like the idea, but I'm hesitant to support it, just because of the ambiguities in what constitutes a separate script versus a different style of writing the same script. Would we include Carolingian minuscule for Latin? How about Merovingian cursive? Or what about cursive for modern languages? There doesn't seem to be anything particularly special about classical Latin script as compared to Medieval forms of writing, so I see it being a bit difficult to come up with a reasonable policy. Andrew Sheedy (talk) 19:54, 7 December 2019 (UTC)

Quotations and usage examples for alternative forms

In the case where X is defined only as "Alternative form/spelling of Y", quotations and usage examples for X should be listed at Y, interspersed with those for Y, right? Otherwise, when (as is very often the case) there are multiple senses of X, there is no way to associate a particular example or quotation with a particular sense, right? Mihia (talk) 18:57, 6 December 2019 (UTC)[reply]

There is a data-shortage problem and the problem of both definitions and forms being born and dying. In principle we would like to attest to each spelling being used for each definition, but there is rarely enough attestation for that. Older alternative spellings are rarely found for newer definitions. What's worse, sometimes new spellings become specialized for some new meanings. And sometimes there is not enough attestation with any one spelling for a definition.

I don't think that we should have a (binding) policy for all (any?) aspects of this. It would be nice to give some context and date information for alternative spellings. I suppose we need to allow alternative forms to support hard-to-attest definitions. But we also need to be willing to provide attestation for alternative spellings and that attestation belongs with the alternative spellings, IMO. It would also be nice to demonstrate for which meanings the alternative spelling seemed to be used, even if we cannot have each definition of the alt. spelling meet RfV. DCDuring (talk) 19:24, 6 December 2019 (UTC)[reply]

(edit conflict) This is not a hard rule. Quoting from Wiktionary:Quotations#Naming: “Inflected forms and alternative spellings can be cited as such, especially if their existence is in doubt, but it may be useful to gather their citations on the lemma's page.” The point of the quotations at every where is not to establish the sense, which does not differ from everywhere, but solely to attest that this spelling has been used. So, rather in general, a quotation/citation/usage example should go where it is the most helpful, all considered. Sometimes that will be at the alternative entry and sometimes at the lemma. --Lambiam 19:34, 6 December 2019 (UTC)

Wild, I was thinking about this earlier. I was thinking that adding a quote to Y would be more visible, but perhaps it should be added to X if its only purpose is to prove the existence of X, and is otherwise a very dull quote. —Suzukaze-c ◇◇ 19:37, 6 December 2019 (UTC)[reply]

So in the case where X and Y have, let's say, half a dozen distinct senses, and it is deemed that examples/quotations for X would be more helpful at X not Y, then how should this physically be laid out at X? Mihia (talk) 20:19, 6 December 2019 (UTC)[reply]

It depends on what one wants to show with the quote. If it is the existence (or still-existence) of usage, then people put it at the respective spelling. Usually the quote is to illustrate the meaning of the word or in other words to show the extents of the usage, hence you put it by the sense. It’s not about “half a dozen of senses”, a single difficult one also does it, as well as any usual sense, because it’s the default, else it is too complicated. If someone doubts the existence of an alternative form, he has to look at the main form first, it may be that there is an alternative form quote. Otherwise people would have to click on alternative forms to possibly find example quotes, which is against the greater good, and identity is perceived anyway and expected to be expressed, as an alternative spelling or form does not necessarily make a different word, if one counts words (hint: what do you do if you attest a term from durable audio records, but the spelling is unreliable as the term is not spelled in print in this case, and not usually transmitted by print either? You just put them at some main form, no other way; excluding audio-only words dogmatically is not a preferable solution). The difference between lemma and inflection is also blurry anyway. Sometimes it looks good to collect quotes for both imperfective and perfective forms of a Slavic verb on one page – even though both are lexicalized, they still share meaning, and if the development is not detached, then it can well be one page. Then ideally plurals of Arabic words need to be quoted but it is doubtful that they are sought at the respective plural pages; to a lesser extent the same with verbal nouns. Perhaps Wiktionary needs to begin to jam quotes into templates often to reuse them at multiple pages, but to do it only for these cases would invite doing it also for other cases where a quote is reused for multiple unrelated terms and then we need a method to still highlight 🥴.
This is all a sign of MediaWiki not being the tailored software for dictionaries. Fay Freak (talk) 21:45, 6 December 2019 (UTC)[reply]

Thanks for your reply, but I think you have not really understood the purpose of my question. Mihia (talk) 21:56, 6 December 2019 (UTC)[reply]

No, I think I have. You want an easier guide than there can be in a confusing world; you have not understood the extent and implications of your own question. Unsurprising, since most people do not understand the rules. Like in RL most people do not understand the law but need an attorney, most fundamentally lack understanding of the CFI.

In this context, since this understanding mitigates the requirement of attestation of each form, what is also to be noted is that as long as variants are perceived as one word, not every variant needs to be attested three times. The CFI say only a term needs to be, but if variants are the same term then there can be two attestations for one form and one for a variant, or one-one-one, it would be absurd anyhow to not include a term because it is found but once written together, once hyphenated, once spaced, and it also does not need to be written at all. The most widespread understanding of the CFI has been wrong.

This of course notwithstanding that the widespread application of the CFI is contra legem as it disregards its telos of only excluding ghostwords and protologisms, and durable attestation is not necessary, just sufficient, that only being a formal guideline, with other demonstration that someone would likely run across it and want to know what it means being possible, so that one can collect all words in all languages.

Just let it sink in, guys, you won’t agree with the facts without losing face thus abruptly after thus persistent practice against the rules. No, it is not recognized either that the practice of Wiktionary could derogate the written rules. Fay Freak (talk) 22:12, 6 December 2019 (UTC)[reply]

The question was, “[given certain conditions], how should this physically be laid out at [the targeted form]?” Understood or not, your reply did not appear to be in response to this question. --Lambiam 14:27, 7 December 2019 (UTC)[reply]

Let me try to explain the background to my layout question in a little more detail, in case it is unclear. The manual duplication of definitions for trivial spelling variants of what is essentially exactly the same word is highly undesirable in my opinion. It is a constant maintenance headache and an almost inevitable recipe for definitions getting randomly out of sync where they should be the same, potentially giving readers the impression that there is some difference in meaning or usage where there is none. Replacing the individual definitions at X with "Alternative spelling of Y" solves this at a stroke, but it introduces a new problem of where to put usage examples for spelling X. If they go at X then either they will be all lumped together with senses undifferentiated, or the definitions will have to be duplicated at X, which is exactly what we are trying to avoid in the first place. Another problem with the "Alternative spelling of Y" method is that it seems to give one form precedence. In some cases this may be desirable. In cases of "equal" spellings, which includes many US/UK variants, it is not desirable. In the latter cases I advocate a heading "X or Y", with X and Y both pointing there. If, exceptionally, a certain sense is possible only with one spelling, this can easily be handled with a sense-specific label. Unfortunately, my guess is that the "X or Y" format might break other things that would then have to be adapted for the new format. Mihia (talk) 15:12, 7 December 2019 (UTC)
Simply stated, put your cites/quotes where the senses are ("Yes" to your first question). So if X is pointing to Y where the senses are itemized, put your quotes on Y. If you have an X with some senses particular to X in addition to those on Y, put the quotes on X if those senses are itemized there. I don't see the need to put examples on X if you are only trying to prove that X really existed. One can see those examples on Y where X will be listed as an alternate form. The point of the soft redirects is just to get people to the right page where all of the data should be fully documented. Or that is how I see it. -Mike (talk) 22:03, 18 December 2019 (UTC)[reply]

I agree -- or, at least, I cannot see any better solution given the present capabilities of the wiki system. Where quotations are sorted in date order, would you sort everything together, thus interspersing the various spellings, or would you create separate sorted sections for different spellings? Mihia (talk) 21:59, 19 December 2019 (UTC)[reply]

Additional relevant discussion: Wiktionary:Tea_room/2020/January#Where_to_put_citations_of_variant_forms. - -sche (discuss) 21:06, 2 January 2020 (UTC)[reply]

Logograms

I set up a vote about Logogram being a valid POS header. Hope we can decide on something. --Vealhurl (talk) 01:46, 7 December 2019 (UTC)[reply]

Codes for Proto-West-Germanic, Proto-North-Sea-Germanic, Proto-Anglo-Frisian, and/or Proto-Northwest-Germanic

(Notifying Leasnam, Lambiam, Urszag, Hundwine): @Rua, Mnemosientje User:Hundwine has added a lot of etymologies with reconstructions from the various intermediate stages between Old English and Proto-Germanic. Sometimes these are just showing the sound changes between Old English and Proto-Germanic, which may or may not be useful, but sometimes these are for forms that are reconstructible back only to one of these intermediate protolanguages. Do people think it's useful to create etymology-only language codes for some of these protolanguages? Benwing2 (talk) 21:35, 7 December 2019 (UTC)[reply]

I think they are interesting, but I'm not certain that the etymology sections are the right place for this information. Perhaps Wiktionary:About_Anglo-Frisian would be a better place for this information, as we can detail the expected sound rules here once, instead of multiple times on multiple pages. Leasnam (talk) 21:55, 7 December 2019 (UTC)[reply]

No, we can’t really imagine how the dialectal landscape on the continent was from 500 BCE to 800 CE. The divisions vary. Was it really Proto-North-Sea-Germanic or just North-Sea-Germanic? I also have my doubts about the alleged Category:Frankish language that virtually only consists of reconstructions. Here it is put as the ancestor of Old Dutch, I would also put it as the ancestor of a part of High German, but after all the second Germanic consonant shift back then might only be an isogloss not splitting languages, and maybe Old High German is kinda the descendant of Old Dutch or Old Low German (like Afrikaans of Dutch), two languages both comparatively poorly attested, probably the same at some point, but how distinct from Elbe Germanic I don’t know. The Germanic loanwords in Slavic would mostly be from Elbe Germanic being Old High German before the High German sound shift but Old High German is not only from Elbe Germanic but also Weser-Rhine Germanic or whatever; often one just says they are from Old Saxon, the earliest that we can catch, like on *redьky (so verbatim Boryś). No, I don’t think these were proto “languages”. Fay Freak (talk) 21:59, 7 December 2019 (UTC)

@Fay Freak I agree that Proto-North-Sea-Germanic may be questionable but I think that West Germanic (especially) and Northwest Germanic are well-established clades. Ringe for example reconstructs lots of terms only back to Proto-West-Germanic; his criterion for a Proto-Germanic reconstruction is that it must either be attested in both Gothic and a non-Gothic language or it must have a clear PIE antecedent. Otherwise it can be reconstructed only to Proto-Northwest-Germanic (if found in both North and West Germanic) or Proto-West-Germanic only. Benwing2 (talk) 17:13, 8 December 2019 (UTC)[reply]
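The criterion attributed to Ringe above amounts to a small decision rule, which can be sketched in Python as follows. The function name and the boolean parameters are hypothetical, and the sketch encodes only what this comment states: the case of a term attested in North Germanic alone is not addressed in the comment, so the function returns `None` for it.

```python
def deepest_reconstruction(gothic, north, west, pie_antecedent=False):
    """Return the deepest proto-language a term may be reconstructed to,
    following the criterion as paraphrased in the comment above."""
    attested_outside_gothic = north or west
    # Proto-Germanic: Gothic plus a non-Gothic language, or a clear PIE antecedent.
    if (gothic and attested_outside_gothic) or pie_antecedent:
        return "Proto-Germanic"
    # Proto-Northwest-Germanic: found in both North and West Germanic.
    if north and west:
        return "Proto-Northwest-Germanic"
    # Proto-West-Germanic: found in West Germanic only.
    if west:
        return "Proto-West-Germanic"
    # Not covered by the criterion as stated in the comment.
    return None


if __name__ == "__main__":
    print(deepest_reconstruction(gothic=False, north=False, west=True))
```

Under this rule a term attested only in West Germanic and lacking a clear PIE antecedent would land at Proto-West-Germanic, which is the situation a separate code is meant to cover.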

I definitely oppose all but Proto-West-Germanic. Proto-Northwest-Germanic is almost the same as Proto-Germanic, bar a few sound changes, and we've rejected Proto-Baltic and Proto-Finno-Ugric on those grounds in the past. Proto-West-Germanic may have more of a chance, but even then, I'm not sure if it is clearly reconstructable. I draw attention in particular to the different outcomes of the loss of final -z in northern West Germanic versus southern West Germanic: it was lost everywhere in the north, but retained in monosyllables in the south. Did the loss in multisyllables occur in Proto-West-Germanic and then get extended to monosyllables later on in the north? Or did Proto-West-Germanic retain all cases of final -z? —Rua (mew) 19:34, 8 December 2019 (UTC)[reply]

@Rua I don't have my copy of Ringe with me to see what he says but I would assume the most natural reconstruction would have two stages of loss of final -z, with it lost everywhere but in monosyllables in Proto-West-Germanic and then lost in monosyllables in the north later on. However, my concern is more with vocabulary that's specific to West Germanic. Lots of what we reconstruct for Proto-Germanic is attested only in West Germanic, e.g. *agō, *aiþumaz, *aldiz (“human being”), *ambrijaz, etc. just to pick a few from the a's. *ambrijaz even says it's probably loaned from Latin, which definitely suggests it was not present in Proto-Germanic. Is it correct to reconstruct these terms back to Proto-Germanic? Benwing2 (talk) 18:26, 9 December 2019 (UTC)[reply]

I got myself a copy of the book and I'm reading it now. —Rua (mew) 19:16, 9 December 2019 (UTC)[reply]

A question I have is how we're going to represent the effects of the gemination before *j. Ringe doesn't seem to think it's phonemic in PWG yet, and still writes the ungeminated version. But I think that does a disservice to the descendants, so I would prefer writing them with the gemination: -nnj- rather than -nj-. A related matter is the phoneme *z. Writing it as simply z hides the fact that it was close enough to *r (i.e. rhotic) to escape the effects of the gemination, but we also can't write it as *r because several dialect-specific changes rely on the distinction. So what about *ʀ, following the Proto-Norse tradition? —Rua (mew) 23:02, 9 December 2019 (UTC)

Yet another question is what happens to sequences of labiovelar + *j. According to Ringe, labiovelars dissolve into a velar + *w, but what effect does that have on syllable weight, in the context of Sievers' law? Does *kʷj become *kwij, *kkwj or *kkwij? I can find nothing about this in the book. —Rua (mew) 23:34, 9 December 2019 (UTC)[reply]

@Rua I agree with all your suggestions. I think kw + *j probably becomes kkwj; I would expect it to behave similarly to k + *j, although I can't think of any weak verbs to test this out. Benwing2 (talk) 03:43, 10 December 2019 (UTC)[reply]

Ringe does note that þicce has no palatalisation, and ascribes this to the labial. Sadly, no cases of a labiovelar + j after a short syllable are attested in Gothic to compare outcomes. But a sequence -kkwj-, in which the w is an actual segmental consonant, seems quite weird. PWG would have to be reconstructed with ij after historically long syllables, geminate + j after historically short syllables, and this unique cluster -kkwj- that seems nearly unpronounceable. This also makes me wonder whether Sievers' law was even still maintained in PWG. Ringe says that it was, but how do we tell the difference between -j- and -ij- in the attested descendants? On what grounds does he say that it was maintained? —Rua (mew) 10:22, 10 December 2019 (UTC)[reply]

I've created the code gmw-pro for Proto-West Germanic now, and made one entry for it, *slāpan. I've also changed the descendants of *slēpaną accordingly. There is a reference template for Ringe's book, {{R:gmw:DOE}}. —Rua (mew) 19:03, 10 December 2019 (UTC)[reply]

I don't hate it. I'd appreciate a style guide for PG to PWG. --{{victar|talk}} 19:37, 10 December 2019 (UTC)[reply]

@Victar WT:AGMW has been created now. —Rua (mew) 20:19, 10 December 2019 (UTC)[reply]

Thanks. What of unstressed diphthongs and word-final vowels? --{{victar|talk}} 20:29, 10 December 2019 (UTC)[reply]

Shouldn't a vote have been held? 𐌷𐌻𐌿𐌳𐌰𐍅𐌹𐌲𐍃 𐌰𐌻𐌰𐍂𐌴𐌹𐌺𐌹𐌲𐌲𐍃 (talk) 10:35, 11 December 2019 (UTC)

Given how many entries are affected by this, some restraint may have been in order; on the other hand, if something is to be done usually it's good to use the momentum of a productive thread instead of letting a good idea (and I'm tending to think PWGmc at least is a good idea, esp. for entries such as those mentioned above and those found in Category:West Proto-Germanic) die as a thread gets eclipsed by new discussions. If someone objects, they can also create a vote of course; I don't think in any case there is any policy stating new language headers can only be created after a vote. — Mnemosientje (t · c) 12:09, 11 December 2019 (UTC)[reply]

Wouldn't the Category:West Proto-Germanic have sufficed? Also do the majority of scholars accept Proto-West-Germanic as a valid proto-language? 𐌷𐌻𐌿𐌳𐌰𐍅𐌹𐌲𐍃 𐌰𐌻𐌰𐍂𐌴𐌹𐌺𐌹𐌲𐌲𐍃 (talk) 16:36, 11 December 2019 (UTC)[reply]

@Holodwig21 Yes, AFAIK they do. Benwing2 (talk) 05:45, 12 December 2019 (UTC)[reply]

Tbh I definitely oppose listing stuff like *kirikǭ as Proto-Germanic (I don't see this as possibly being borrowed before 350 AD or so), even when there's a label "West Germanic". If this solves that, that'd be great. Judging from Google Books, "Proto-West Germanic" is a thing, so we're fine on that front too. — Mnemosientje (t · c) 09:06, 12 December 2019 (UTC)[reply]

I've moved *kirikǭ from Proto-Germanic to Proto-West Germanic. We probably need to update the inflection templates; at the moment, the entries are still using the Proto-Germanic inflection templates. —Mahāgaja · talk 11:47, 12 December 2019 (UTC)[reply]

@Mahagaja I've moved it to *kirikā, with the ending in its PWG form. —Rua (mew) 14:06, 12 December 2019 (UTC)[reply]

If we're going to be placing West Germanic entries all together, I'd like to recommend using {{top3}}, as we do on Proto-Iranian and Slavic entries. Please see *badi. It makes the page much easier to read at a glance than a long, unwieldy page. --{{victar|talk}} 20:43, 12 December 2019 (UTC)
- I oppose the use of columns for descendants sections, as I mentioned before. You shouldn't be using live namespaces as your personal testing/demonstration ground, you could have used your user page space for that. —Rua (mew) 20:46, 12 December 2019 (UTC)[reply]
  - It's absolutely ridiculous that I would be forbidden from creating a live version of a {{top3}} example and an egregious overreach of admin powers on your part to block me for it, but I'm not going to pettily fight over it anymore. --{{victar|talk}} 21:59, 12 December 2019 (UTC)[reply]

You don't have to permanently change the entry for a demonstration. What I've seen some people do is make their edit, undo it, then link to the {{diff}} in the edit history:

{{diff|58155508|text=this is how it looks with top3}}

displays:

this is how it looks with top3

Chuck Entz (talk) 05:08, 13 December 2019 (UTC)[reply]

Not really because it's not a direct link, but thanks. I'm also not interested in hijacking this discussion with this anymore. --{{victar|talk}} 05:29, 13 December 2019 (UTC)[reply]

Due to the creation of Proto-West Germanic, Wiktionary:About Proto-Germanic#Dialectal forms needs revision. — Mnemosientje (t · c) 12:20, 14 December 2019 (UTC)[reply]

Old English "prefixes" that are just parts of compounds

(Notifying Leasnam, Lambiam, Urszag, Hundwine): There are a lot of entries for Old English prefixes that IMO are nothing of the sort. Examples:

ang- (“narrow”)
car- (“sadness”)
carl- (“male”)
dūne- (“down”)
eald- (“old”)
ealdor- (“origin”)

etc. In my view these are just parts of compounds and should not have separate entries as prefixes. There are tons of potential "prefixes" of this sort, since compounding in Old English was extremely productive (just as in modern English and German). I think we should only have entries for prefixes which either (a) have no corresponding non-bound form, (b) have a non-bound form that is a function word (preposition, etc.), or (c) have a meaning that is fundamentally unpredictable from the non-bound form. Benwing2 (talk) 03:50, 10 December 2019 (UTC)[reply]
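The three proposed criteria can be written out as a simple check, sketched here in Python. The `BoundForm` record and its field names are hypothetical, and deciding the field values for a real entry (especially whether a meaning is "predictable") remains an editorial judgment that no code can make.

```python
from dataclasses import dataclass


@dataclass
class BoundForm:
    """Hypothetical record describing a candidate prefix entry."""
    has_free_form: bool                       # is there a corresponding non-bound form?
    free_form_is_function_word: bool = False  # preposition etc.
    meaning_predictable: bool = True          # predictable from the non-bound form?


def merits_prefix_entry(form):
    """Apply criteria (a), (b) and (c) from the proposal above."""
    if not form.has_free_form:
        return True   # (a) no corresponding non-bound form
    if form.free_form_is_function_word:
        return True   # (b) non-bound form is a function word
    if not form.meaning_predictable:
        return True   # (c) meaning unpredictable from the non-bound form
    return False      # otherwise it is just the first part of a compound


if __name__ == "__main__":
    # A form like eald- ("old"): a free content word with a predictable meaning.
    print(merits_prefix_entry(BoundForm(has_free_form=True)))
```

On this reading, only forms that fail all three tests would be orphaned and submitted for deletion.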

Sounds good to me. There are an awful lot of those. We should probably start by orphaning the fake prefixes by replacing etymologies like “ang- +‎ brēost ” (for angbrēost ) by “ange +‎ brēost ”. Then we can submit the lot for deletion. --Lambiam 12:23, 10 December 2019 (UTC)[reply]

I have been thinking about this myself; sorting out the Gothic prefixes from the compounds is a matter I have been wanting to take care of. A lot of Germanic languages have this issue, e.g. last I checked Category:German prefixes had a couple dubious ones too. Perhaps we could have a broader discussion not just restricted to OE. — Mnemosientje (t · c) 12:12, 11 December 2019 (UTC)[reply]

Zhuang dialects are not under the same NT.

Zhuang has many dialects. It may be roughly divided into two groups, Northern Zhuang and Southern Zhuang (each of them an individual language). Standard Zhuang is based on Wuming Zhuang, which is under the Northern Tai languages; that is our current practice.

According to Pittayaporn (2009) (see the family tree on Wikipedia), many Zhuang dialects are categorized on other branches. The problem is that some Zhuang dialects are not under the same NT: some are Central Tai, Southwestern Tai, or even new branches. If we only use the "za" code, they will always be categorized under NT, which is definitely incorrect. Related templates will also fail to display the correct information.

Moreover, I discovered that the dialects in question have significantly different phonemes and spellings (very far) from Wuming. For example, some use ts, ˀj, aspirated plosives (ʰ), labialized consonants (ʷ), which za-pron does not support. "Yellow" in Wuming is lieng but SZ has jweng instead, etc. We could edit the pronunciation module, but the family tree problem would still remain unsolved.

So what I think is that, if we want to collect non-Wuming Zhuang vocabulary, it would be better to separate them under their own language codes (which already exist); that will help with suitable categorization. --Octahedron80 (talk) 00:39, 11 December 2019 (UTC)

@Octahedron80: I just stumbled upon this. AFAICT, Zhuang is not currently classified as a Northern Tai language in our modules, but as a Tai language. (See CAT:Zhuang language.) I don't think we have enough resources as of now to separate Zhuang into different languages. Resources like Sawloih Cuengh-Gun (壮汉词汇) and Sawloih Gun-Cuengh often include "dialectal" terms but don't specify which particular variety they are used in. Also, Standard Zhuang is not necessarily equivalent to Wuming Zhuang. I think it would be better to focus on increasing our support of other varieties in the pronunciation modules. — justin(r)leung _{{ (t...) | c=› }} 08:59, 11 January 2020 (UTC)[reply]

Standard Zhuang "Its pronunciation is based on that of the Yongbei Zhuang dialect of Shuangqiao, Guangxi in Wuming District, Guangxi with some influence from Fuliang, also in Wuming District" and Yongbei Zhuang "Yongbei is traditionally assumed to include Wuming dialect, the basis of Standard Zhuang." Standard Zhuang is NT, but recently I got Nong Zhuang, which is CT instead. Central Tai is another group, even though it includes some "Zhuang". That means there are a lot of Nong Zhuang words (compared to Nung and Tay in Vietnam) which are much different from Standard Zhuang. --Octahedron80 (talk) 11:03, 11 January 2020 (UTC)[reply]

@Octahedron80: What is grouped under Zhuang is indeed a very diverse group, but we haven't made the error of classifying "Zhuang" as Northern Tai. The code za is the macrolanguage code, not the code for Standard Zhuang. The way we're treating Zhuang is basically how we're treating the diverse Chinese languages (under the code zh). Are you proposing that we split Zhuang up into its different varieties? From how I see it, Nong Zhuang is different from Standard Zhuang (based on the Wuming dialect) as Min Nan is different from Standard Chinese (i.e. Standard Mandarin, based on the Beijing dialect). Orthography is perhaps where the analogy between Zhuang and Chinese breaks down since Zhuang orthography is more phonemic (and thus subject to more dialectal variation in orthography). Pinging @Suzukaze-c as well. — justin(r)leung _{{ (t...) | c=› }} 00:12, 13 January 2020 (UTC)[reply]

@Johnkn63, do you have any opinions? —Suzukaze-c ◇◇ 00:44, 13 January 2020 (UTC)[reply]

Can we please have a focused approach on one standardised Zhuang accent or one specific variety? Even a slight variation could require a separate module. I don't think it's a great idea to make this one module cover multiple accents. Even if it's possible, it's not how other IPA modules are done and will confuse/discourage other contributors.

I support separating into varieties only if we have resources, but we don't. Zhuang content at Wiktionary is miserable. Let's focus on just one Zhuang dialect first. --Anatoli T. ^{(обсудить}/^вклад) 01:01, 13 January 2020 (UTC)[reply]

@Atitarev: We do need dialectal support because the Zhuang macrolanguage is divergent like the Chinese languages. A major problem is that Standard Zhuang doesn't reflect reality of usage, unlike Standard Mandarin. I don't think it would necessarily go against how other IPA modules work. {{bo-IPA}} also allows multiple dialects. We could basically use the Chinese model as a prototype: separate modules for different varieties, but one template ({{zh-pron}}). — justin(r)leung _{{ (t...) | c=› }} 01:45, 13 January 2020 (UTC)[reply]

@Justinrleung: Thanks for pointing out the Tibetan one. It seems dialectal differences in Tibetan are minor, at least judging by the module, and affect what is displayed by the template, not so much the pronunciation. I don't disagree with you. za-IPA can be made to manage different Zhuang varieties, just like Chinese zh-IPA handles different Chinese varieties, including Mandarin cmn-IPA, but it makes sense to make separate modules for dialects and let za-IPA handle the default/standard accent. However, za-IPA is already working for a specific default and most known accent/variety. A macro module to handle this one and others can be added. In any case, it's better to choose what should be considered the "standard" or default one. --Anatoli T. ^{(обсудить}/^вклад) 02:18, 13 January 2020 (UTC)[reply]

FWIW, the Tibetan template directly accepts IPA for non-Lhasa dialects, and there are major differences, such as presence/lack of complex consonant clusters. This approach skips the need for making dialect-specific modules/templates.

As for the "standard or default" variety of Zhuang, that would precisely be "Standard Zhuang"... This is what Module:za-pron currently handles (note the {{a|Standard Zhuang}} in its output). —Suzukaze-c ◇◇ 02:24, 13 January 2020 (UTC)[reply]

@Suzukaze-c: Thanks, just displaying the IPA as a parameter would make a number of languages dumped into one template easier. --Anatoli T. ^{(обсудить}/^вклад) 02:40, 13 January 2020 (UTC)[reply]

@Suzukaze-c, Atitarev: The problem with Zhuang is that there are separate orthographies for other varieties, meaning that we can't just put IPA in. What we have at raemx is not ideal. — justin(r)leung _{{ (t...) | c=› }} 03:04, 13 January 2020 (UTC)[reply]

It's definitely a case for separate dialect modules, but most dialects won't have sufficient resources. raemx and others can be kept as is if nothing is done to automate the dialect modules, IMO. --Anatoli T. ^{(обсудить}/^вклад) 03:25, 13 January 2020 (UTC)[reply]

@Atitarev: What I mean is that the word should be written as naemx for Nong Zhuang and not raemx, unlike for Chinese, where the dialectal variants would all be written as 水. — justin(r)leung _{{ (t...) | c=› }} 03:29, 13 January 2020 (UTC)[reply]

Yes, I understand what you meant by "separate orthographies". I can also imagine that a (standard or common) spelling can be shared even if the pronunciations may differ. Some orthographies may be non-existent for verification purposes. --Anatoli T. ^{(обсудить}/^вклад) 03:34, 13 January 2020 (UTC)[reply]

If I understand WT:CFI correctly, since Zhuang is not a well-documented language, the verification for it is slightly different. Zhuang orthography is pretty much phonemic, so I think we can perhaps adapt the orthography to non-standard lects (based on sources like 壮语方言土语音系). — justin(r)leung _{{ (t...) | c=› }} 03:59, 13 January 2020 (UTC)[reply]

Yes, the alternative is to use a "spelled as" but "pronounced as" approach (with phonetic respellings): in the case of raemx, naemx would only be an input (just a possible example for words with no verified spelling, not necessarily this particular word or variety). An additional benefit is that the phonetic respelling can use symbols/tricks which are not part of the spelling to force a different IPA. --Anatoli T. ^{(обсудить}/^вклад) 04:11, 13 January 2020 (UTC)[reply]

That would work if the spelling of a cognate exists, but what if there isn't a cognate in Standard Zhuang (or a variety that has its own verifiable orthography)? — justin(r)leung _{{ (t...) | c=› }} 04:15, 13 January 2020 (UTC)[reply]

That's a CFI question then. Specific Zhuang dialects may have their own, more relaxed CFI. I've seen IPA and words with numbers as entries for some rare languages/dialects but they may not have gone through an RFV process. --Anatoli T. ^{(обсудить}/^вклад) 04:23, 13 January 2020 (UTC)[reply]

Why don't we list the definition instead of linking to the singular masculine[edit]

I encountered this while searching for "pirralha", a word that you might be seeing in the news today.

It strikes me as bad UX to make users click through from a definition of a definition, instead of just listing the definition in the feminine entry. Also, having an entirely separate entry for the plural raises the same problem. Is this more of a technical wishlist item, where we should be asking that all versions of a word redirect to one page, rather than having separate entries?

I'm sure there are practical or technical considerations for why it's the way it is, but it's a change that I'd like to see. This is without getting into the sexist implications of such a structure, which I believe are valid criticisms but not necessary to invoke. — This unsigned comment was added by Intrepidus (talk • contribs) at 14:48, 11 December 2019 (UTC).[reply]

The problem is that this is a wiki, where people are making changes to entries all the time without coordination. If you include the definitions on the pages for all the other forms, then a correction to one page makes it disagree with the others. Any kind of automatic transclusion runs into the problem of some senses not being used in some grammatical forms. Chuck Entz (talk) 14:59, 11 December 2019 (UTC)[reply]

Anyway, we can add a short gloss using the parameter |t=, as I have now done at pirralha. —Mahāgaja · talk 16:35, 11 December 2019 (UTC)[reply]

We've had a similar issue for years in the Japanese entries. A Japanese term often has three distinct renderings: 1) kanji, often with trailing hiragana to specify inflection; 2) kana, most often hiragana, sometimes katakana, sometimes both; 3) romaji, a.k.a. the Latin alphabet, provided as a soft-redirect aid for site visitors who cannot read or input Japanese characters. The convention has been to keep the core entry at the lemma spelling.

In some cases, the lemma spelling itself isn't clear -- maybe it's the kanji spelling, maybe it's the kana only.

Plus, there's the problem of homophones (which, in Japanese, are also often homographs), of which there are very many. Even if a user figures out that they need to look up the word shōko with one long and one short vowel, they are redirected to the hiragana entry しょうこ, from which they have to pick from 25 different possible terms. Even if they know from context that it must be a noun, they still have 17 to choose from. Without at least a gloss, the user must click through 17 different entries to be able to find the appropriate term.

This is patently awful usability. As a help, the JA editors have added glosses to many of the hiragana soft-redirect pages, to give readers a better chance of picking the correct lemma entry. See this specific version of that しょうこ page for an example of the legacy state.

However, as noted above by Chuck, and noted elsewhere in various threads, this means duplication of content, and a need to ensure that multiple pages are in sync. As a manual process, in a volunteer project, this presents various difficulties.

Wyang (talk • contribs) and others engaged in a lot of wiki wizardry to pull together the unified approach to Chinese entries. Dine2016 (talk • contribs) took a somewhat similar approach to create the {{ja-see}} and {{ja-see-kango}} templates. These allow us now to create the soft-redirect pages, and have the template automatically pull the basic term info (part of speech and definition) from the lemma entry. This means we can keep the data in one place, and we don't have to duplicate, and we don't have to worry about keeping multiple pages manually in sync. Compare, for example, this legacy version of the かんがん page with this one using the new template approach.

⇒ I suspect something similar would work for all our other "form of" entries. Providing a user with at least a gloss, often what they're seeking, is much better usability than forcing them to click through to other pages, in some cases several times. ‑‑ Eiríkr Útlendi │^{Tala við mig} 22:57, 11 December 2019 (UTC)[reply]
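The single-source idea described above can be sketched roughly as follows. This is only an illustration in Python, not how {{ja-see}} is actually implemented: the entry names, the data layout, and the function `render_soft_redirect` are all hypothetical placeholders. The point it tries to show is that the soft-redirect page stores only a list of target lemmas, and the part of speech and gloss are read from the lemma entry at render time, so there is nothing to keep in sync by hand.

```python
# Hypothetical lemma entries: the only place where glosses are stored.
LEMMA_ENTRIES = {
    "lemma-1": {"pos": "noun", "gloss": "first placeholder gloss"},
    "lemma-2": {"pos": "verb", "gloss": "second placeholder gloss"},
}

# Hypothetical soft-redirect page: it lists targets, but no definitions.
SOFT_REDIRECTS = {
    "shared-form": ["lemma-1", "lemma-2"],
}


def render_soft_redirect(form):
    """Build the display lines for a soft-redirect page.

    The part of speech and gloss are looked up in LEMMA_ENTRIES each
    time, so a correction made at the lemma shows up here automatically.
    """
    lines = []
    for target in SOFT_REDIRECTS[form]:
        entry = LEMMA_ENTRIES[target]
        lines.append(f"{target} ({entry['pos']}): {entry['gloss']}")
    return lines
```

A sketch like this does not address the objection raised earlier in the thread that some senses of a lemma are not used in some grammatical forms; handling that would need per-form filtering of senses, which is left out here.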

And why don't we gloss source terms (and cognates) in etymologies? DCDuring (talk) 02:23, 12 December 2019 (UTC)[reply]

At my home wikt, if the source terms don't differ much in meaning from the entry, there is no need to add glosses to them. I don't know whether that concept would be accepted here. --Octahedron80 (talk) 02:30, 12 December 2019 (UTC)[reply]

Proposal to hyperlink the names of the descendent languages[edit]

Can we have the language names in our Descendants section hyperlinked just as how the language names in the Etymology section are? This, I believe, is going to be helpful for the readers. I myself sometimes have to make it to Wikipedia when a language name is unknown to me. Thoughts? —Lbdñk (talk) 18:42, 13 December 2019 (UTC)[reply]

They are purposefully not linked. I would be strongly opposed to this change. --{{victar|talk}} 19:40, 13 December 2019 (UTC)[reply]

@Victar: Just consider this: in case of inherited terms, one can get a rough understanding of the language from the language family tree itself, but think about languages with borrowed terms. There are actually so many languages that are largely obscure to the readers, e.g., when looking at the descendants of Portuguese saia, hardly any one will know about the Kadiwéu language. —Lbdñk (talk) 18:40, 14 December 2019 (UTC)[reply]

I'm well aware of the obscurity of some lects, and am still opposed. --{{victar|talk}} 18:42, 14 December 2019 (UTC)[reply]

@Victar: If you are opposed, then why do you find no problem with hyperlinked language names in the Etymology section? And what is the harm in my proposal? —Lbdñk (talk) 19:52, 14 December 2019 (UTC)[reply]

It's a totally different format. Etymologies are written in full sentences; descendants sections are lists. Apples to oranges. --{{victar|talk}} 19:55, 14 December 2019 (UTC)[reply]

If you don’t know the language already you perhaps don’t need to either. Fay Freak (talk) 19:45, 14 December 2019 (UTC)[reply]

This could be done by a gadget if most people are dead-set against making it the default. — Eru·tuon 23:10, 14 December 2019 (UTC)[reply]

I don't see what's wrong with it. —Suzukaze-c ◇◇ 02:25, 15 December 2019 (UTC)[reply]

See this French snowclone: trop de X tue le X ("too much X kills the X"). Too much linking kills the linking. Canonicalization (talk) 22:24, 18 December 2019 (UTC)[reply]

Aramaic subfamilies[edit]

@Mahagaja: So the Aramaic language needs to be divided into three subfamilies, but setting these subfamilies as parents of Aramaic varieties in Module:etymology languages/data causes {{desc}} to consider them families and throw a module error (as noted in the edit summary of this reversion), because an "etymology language" is actually a language family when its parent value is a language family code.

Maybe this could be fixed by making the Aramaic language into a family. This would practically mean that arc would cease being a language, and its etymology-only subvarieties would become languages of their own, or subvarieties of a new full language, and arc would become the code of Aramaic languages, to replace its current code sem-ara. This doesn't seem entirely consistent with what User:Victar proposed here, since his proposal involves chronological distinctions rather than regional Aramaic subfamilies. But I don't understand the Aramaic languages well enough to sort all this out.

For now I've moved the etymology languages in the tree in Category:Aramaic language by editing Module:family tree/data. — Eru·tuon 20:00, 13 December 2019 (UTC)[reply]
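The behaviour described above can be illustrated with a minimal sketch. It is written in Python rather than as the actual Lua module, and the data layout, the function `effective_kind`, and the code `"hyp-subfamily"` are assumptions made up for illustration; only `arc`, `arc-bib`, `arc-pal`, and `sem-ara` are codes mentioned in this discussion. It shows why pointing an etymology-only variety at a subfamily makes it resolve to a family rather than to a language.

```python
# Hypothetical data; "hyp-subfamily" is an invented subfamily code.
FAMILIES = {"sem-ara", "hyp-subfamily"}
LANGUAGES = {"arc"}

# Etymology-only codes mapped to their parent code.
ETYMOLOGY_PARENTS = {
    "arc-bib": "arc",            # parent is a full language
    "arc-pal": "hyp-subfamily",  # parent is a (hypothetical) subfamily
}


def effective_kind(code):
    """Follow parent links until a full language or a family is reached."""
    seen = set()
    while code in ETYMOLOGY_PARENTS:
        if code in seen:
            raise ValueError(f"parent cycle at {code!r}")
        seen.add(code)
        code = ETYMOLOGY_PARENTS[code]
    if code in FAMILIES:
        return "family"
    if code in LANGUAGES:
        return "language"
    raise KeyError(code)
```

Under these assumptions `arc-bib` resolves to a language while `arc-pal` resolves to a family, which is the situation in which a template expecting a language would raise an error.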

@Erutuon, Victar, Fay Freak: (Notifying 334a, Rhemmiel): To be honest I don't know much about the Aramaic languages either, though it has always bothered me that we treat arc as one monolithic Aramaic language, even though the SIL considers it the code only of Official Aramaic (alias: Imperial Aramaic) of 700–300 B.C. If Official Aramaic is the ancestor of all other Aramaic languages, I have no problem keeping arc a language that is the common ancestor of the sem-ara family, just as la is the common ancestor of the roa family. As for the etymology-only languages, probably the easiest solution would be to promote arc-bib, arc-hat, tmr, jpa, and arc-pal to full-fledged languages, but I'll leave that up to people who know more about those languages than I do. I also think we should give arc a more specific name than just "Aramaic", which is liable to be confusing. We could follow SIL and call it Official Aramaic, or call it Imperial Aramaic, or even Old Aramaic since we have dispensed with oar and use arc to cover the period before 700 B.C. —Mahāgaja · talk 09:25, 14 December 2019 (UTC)[reply]

I see no problem in “treating arc as one monolithic Aramaic language”. I don’t think we should follow SIL or similar databases in anything diachronic. Making arc “Imperial Aramaic”, “Jewish Babylonian Aramaic” or any other lect does not work since it currently contains various lects and nobody is here able to sort them out. And many entries, especially for common words, are valid for all of Jewish Babylonian Aramaic, Jewish Palestinian Aramaic, Biblical Aramaic, and other lects one frequently hasn’t got a name for, and I have often run into the problem that a word from a certain environment – a certain text, a certain place and time, or perhaps place and time are ambiguous – I cannot assign any more specifically to any of the language codes present for Wiktionary, would I have to, other than arc, that is even for the modern languages. You people would only split into alleged “languages” you find in some lists but these lists of languages are neither conclusive nor comprehensive here, nor designed for lexicography. From the obvious conclusion that everything from 700 BCE to today cannot be one language one does not succeed at the same time to dissect it with any precision. And yet you already fell victim to the baseless illusion that everything under four equality signs is a language, pulling also a Geographyinitiative. Fay Freak (talk) 13:22, 14 December 2019 (UTC)[reply]

Pinging those involved in this discussion: I have created a vote at Wiktionary:Votes/2019-12/Splitting_Aramaic. @Erutuon, Mahagaja, Fay Freak --{{victar|talk}} 20:51, 14 December 2019 (UTC)[reply]

Would someone better informed mind drawing up a list of Wiktionary's divisions of Aramaic, their time periods, the scripts they use, the corpora they appear in, differences in phonology or transliteration or lemmatization, and any other general characteristics that have relevance to dictionary entries? That would help people get a picture of what the current state of Aramaic is like and what the proposed new state would be. — Eru·tuon 21:15, 16 December 2019 (UTC)[reply]

I can do this in the next few days. Rhemmiel (talk) 12:19, 17 December 2019 (UTC)[reply]

For a starting point, here's the list of Aramaic varieties/corpora that you can filter for in a KWIC (Key Word in Context) query for "GBR" at The Comprehensive Aramaic Lexicon (http://cal.huc.edu/dKWIC.php?lemma=gbr+N):

Old Aramaic
Imperial/Official Aramaic
Biblical Aramaic
Palmyrene
Nabataean
Hatran
Assur and vicin.
Qumran
Jewish Literary Targumic
Jewish Palestinian (Galilean)
Palestinian Targumic
CPA
Samaritan
Syriac
Magic Bowl koine
Babylonian Talmudic
Babylonian Gaonic
Mandaic
Late Jewish Literary

Chuck Entz (talk) 01:15, 20 December 2019 (UTC)[reply]

Planning to oppose[edit]

Those lects are not languages, but corpora. Jewish Babylonian Aramaic stands more for a corpus than a language; or rather, I claim it’s not a language in the sense needed here, just as one does not add a language Law French either. Of course it is an important corpus, but there are many crazy ones; for example, CAL has in some places “Babylonian Magic Bowl Koine Aramaic”. “Jewish Literary Aramaic” is also a category as frequent as it is vague, denoting a “Standard Literary Aramaic” mixed with elements from Eastern and Western Aramaic. Even for important standards one does not know where to put them if the vote “splitting the language code for Aramaic [arc] into Old Aramaic [oar] and Jewish Babylonian Aramaic [tmr], and relegating [arc] to a family code” is enacted, so Biblical Aramaic could be Old Aramaic and could not be; but nobody would seek it under “Old Aramaic” or “Imperial Aramaic” or whatever. Jewish Palestinian Aramaic is neither and there are already entries with it. One can also just refrain from making the distinction already in the L2 header; I and most people are content with labels ({{tlb}}); I also point out that it makes the category system more useful. A category like Category:arc:Business comprising all that the Aramaic lexicon has on business would be more sought than the same thing scattered amongst categories for multiple unpredictably split language names. It’s all favourable to have it written on the page what lect this is, for a desired minimum of linguistic precision as the category of Aramaic is wide, there we agree, but the bruteforce, random cracking of languoids of which we have only an amorphous mass of historical data goes too far for me; labelling is a milder, safer, and more effective means. Fay Freak (talk) 21:33, 14 December 2019 (UTC)[reply]

@Fay Freak, if your intention is to otherwise subvert the result of the vote, I'd be happy to be more explicit in its language. This vote would be to enact proposal #2 "Split Old Aramaic and JBA apart", under Wiktionary:Beer_parlour/2019/April#Splitting_Aramaic. --{{victar|talk}} 22:13, 14 December 2019 (UTC)[reply]

I agree @Fay Freak. Nearly every time I open a translation table and look at the Aramaic entry I’m confronted by a “Classical Syriac” translation and a “Jewish Babylonian Aramaic” (or sometimes just ““Hebrew””??) translation. These turn out to be, without fail, the exact same word down to the diacritical marks and transliteration. The one thing that prevents them from being identical is that they’re written in different versions of the Aramaic script. Can we please stop this nonsense. I’ve thought a decent amount about how Wiktionary’s handling of Aramaic could be improved, so I’ll try to gather my thoughts and weigh in with something concrete soon. Rhemmiel (talk) 08:21, 15 December 2019 (UTC)[reply]

User:Victar asked me to comment here. I tend to agree with what User:Fay Freak is saying. At the moment, I happen to be working a lot with Jewish Babylonian Aramaic in my personal life (feel free to guess why), so if you guys have any questions about the different dialects (or perhaps, as Fay Freak said, they are better called corpora), feel free to ask. --Wiki Tiki 89 20:31, 16 December 2019 (UTC)[reply]

@Wikitiki89, Fay Freak: I don't have any problem creating a generic code for Jewish Middle Aramaic lects (ex. [tmr] or [arc-jma]), if that's what's best. My main grievance is combining Middle Aramaic with Old Aramaic, for which there are many differences in script, meaning, syntax, especially when you take Imperial and Official Aramaic into account. --{{victar|talk}} 20:47, 16 December 2019 (UTC)[reply]

@Victar I don't think differences in script or meaning are really big issues, that's what context tags are for. Really I'd say the trickiest issue is lemma forms, since for example in some forms of Aramaic, the absolute state is the basic form of the noun and the emphatic state (a.k.a. definite state) is often not attested for particular words (this applies primarily to older forms of Aramaic such as OA and IA), while in other forms of Aramaic (most later forms, especially those in the Eastern Aramaic group), the emphatic state is the normal form of the noun, and the absolute state is rarely if ever attested for most nouns. Additionally, for most macrolanguages that we have unified, it's useful to have a standardized spelling system with all dialectal spellings redirecting to the standard one, but I don't see that as practical for Aramaic, so we might have a lot of duplication of content. --Wiki Tiki 89 21:05, 16 December 2019 (UTC)[reply]

@Wikitiki89: I do think script does merit some consideration. With most Old Aramaic entries potentially in Imperial Aramaic script, an overlap with Jewish Middle Aramaic entries seems a moot point. --{{victar|talk}} 21:54, 16 December 2019 (UTC)[reply]

I don’t understand your fixation on Jewish Middle Aramaic. Perhaps you assumed that everything Christian is covered as Classical Syriac, which is not the case. Christian Palestinian Aramaic is mostly like Jewish Palestinian Aramaic, except it’s written in the (so-called) Syriac script, and it is not counted to the Classical Syriac corpus, whatever that consists of. And there were pagans too. Nabataean Aramaic. Just some keywords. Fay Freak (talk) 20:55, 19 December 2019 (UTC)[reply]

@Fay Freak: You make a fair point, and maybe something along the lines of Literary Middle Aramaic may be a better designation. --{{victar|talk}} 21:29, 19 December 2019 (UTC)[reply]

I think 𐡂𐡁𐡓𐡀 (gbrʾ, “man”) is a fairly good example of the argument about duplication being rather a moot point, with Aramaic rendered in several different scripts. --{{victar|talk}} 03:48, 17 December 2019 (UTC)[reply]

I updated the language of the vote proposal. I know it still doesn't agree with you, but maybe it's a little more comprehensive. --{{victar|talk}} 20:40, 19 December 2019 (UTC)[reply]

Alternative proposal[edit]

What do people think about renaming [arc] back to Standard Literary Aramaic or Literary Middle Aramaic, and instating Old Aramaic [oar] as the language code for Imperial, Official, and Biblical Aramaic? @Fay Freak, Wikitiki89, Mahagaja, Rhemmiel, Erutuon, JohnC5. --{{victar|talk}} 22:56, 19 December 2019 (UTC)[reply]

Obscure Chinese characters[edit]

Is there a conventional way on Wiktionary to represent Chinese characters that are not yet in Unicode? --Lvovmauro (talk) 08:30, 14 December 2019 (UTC)[reply]

I don’t think so. For entry names this is hopeless; there would be no reasonable way to search for the entry. For other uses, you can upload the image to Commons and then use the image, like for instance ([[File:182827異體字楫.gif|18px]]). --Lambiam 11:40, 14 December 2019 (UTC)[reply]

Category:Terms containing unencoded characters —Suzukaze-c ◇◇ 02:24, 15 December 2019 (UTC)[reply]

Please write it as an IDS (Ideographic Description Sequence): for the above example, ⿰木𦙃. Alternatively, you can wait until CJK Extension G arrives; the unencoded characters will then be converted into proper ones (if available). If you have trouble finding equivalent components, ask us. --Octahedron80 (talk) 02:32, 15 December 2019 (UTC)[reply]
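For anyone unfamiliar with the IDS notation mentioned above, here is a rough sketch in Python of how such a sequence is structured: a description operator is followed by two or three parts, each of which may itself be a nested sequence. The function name `is_well_formed_ids` is made up for illustration, and the check only covers the operators in the range U+2FF0 to U+2FFB; operators added in later Unicode versions, and any length limits, are not handled.

```python
# Ideographic description characters U+2FF0..U+2FFB.
# U+2FF2 and U+2FF3 take three parts; the others take two.
TERNARY = {"\u2ff2", "\u2ff3"}
BINARY = {chr(cp) for cp in range(0x2FF0, 0x2FFC)} - TERNARY


def is_well_formed_ids(text):
    """Return True if text is exactly one complete description sequence."""

    def parse(i):
        # Returns the index just past one component starting at i,
        # or -1 if the text ends before the component is complete.
        if i >= len(text):
            return -1
        ch = text[i]
        if ch in BINARY:
            arity = 2
        elif ch in TERNARY:
            arity = 3
        else:
            return i + 1  # an ordinary character is a complete component
        i += 1
        for _ in range(arity):
            i = parse(i)
            if i < 0:
                return -1
        return i

    return parse(0) == len(text)
```

With this sketch, the sequence ⿰木𦙃 from the comment above is accepted (a left-right operator followed by two components), while ⿰木 alone is rejected because the second component is missing.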

Policy on usage examples[edit]

WT:USEX says, under the heading "Unofficial Policy":

"Example sentences may be replaced by citations and can be removed when citations for the given definition are already present. This is in part because of redundancy, in part because the server automatically hides citations starting with the wikisyntax #*. Example sentences cause these to appear a further line below the definition for every example given.

"However, where an example sentence provides a usage example not covered by the quotations present and useful for readers, it should not be removed."

I propose that this whole section be deleted. I do not agree with the idea of deleting example sentences just because citations are present. For people who just want a quick understanding of the definition, example sentences are more user-friendly and user-accessible than citations, and can be tailored more exactly so as to clearly differentiate between senses.

In parallel with this, I propose making the following change to WT:ELE:

Existing text:

"Generally, every definition should be accompanied by a quotation illustrating the definition. If no quotation can be found, it is strongly encouraged to create an example sentence."

New text:

"Generally, every definition should be accompanied by one or more quotations illustrating that definition. Quotations are supplemented by example sentences, which are devised by Wiktionary editors in order to illustrate definitions."

Do these changes have to go to an official vote, or can the issue be decided here? Mihia (talk) 20:29, 17 December 2019 (UTC)[reply]

I agree with the deleting, especially for the clearly differentiating between senses. For common words (e.g. German stellen) I have to specifically make up examples or idealize because in an infinite number of quotes one does not get the quotes together as good as one wants. The part “in part because the server automatically hides citations starting with the wikisyntax” in the old text seems like a non sequitur.

I don’t see why they “encourage” in the WT:ELE. I like the text “If no quotation can be found”, though, because it provides another argument for my view that inclusion does not necessarily require three durable quotes, but I would rather forgo it given its nonsensical apodosis; it only makes people more confused than they already are. The new text seems better, but you know that it does not work if it says “generally should”. The normative/imperative quality of the sentence “Quotations are supplemented by example sentences” is unclear (“are”??); its relative clause seems superfluous because if example sentences are something other than quotations then they are devised by Wiktionary editors in order to illustrate definitions; if it is there for clarification, then stylistically the mention of quotations should also get a relative clause for the parallelism. Fay Freak (talk) 21:15, 17 December 2019 (UTC)[reply]

I support the changes. Quotations are fine for attestation purposes but often otherwise unhelpful because they do not illustrate the sense very well. I’d be happy with a general rule that if no serious objections are raised against policy changes proposed here – provided that it is clear what is being proposed – they don’t have to go through a formal voting process. --Lambiam 23:18, 17 December 2019 (UTC)[reply]

Support. Usexes and quotations have different reasons of being. Jberkel 23:57, 17 December 2019 (UTC)[reply]

I'm in favor of some change, but I think of WT:ELE as normative/obligatory, not precatory ("should") or descriptive ("Quotations are supplemented"; "sentences, which are devised by Wiktionary editors").

There is room for a lot of judgment in the selection of both citations and usage examples. Citations often only address specific points required to respond to RfV or RfD. Perhaps such citations belong only in citation space. Usage examples may illustrate the range of grammatical contexts in which a word is used and often don't illustrate well the scope of meanings that fall within a definition. Perhaps we need some kind of style guide specifically on the selection and placement/display of both citations and usage examples. DCDuring (talk) 02:07, 18 December 2019 (UTC)[reply]

I am taking this as Support, and I will implement the change to WT:USEX. I do not have permission to edit WT:ELE. If anyone objects, please undo the change and we can have a formal vote. The wider-scale suggestions for change are beyond the scope of my proposal at the present time. Mihia (talk) 18:30, 2 January 2020 (UTC)[reply]

Note: change to WT:ELE has been made per Wiktionary:Beer_parlour/2020/January#Request_for_changes_to_WT:ELE. Mihia (talk) 11:31, 4 January 2020 (UTC)[reply]

Format of usage examples[edit]

WT:ELE says:

"Example sentences should: [...] be grammatically complete sentences, beginning with a capital letter and ending with a period, question mark, or exclamation point."

In fact, it is sometimes useful to allow example phrases, and there are plenty of instances of this throughout Wiktionary. Therefore I propose that this text is changed to:

"Example sentences should: [...] normally be grammatically complete sentences, but sometimes it may be desirable to give example phrases."

Does anyone object? Mihia (talk) 20:35, 17 December 2019 (UTC)[reply]

That seems totally reasonable--there could be some times when a shorter clause or phrase is more illustrative than an entire proper sentence. —Justin (koavf)❤T☮C☺M☯ 20:43, 17 December 2019 (UTC)[reply]

Why are we calling them “example sentences” then? The whole section header in WT:ELE is wrong. The systematic position of the section is also dubious; it should probably be above or below “Quotations”. And why isn’t there a corresponding header in the list of headings, since quotations can have a header? It is strange that every subsection under “4 Contents” in WT:EL represents a headline or similar but not “Example sentences”. Fay Freak (talk) 21:15, 17 December 2019 (UTC)[reply]

Then perhaps we should make a global terminology change from "Example sentences" to "Usage examples". I don't know how many pages this would affect. It would also require a page name change at Wiktionary:Example_sentences. Mihia (talk) 21:42, 17 December 2019 (UTC)[reply]

Quotations are also usage examples. If we're changing terminology let's make it something that reflects "invented by a Wiktionary editor" vs found in the wild. DTLHS (talk) 23:21, 17 December 2019 (UTC)[reply]

Well, yeah, that's true. Any suggestions? Mihia (talk)

There are also many usage examples which are in fact quotes, presumably because it's easier to throw in a usex than to format the quote (the process needs improving, we could leverage some existing tools). – Jberkel 00:01, 18 December 2019 (UTC)[reply]

Presenting a found quotation as a usex – apart from being easier on the editor – has the advantage that you can tweak it a bit, such as by removing irrelevant clauses (irrelevant for the purpose of exposition). --Lambiam 00:51, 18 December 2019 (UTC)[reply]

I don’t recognize such a dichotomical problem. “Quotes presented as usage examples”, though formatting mistakes are possible, I have not seen, except in one such case that I remember from my side: Phrasings for many possible senses one has crammed together from the web anyone could have uttered and are written by anonymous, and one does not link perhaps because they are on pages that supposedly vanish from the internet and are not linked because of this, because they are full of ads, and also because of the irrelevant clauses one leaves to the privacy of perishability instead of perpetuating them in this project (it goes without saying that it is taken into account that the words left are not enough for copyright). This, together with tweaking, has appeared to be a convenient method to create the vocabulary of the Russian impolite society with examples for use, which not only does not appear in Soviet printing conditions but neither in any published sources you would find except the dark corners of the net because “one should not say such things”, even less write, and like an honest person one tells the reader what one has come to know about the usage anyhow and laudably one is perhaps not sufficiently vulgar to make own examples for mat, then one uses idealized snippets. It’s of course different for English where people constantly fake unobserved phobias and genders, amongst other “nonsense one finds on Urban Dictionary”, and one can expect anything to appear in print because of the ultimate state of ~~degeneracy~~ progress the Anglosphere has reached. But Russian print – has anyone who talks about durability even tried to buy books from Russia? It fails because of banking restrictions, hence as the literature does not exist in the accessible part of the world one has become creative with displaying what one has.
As at least Russia is not behind in dumping every whim on the internet, which becomes difficult for some geographical and topical areas of the so-called well-attested languages, say Chad and Mauritania as opposed to Egypt and Algeria for Standard Arabic, or Senegal or Louisiana for French (and Germany’s Russian, which never-ever appears in print). Tells you the guy creating Slavic and Arabic entries from Germany. Fay Freak (talk) 16:05, 18 December 2019 (UTC)[reply]

For English, matters are relatively simple and dichotomous. We distinguish between usage examples and citations because of the attestation value of the latter. I don't think a normal person need be troubled by the difference. In any event, we don't have a distinct term readily available to refer to usage examples that do not meet our attestation standards. A usage example that is actually a word-for-word copy of something that appears in print is not a citation because it lacks the additional information we require to verify its authenticity. The expression "example sentence" does not fit what we refer to, offer, format, and templatize as "usage example", which can include phrases or other collocations. I personally prefer that we have a single full sentence for usage examples and for citations.

Incidentally, many of our citations are quite long (multiple sentences, complex sentences), apparently because the person adding the citation found them pretty or in accord with their POV. I think those visiting an entry should feel free to reduce them to single sentences, unless the additional material is essential to grasp the meaning. Usually a page link is a much better means of showing context than a long quote. DCDuring (talk) 22:49, 18 December 2019 (UTC)[reply]

Many of our page links go to Google Books, and these links sometimes become invalid, or return "you have reached your viewing limit etc.", so I think it doesn't hurt to add a bit more context. A single sentence is often not enough. – Jberkel 00:06, 19 December 2019 (UTC)[reply]

Then we should use sources like WikiSource, Gutenberg, UseNet, etc more. We are sacrificing usability. The vast bulk of the extra "context" is PoV or other chrome. DCDuring (talk) 00:48, 19 December 2019 (UTC)[reply]

Funny, Gutenberg are the ones who block access to German users. Fay Freak (talk) 21:00, 19 December 2019 (UTC)[reply]

+1 on prioritizing open, non-commercial sources. WikiSource is the obvious choice, or archive.org. Unfortunately less useful for citing the latest lexical fads. WT:QUOTE only mentions WikiSource and Google Books. It could be expanded a bit with a list of recommended sources. And the QuietQuentin gadget currently only supports Google Books. – Jberkel 22:27, 19 December 2019 (UTC)[reply]

As a case in point (taken from the next citation on a list of entries that have {{rfdate}}, not the result of a specific search for such citation) here is a citation for the following definition of live: "To spend, as one's life; to pass; to maintain; to continue in, constantly or habitually."

By 1980, South Korea had overtaken its northern neighbour, and was well on its way to being one of the Asian tigers – high-performing economies, with democratic movements ultimately winning power in the 1990s. The withdrawal of most Soviet aid in 1991, with the fall of the Soviet empire, pushed North Korea further down. Kim Il-sung had held a genuine place on North Korean people's affections. His son was regarded as a shadowy playboy, with rumours circulating over the years that he imported Russian and Chinese prostitutes, and lived a life of profligacy and excess.

How is anything more than the last sentence in any way useful for understanding the definition of live? DCDuring (talk) 00:59, 19 December 2019 (UTC)[reply]

Because the extent of one’s life is that whereby we measure everything else, and “living” a life of profligacy and excess is to be understood differently according to region and period, hence the special attention. Fay Freak (talk) 21:00, 19 December 2019 (UTC)[reply]

I can't tell whether you are joking:

If you are, ha ha.

If not, how does all of the cruft about South Korea, Soviet aid, and Kim Il-sung relate to that? If that's the best you can do, I'll be deleting the three excess sentences. DCDuring (talk) 21:15, 19 December 2019 (UTC)[reply]

@DCDuring: You say above that "I personally prefer that we have a single full sentence for usage examples and for citations". Ignoring for the moment the terminology issue that example phrases are not sentences, does this mean that you oppose the essential point of my original proposal, which is to acknowledge at WT:ELE that example phrases may sometimes be appropriate? Mihia (talk) 11:42, 4 January 2020 (UTC)[reply]

Yes, as an isolated proposal. I'm not sure whether I would oppose it or just fail to support it. Oftentimes the phrasal usage examples are themselves idioms, presumably already appearing on the page as derived terms. Maybe the proposal is acceptable for adjectives, which could have noun phrases as usage examples. I am more skeptical about whether other word classes are adequately exemplified with phrasal usage examples, though I could be convinced. This kind of thing would be good in a non-mandatory style guide for (English) definitions. DCDuring (talk) 17:44, 4 January 2020 (UTC)[reply]

Noun phrases that are typical collocations of adjectives but fall short of idioms are good examples of where example phrases may be appropriate. For example, the definition "(of physical features) Plump, round" at full has examples "full lips; a full face; a full figure". Perhaps we could mention this in the wording. For example:

"Example sentences should: [...] normally be grammatically complete sentences, though sometimes it may be desirable to give example phrases, for example when listing typical collocations."

Would you be happier with something like this? We could even give specific examples such as the ones at "full", though I am wary of giving this point more length and prominence in the list than it deserves. Mihia (talk) 21:08, 4 January 2020 (UTC)[reply]

The right place would be something like a style guide, which would explicitly not be mandatory (or forbidden). Perhaps we would discover specific things that could and should become mandatory. It is hard to come up with terse, understandable rules for many things. A style guide can be quite expansive and would be a good forum for discussing approaches to improving definitions etc. DCDuring (talk) 22:52, 5 January 2020 (UTC)[reply]

Wording of two Citations-related templates[edit]

Interested parties, please see Template talk:seeCites#Reword and comment there about proposed changes to the wording of two relatively high-use templates, Template:seeCites and Template:seemoreCites (two possible phrasings are being discussed for each template). Thanks. - dcljr (talk) 03:01, 18 December 2019 (UTC)[reply]

Please answer my question. What is the reason for the following blocks in Wikipedia? Is the same rule applicable to user names on Wiktionary?[edit]

https://en.wikipedia.org/w/index.php?title=Special:Log/block&page=User:MyBuddha

https://en.m.wikipedia.org/wiki/Wikipedia:RfC/User_names/Institutional_memory#Names_of_religious_figures

(Dentnoyes (talk) 20:08, 18 December 2019 (UTC))[reply]

No, there is no such rule here. DTLHS (talk) 20:13, 18 December 2019 (UTC)[reply]

Trolling. See w:en:WP:Long-term abuse/Nsmutte. —Justin (koavf)❤T☮C☺M☯ 22:49, 18 December 2019 (UTC)[reply]

Layout of definitions[edit]

I am initially listing this proposal here for comments. If necessary, it can go to a formal vote.

I propose that we insert the following text into the "Definitions" section at WT:EL:

"Definitions, including definitions that are single words or lists of single words, should begin with a capital letter and end with a full stop (or, in special cases, exclamation mark or question mark)."

As far as I can tell, there is presently no written policy about this. In my experience, the majority of English definitions are already formatted like this, but some are not, and in those cases the mixture of different styles looks messy or unprofessional.

Definitions of terms in other languages seem to more commonly use a no-capital-no-full-stop style. If there is some reason why definitions of non-English terms should be formatted differently from definitions of English terms, then the above text can be adjusted to apply to English only.

Capitalisation in templates used in definition lines is also inconsistent. Some, such as "alternative form of", output text with an initial capital, while others, such as "past participle of", do not. Ideally this presentation needs to be made consistent across all relevant templates. Mihia (talk) 21:52, 19 December 2019 (UTC)[reply]
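As an illustration only, the proposed wording can be read as a simple mechanical check. The following is a minimal sketch in Python, assuming the definition has already been reduced to its plain display text (labels, templates and link markup removed); the function name `follows_proposed_style` is hypothetical and is not an existing Wiktionary tool.

```python
def follows_proposed_style(definition: str) -> bool:
    """Check the proposed rule: initial capital letter and final ., ! or ?.

    Assumes `definition` is plain display text with labels and wiki markup
    already stripped (an assumption of this sketch).
    """
    text = definition.strip()
    if not text:
        return False
    # A definition that starts with a digit or symbol fails this simple test.
    return text[0].isupper() and text[-1] in ".!?"


if __name__ == "__main__":
    for sample in ["Coldness.", "coldness", "Plump, round"]:
        print(repr(sample), follows_proposed_style(sample))
```

A check of this kind could only flag candidates for review; whether non-English glosses should be exempt is exactly the open question in this thread.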

I agree with the change (for English entries). The definition section of WT:EL is very terse (for such a core part of the dictionary). – Jberkel 22:43, 19 December 2019 (UTC)[reply]

My understanding is that the terms in other languages properly have translations, not definitions. As such they wouldn't follow this rule. -Mike (talk) 22:51, 19 December 2019 (UTC)[reply]

Quoting from our Wiktionary:Style guide, section Definitions:

Most definitions on Wiktionary are either full definitions or glosses. Full definitions, which are preferred for English terms, explain the meaning a particular sense in detail. Glosses, which are preferred for non-English terms, simply point the user to one or more English translations of the term.

A full definition should start with a capital letter. Because a definition is not normally a complete sentence, opinions vary on whether it is necessary to end a full definition with a period. However, in the current editing practice most definitions end with a period.

A simple gloss should not be capitalized and should not end with a period.

Whatever we decide, we should make sure the rules are kept consistent across various pages. --Lambiam 23:24, 19 December 2019 (UTC)[reply]

@Lambiam: Thanks, I wasn't aware of this Style Guide section. If my proposal is adopted I will also change this text in accordance, or perhaps it would be better to just document it at the style guide and add a link to the entry layout page. Mihia (talk) 21:02, 20 December 2019 (UTC)[reply]

I would also point out that while the Style Guide presently says "A simple gloss should not be capitalized and should not end with a period", it later says "a gloss for an English term should be formatted as a definition", giving the example [[cat|Cat]], i.e. capitalised. I suppose the first statement could be intended to apply to non-English only, and a non-English example is indeed given, but the wording is unclear in this respect. Mihia (talk) 12:19, 22 December 2019 (UTC)[reply]

I'm reasonably certain that the majority of the JA sense lines are formatted in non-sentence format -- no initial capitalization, no final punctuation. All of the JA entries I've worked on are formatted this way. Likewise for Ainu, Maori, Hawaiian.

Note too that senses of non-EN terms might require fuller definitions, depending on how well the term's meaning aligns with the English vocabulary. These senses I have similarly formatted without initial capitalization, and without final punctuation -- in order to align with WT:ELE.

If WT:ELE is amended to require sentence-style formatting for non-EN term sense lines, we've got a lot of work ahead of us. ‑‑ Eiríkr Útlendi │^{Tala við mig} 00:57, 20 December 2019 (UTC)[reply]

I would like this to be policy (at least for English words defined in English). But, side note: from a "DRY" perspective (and the associated double risk of typos) I don't much like the inherent repetition in something like [[coldness|Coldness]], and have sometimes wished for a simple "capitalise first letter" notation, like [[^coldness]] or something. Equinox ◑ 07:14, 20 December 2019 (UTC)[reply]

We have template {{1}}, which however always must be subst’ed (why?). Using copy–paste, it is hardly more convenient to enter {{subst:1|a woman without a man is like a fish without a bicycle}} than [[a woman without a man is like a fish without a bicycle|A woman without a man is like a fish without a bicycle]]. --Lambiam 16:36, 21 December 2019 (UTC)[reply]
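The repetition in [[coldness|Coldness]] could in principle be generated rather than typed. Below is a minimal sketch in Python of such a "capitalise first letter" helper; it is an editor-side illustration under that assumption, the name `capitalised_link` is hypothetical, and it does not describe how {{1}} is actually implemented.

```python
def capitalised_link(term: str) -> str:
    """Build a wikilink to `term` whose display text starts with a capital."""
    if not term:
        raise ValueError("term must be non-empty")
    display = term[0].upper() + term[1:]
    # No piped link is needed when the target is already capitalised.
    if display == term:
        return f"[[{term}]]"
    return f"[[{term}|{display}]]"


if __name__ == "__main__":
    print(capitalised_link("coldness"))
    print(capitalised_link("a woman without a man is like a fish without a bicycle"))
```

Because the target is written only once, a typo cannot make the link target and the display text diverge, which is the "DRY" concern raised above.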

Oppose. See Wiktionary:Grease pit/2019/March § Cleanups of form-of templates. Gloss lines of whichever content should never be capitalized and never end with a dot in any language. The section in Wiktionary:Style guide should be deleted, nothing is preferred, one just writes to get across what one knows about a sense. Capitalizing definitions is a reactionary practice conceived from when Wiktionary depended in its habits on Wikipedia and even wasn’t case-sensitive in page-titles.

In any case the templates mentioned in the other discussion need to be fixed; according to the current practice behaving differently depending on whether the language code is English or foreign language. Extending the capitalization craze to foreign languages shan’t occur, for it would mean all foreign language glosses should be revisited, in addition to many other code cleanups that are taking years, and you are threatened with abandonment of this project by the often alone foreign language editors if this effete thing is added onto their task list. Instead, English language editors should repent and admit not capitalizing and not stopping English glosses, English sections making up only a tenth of this dictionary. New users are confused by the English practice, like OP who deduces something from “the majority of English definitions”, but the English part of this dictionary is not the bellwether of formatting, rather it is the most errant part of this dictionary for being begun first. Best formatting practices only have developed in the half of a decade that now runs out, with templates and observations that were not there before. Fay Freak (talk) 16:57, 21 December 2019 (UTC)[reply]

In view of the opposing voice, I have created a vote on this at Wiktionary:Votes/2020-01/Definitions and glosses of English terms should start with a capital and end with a full stop. This proposes a change to WT:STYLE, not WT:EL as I originally envisaged. The relevant section of WT:EL now has a "see also" link to the relevant section of WT:STYLE. Mihia (talk) 21:16, 5 January 2020 (UTC)[reply]

"gloss" terminology[edit]

I intend to propose a wording change to WT:STYLE in line with the #Layout of definitions proposal above, but I need to first ensure that I understand the terminology used there. I gather that an English translation of a non-English term is termed a "gloss", and that a single-word "definition" of an English word is also a "gloss"? Is that right? In the latter case, does it literally have to be just a single word? What about a two-word noun phrase? What about if there is an article? At what point does it stop being a "gloss"? Mihia (talk) 23:38, 21 December 2019 (UTC)[reply]

Did you try looking at gloss? Sometimes it is easy to forget that this is a dictionary and not just a forum. --ReloadtheMatrix (talk) 08:32, 22 December 2019 (UTC)[reply]

Please tell me which part of that entry answers my question. Mihia (talk) 11:50, 22 December 2019 (UTC)[reply]

I can't do that. I was mildly trolling again. Sorry about that. --ReloadtheMatrix (talk) 11:06, 29 December 2019 (UTC)[reply]

We use the term gloss in two different ways, which may be somewhat confusing. We have {{non-gloss definition}}s; the other definitions are then gloss definitions, which WT:STYLE refers to as just “glosses”. Then we have {{gloss}}es, which are not definitions but serve as an aid to understanding a definition by rephrasing it in different words, which is especially useful for disambiguating gloss definitions that use ambiguous terms. The latter use corresponds most closely to how scholars use the term. --Lambiam 12:40, 22 December 2019 (UTC)[reply]

To avoid ambiguity here, I use the term “straightforward definition” (SD) for a definition that is not a “non-gloss definition” (NGD). Focussing on the definitions of English lemmas, a hallmark of a good SD (next to properly defining the meaning of the definiendum) is that its definiens is substitutable for the definiendum. For example, take this sentence: “They found a cottage in the village and settled down.” A reader who does not know the word cottage may consult Wiktionary and find the SD “a small house”. “Aha,” said reader will then say, “so they found a small house in the village and settled down.” You may need to snip off some wattles from the definition, additions that take forms like “, such as ...” or “, particularly ...”. The definiens of an SD always assumes the same grammatical part of speech as the definiendum. For interjections it is rare to find a substitutable definiens; only synonyms will do the job. How then to define Whatever! ? This is where NGDs kick in, in the case of Whatever! in the form of “A holophrastic expression used discourteously to indicate that the speaker does not consider the matter worthy of further discussion.” (The formatting at our entry is not according to WT:STYLE.) This is obviously not substitutable, being a noun phrase rather than an interjection. The length of the definiens has little to do with it. NGDs tend to be lengthy, but SDs can be lengthy too. --Lambiam 13:22, 22 December 2019 (UTC)[reply]

NGDs are essential for many core grammatical function words and for expressions that have a social/discourse regulating role. They are never (well, hardly ever) substitutable, but are "real" definitions, though atypical.

"SDs" are of two principal kinds:

those that rely on synonyms, either a single one or a list of them, inherently substitutable when correct. Our FL entries rely heavily on single-word/-term definitions. I have often used gloss to refer to these and only these, which seemed to be general practice.
multi-word phrasal definitions that have hypernyms and differentia and should be substitutable. I view these as typical "real" definitions.

One big problem with the synonym-based definitions is that they often use polysemic terms. Sometimes this is OK, but often not. {{rfclarify}} is underused to seek resolution of this. I find the problem most serious in FL sections. It would be useful to know whether all the English definitions of the one-word gloss were possible meanings of the FL word being defined. Rarely does one find a helpful qualification such as a list of applicable senses. DCDuring (talk) 16:12, 22 December 2019 (UTC)[reply]

@Mihia: Any articles need to be stripped from the definition for substitution. I usually remove "the" from SDs, except for those of proper nouns. A is sometimes useful as a marker of the countability of a definition. DCDuring (talk) 16:23, 22 December 2019 (UTC)[reply]

I do not agree that articles should generally be removed. Often an article is required to make a definition read like normal English. For example, look at car, just as a random example. All the definitions of the noun begin with either a definite or indefinite article, and many would read oddly without these. However, the issue of the presence/absence of articles is not a specific part of my proposal, only background so that I can understand correctly how we use the term "gloss". Mihia (talk) 20:22, 22 December 2019 (UTC)[reply]

If articles are in the definitions, they have to be stripped for substitutability to work. Initial the can be misleading in definitions. A is commonly used by lexicographers, occasionally any. DCDuring (talk) 22:46, 22 December 2019 (UTC)[reply]

I understand from your reply that you believe that the entry for car would be improved if all leading articles were removed. I entirely disagree. Mihia (talk) 23:21, 22 December 2019 (UTC)[reply]

I never gave a hint that removing initial a was a good idea. OTOH, I was too strong in my hostility to initial the. I've often noticed it misapplied, but definitions that refer to something definite and unique in a multipart system or organism usually have to begin with a definite article. But nevertheless initial articles or other determiners almost always have to be stripped when the validity of the wording of the definition is to be tested using the substitution test. DCDuring (talk) 02:54, 23 December 2019 (UTC)[reply]

OK, when you said "If articles are in the definitions, they have to be stripped for substitutability to work", I thought you meant "... and therefore we should strip them where we find them". Mihia (talk) 11:47, 4 January 2020 (UTC)[reply]
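The substitution test discussed in this thread, including the step of stripping a leading article from the definiens, can be sketched in a few lines. This is a rough Python illustration using the cottage example given above; the helper names and the list of determiners (a, an, the, any) are assumptions of the sketch, not an agreed procedure.

```python
import re

# Leading determiners stripped before substitution (assumed list).
_LEADING_DETERMINER = re.compile(r"^(?:a|an|the|any)\s+", re.IGNORECASE)


def strip_determiner(definiens: str) -> str:
    """Drop a final full stop and one leading determiner from a definition."""
    text = definiens.strip().rstrip(".")
    return _LEADING_DETERMINER.sub("", text)


def substitute(sentence: str, definiendum: str, definiens: str) -> str:
    """Replace whole-word occurrences of the definiendum with the definiens."""
    replacement = strip_determiner(definiens)
    pattern = re.compile(rf"\b{re.escape(definiendum)}\b")
    # A function is used as the replacement so backslashes are taken literally.
    return pattern.sub(lambda match: replacement, sentence)


if __name__ == "__main__":
    print(substitute(
        "They found a cottage in the village and settled down.",
        "cottage",
        "A small house.",
    ))
```

The article stays in the entry's definition; it is removed only for the purpose of the test, which matches the clarification reached at the end of the exchange above.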

inflection of: separate templates / headings when the pronunciation is different?[edit]

First, some background. Finnish verbs have many different forms, but I'll focus on three (sets) which are important to this particular tale:

third-person singular present (indicative)
third-person singular past (indicative)
present (active) indicative connegative, second-person singular imperative, second-person singular imperative connegative (these three are always identical regardless of the verb).

For most verbs, the first, second and third are not identical, but for some verbs they have identical spellings, such as osioida > osioi. However, they don't have identical pronunciation: 1 and 2 have IPA: /ˈosioi̯/, but 3 has IPA: /ˈosioi̯(ʔ)/. Still, currently, all three are combined into a single {{inflection of}}.

The question is: should they be? Since they technically have different pronunciation, I feel 3 should be separate from 1/2, perhaps even under separate numbered etymologies to allow for different pronunciation. Or does the answer depend on whether the pronunciation is actually given on the entry (without the pronunciation(s), they should all be unified, and not if they are given and different)? — sur jec tion ⟨?⟩ 20:56, 23 December 2019 (UTC)[reply]

@Surjection Following what's done e.g. for Russian, I'd put them in the same etymology section and list two pronunciations, tagged appropriately with {{a}} or whatever to indicate which inflections go with which pronunciation. It would be a lot easier if there were diacritics in the Finnish text indicating the presence of the (optional) glottal stop; then you could do exactly what's done for Russian with stress variants that signal different inflectional categories, e.g. воды́ (vodý) (gen. sg.) vs. во́ды (vódy) (nom./acc. pl.), which is to list them under different headers and use annotations corresponding to the accented spelling to distinguish the pronunciations. Benwing2 (talk) 16:30, 26 December 2019 (UTC)[reply]

In the entry you linked, they're in the same etymology sections, but separate L3's; this might just be because of the different headword, but would you think this is also a good approach for Finnish (even though the spelling is completely identical)? — sur jec tion ⟨?⟩ 16:44, 26 December 2019 (UTC)[reply]

@Surjection Since I assume adding a diacritic indicating the glottal stop is out of the question, I would format it with a single L3 header, something like this:

==Finnish==

===Pronunciation===
* {{fi-IPA|...}} {{a|third singular present/past indicative}}
* {{fi-IPA|...}} {{a|present indicative connegative, second singular imperative (connegative)}}

===Verb===
{{head|fi|verb form}}

# {{inflection of|fi|osioida||pres|actv|indc|conn|;|3|s|pres//past|indc|;|2|s|impr|;|2|s|impr|conn}}

Benwing2 (talk) 02:05, 27 December 2019 (UTC)[reply]

I see, it's probably possible to implement something like that in {{fi-pronunciation}} too, and maybe one day ACCEL could add pronunciation info. — sur jec tion ⟨?⟩ 15:48, 28 December 2019 (UTC)[reply]
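To make the grouping in the layout above concrete, here is a small Python sketch that collects the forms of osioi by pronunciation, one group per pronunciation line. The data are the ones quoted in this thread; the structure and the function name `group_by_pronunciation` are hypothetical and say nothing about how {{fi-pronunciation}} or ACCEL actually work.

```python
from collections import defaultdict

# (form label, IPA) pairs for "osioi", as described in the thread.
FORMS = [
    ("third-person singular present indicative", "/ˈosioi̯/"),
    ("third-person singular past indicative", "/ˈosioi̯/"),
    ("present indicative connegative", "/ˈosioi̯(ʔ)/"),
    ("second-person singular imperative", "/ˈosioi̯(ʔ)/"),
    ("second-person singular imperative connegative", "/ˈosioi̯(ʔ)/"),
]


def group_by_pronunciation(forms):
    """Map each distinct pronunciation to the list of forms that share it."""
    groups = defaultdict(list)
    for label, ipa in forms:
        groups[ipa].append(label)
    return dict(groups)


if __name__ == "__main__":
    for ipa, labels in group_by_pronunciation(FORMS).items():
        print(ipa, "->", "; ".join(labels))
```

Each resulting group corresponds to one pronunciation line with its own {{a|...}} qualifier, while the single {{inflection of}} line stays unchanged.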

Proposal: make Frankish an etymology-only variant of Proto-West Germanic[edit]

In theory, at the time of the migrations, Frankish would have been a West Germanic dialect group, and not a distinct language. From a practical point of view, Frankish and Proto-West Germanic are almost indistinguishable, and most of our current Frankish lemmas could be Proto-West Germanic lemmas without any change in form. So I propose to merge them and remove Frankish as a distinct language, instead treating it as a Proto-West Germanic dialect, and making it an etymology-only language with Proto-West Germanic as its parent. —Rua (mew) 12:27, 24 December 2019 (UTC)[reply]

I support the suggestion. —Mahāgaja · talk 15:08, 24 December 2019 (UTC)[reply]
Makes sense to me. Leasnam (talk) 03:51, 25 December 2019 (UTC)[reply]
Support. It finally makes sense to me what “Frankish” is. Why etymology-only though instead of replacing all references to Frankish with “Proto-West-Germanic”? Fay Freak (talk) 03:58, 25 December 2019 (UTC)[reply]
- For the same reason we also have Late Latin, Proto-Finno-Ugric, Proto-Baltic etc. It allows us to be more specific, and also means we don't have to fix tons of etymologies. —Rua (mew) 11:22, 27 December 2019 (UTC)[reply]
Support. Julia ☺ ☆ 02:32, 26 December 2019 (UTC)[reply]
I'm not opposed to it, but it really should come to a vote. --{{victar|talk}} 03:06, 26 December 2019 (UTC)[reply]
Support. Benwing2 (talk) 02:12, 27 December 2019 (UTC)[reply]

@Rua, are you deleting Frankish entries, despite this never going to a vote? --{{victar|talk}} 22:17, 15 January 2020 (UTC)[reply]

Yep, because a vote isn't needed. Only you called for one. —Rua (mew) 13:51, 16 January 2020 (UTC)[reply]
@Rua: I find that wildly inappropriate and have created a vote on the issue. I ask that you cease and desist until the vote is complete.

Wiktionary:Votes/2020-01/Make Frankish an etymology-only variant of Proto-West Germanic --{{victar|talk}} 22:04, 16 January 2020 (UTC)[reply]
What a bureaucratic waste of time when everyone already voted in favour of it. —Rua (mew) 22:05, 16 January 2020 (UTC)[reply]
This is an issue that affects the work of many users, several of whom have not "voted" here. --{{victar|talk}} 22:09, 16 January 2020 (UTC)[reply]

@Rua I wanted to get some clarification per our discussion here; what time period are we reconstructing PWG to? 1st century? 4th century? So if 5th century Merovingian Frankish is dialectal PWG, then what is 8th century Carolingian Frankish? Late Frankish? Old Dutch? Proto-Dutch? @Mahagaja, Leasnam --{{victar|talk}} 23:45, 2 April 2020 (UTC)[reply]

@Victar: I suspect that the way it is reconstructed makes “Frankish” Proto-West-Germanic. In so far as it is attested, “Frankish” will be recognized as Dutch or German already. So there is a confusion because many etymology writers use “Frankish” and by their idealization effectively denote a language that we call “Proto-West-Germanic”, a term used by much fewer than “Proto-Germanic”. Fay Freak (talk) 00:58, 3 April 2020 (UTC)[reply]

Forgive my confusion and my being very late to this discussion, but I just wanted some clarification: our definition of PWG is the same as described by Wikipedia? Well, I guess according to them, it doesn't really exist, but still. I'm assuming that we're not saying PWG is necessarily Frankish, but that Frankish can be considered a group of dialects of PWG. I think? DJ K-Çel (talk) 21:20, 3 April 2020 (UTC)[reply]

Order of cases is not consistent across Latin templates[edit]

I know there's been some disagreement over what order cases should go in, but the fact that {{la-ndecl}} and {{la-adecl}} use different orders (edit: as on Gallus) is just plain confusing; surely there should be consistency at least within a language. - -sche (discuss) 18:54, 25 December 2019 (UTC)[reply]

I agree. I vote for alphabetic order: Ablative, Accusative, Dative, Genitive, Nominative, Vocative. —Mahāgaja · talk 20:43, 25 December 2019 (UTC)[reply]

Curiously, the templates themselves (on the Template: pages) display the cases in the same order, but see e.g. Gallus, where the proper noun uses nom-voc-acc-gen-dat-abl order while the noun uses the adjective declension template, which includes display text saying it's an adjective, and goes nom-gen-dat-acc-abl-voc. - -sche (discuss) 21:07, 25 December 2019 (UTC)[reply]

Agreed as well and if there's no other compelling order, then go with alphabetical. —Justin (koavf)❤T☮C☺M☯ 21:21, 25 December 2019 (UTC)[reply]
- @Koavf: I was joking about alphabetical order. Absolutely no one wants our Latin declension tables to have anything other than Nominative first. —Mahāgaja · talk 22:14, 25 December 2019 (UTC)[reply]
  Indeed, there is a lesson here concerning voting for things one knows nothing about... —Μετάknowledge^{discuss/deeds} 07:37, 27 December 2019 (UTC)[reply]
Nom-voc-acc has my preference no matter what IE language. —Rua (mew) 22:17, 25 December 2019 (UTC)[reply]

@Benwing2, whenever you have time, can you look into why Module:la-nominal(?) presents the cases in different orders for different parts of speech in e.g. Gallus? Is this intentional?
On another note, I think the reason Gallus uses an adjective template for the noun meaning "Gaul" is because the module is a bit too clever(?) and suppresses the (attested) plural if the noun declension table is used, whereas it displays it on gallus, where the sense is duplicated (which is a separate, third problem I may try to clean up by making one of the entries use {{alternative case form of}}). - -sche (discuss) 03:20, 26 December 2019 (UTC)[reply]

The traditional treatment of cases, including their ordering, derives from their treatment by early Hellenistic grammarian Dionysius Thrax, who enumerated them in the order Nom – Gen – Dat – Acc – Voc.[9] The ancient Roman scholar Marcus Terentius Varro preserved this order in presenting Latin grammar in his encyclopedia, adding the ablative as a sixth case.[10] He also introduced the term casus, whence our case, as a semantic loan of the term πτῶσις (ptôsis) used by Dionysius. German Fall is likewise a semantic loan. Later Latin grammarians put the vocative again in the final position, so that the Latin ablative became the fifth case. The order was enshrined in German school education, where the corresponding German(ic) cases were not identified as Nominativ, Genitiv, Dativ and so on, but as 1. Fall, 2. Fall, 3. Fall, ... While I have no strong feelings about the best order – except that the nominative, being the lemma form, should be first – I think a case can be made for keeping to this traditional order. --Lambiam 12:45, 26 December 2019 (UTC)[reply]

@-sche I'm very confused, I see the same order nom-gen-dat-acc-abl-voc on every table under Gallus. I'm sure I would have noticed if the table order was different across different templates as I worked on them quite a lot. The order nom-gen-dat-acc-abl-voc is, as User:Lambiam notes, the traditional order, and the one also found in my intro Latin textbook ("Latin: An Intensive Course" by Moreland and Fleischer). This is not necessarily the best; a case could certainly be made for an order more like nom-voc-acc-gen-dat-abl, which keeps many syncretisms together (nom-voc for most nouns, nom-voc-acc for neuter nouns, gen-dat for feminine 1st decl. nouns, dat-abl for many singular and all plural nouns). Benwing2 (talk) 16:16, 26 December 2019 (UTC)[reply]

BTW the reason Gallus uses an adjective template is so that the feminine Galla can be displayed; Gallus is more or less a nominalization of the adjective gallus. This is definitely a hack and something I should probably fix. Benwing2 (talk) 16:18, 26 December 2019 (UTC)[reply]

Hmm, fascinating! This is what I see in Firefox 65.0 and Chrome 79.0.3945.88 (Windows 10) when I'm logged in. When I close both browsers and restart them, and go back to that page (while logged out), both tables have the "correct" order (or, the order that the lower table has in that screenshot), but in both browsers, the top table goes back to "nom-voc-acc-gen-dat-abl" order when I log back in. Quite odd. What order do the rest of you here see? - -sche (discuss) 22:17, 26 December 2019 (UTC)[reply]

@-sche: Heh! The reason is that you've got User:Erutuon/scripts/changeCaseOrder.js installed in your common.js – it reorders cases in noun tables but not adjective tables. — Eru·tuon 22:24, 26 December 2019 (UTC)[reply]

Aha! Thanks for figuring that out. I don't suppose that script could be tweaked to also reorder adjective tables? - -sche (discuss) 22:35, 26 December 2019 (UTC)[reply]

@-sche: Done, though it doesn't merge cells as would be nice. — Eru·tuon 23:00, 16 January 2020 (UTC)[reply]

Excellent! Thank you! - -sche (discuss) 00:00, 17 January 2020 (UTC)[reply]

"Later Latin grammarians put the vocative again in the final position" - who?

"The order nom-gen-dat-acc-abl-voc is, as User:Lambiam notes, the traditional order" - no, ...-voc-abl is. Also L&S s.v. sextus: "In gram.: sextus casus, the ablative case, Quint. 1, 4, 26."

--Sasha Gray Wolf (talk) 22:28, 26 December 2019 (UTC)[reply]

According to Latin declension#Grammatical cases, the order with vocative–ablative was the older order, but ablative–vocative was used in Allen and Greenough and in Wheelock's. I seem to remember accusative was between dative and ablative so I guess my books used the latter order. — Eru·tuon 22:46, 26 December 2019 (UTC)[reply]

Moving Arabic roots out from under "Etymology"[edit]

I was just testing out a change to {{ar-root}} that would also let it categorize contemporary Arabic varieties, not just MSA/CA, when I figured it might be a good idea to hold off for now. I think it'd be better first to see if it's possible to separate roots from etymology... as roots are theoretical and derived from words, not the other way around, and saying that a given word simply comes "from the root CCC" kind of scans like a cop-out that avoids giving its actual etymology. Obviously, the 'real' etymology of many native Arabic verbs is rather obscure, but all that means for any such given verb is that we should give it no etymology section at all rather than putting it down as coming "from" its root.

For verbs outside of Form I that are from a noun (consider Form II verbs like لَوَّنَ (lawwana, “to color”)), this derivation should be mentioned in the Etymology section. For all other verbs of higher forms, the etymology section should simply state "Form [whatever] measure" or "Form [whatever] transfix" or something of the sort, without including "...of the root CCC". As for showing the root itself, I see no reason not to copy off of what we do for Hebrew: just move it off to the side in its own little display, so that it's still clearly visible, just not tied to etymology.

First step would be to modify {{ar-root}} with whatever alternative display style it should have (i.e. right-aligned, in a table, whatever). After that, updating entries seems like it could at least partially be a bot job, since pretty much all of these etymologies reuse the same few wordings:

Extract the template invocation from the sentence "Morphologically from the root {{ar-root|foo|bar|baz}}.", delete the sentence from the Etymology section, and restore the template above the Etymology header.
Do the same for the sentence "[Ff]rom the root {{ar-root|foo|bar|baz}}."
If the Etymology section is now empty, delete the Etymology header.
If it's not empty, put it in the category "Arabic entries with etymologies" for human review and revision.

The "human review and revision" part is daunting, but taking care of the bottable parts should cut down on that load significantly. I'm willing to put time into attending to what remains, of course. —M. I. Wright (talk, contribs) 00:49, 26 December 2019 (UTC)[reply]
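The four bot steps listed above could be sketched roughly as follows. This is only an illustration in Python, not an existing bot: the function name, the regular expressions, and the choice to append the review category as a plain category link are all assumptions, and the sketch handles only a section with a single Etymology header and the two stock sentences quoted above.

```python
import re

# The two stock sentences quoted above, capturing the {{ar-root}} call.
ROOT_SENTENCE = re.compile(
    r"(?:Morphologically from|[Ff]rom) the root (\{\{ar-root\|[^{}]*\}\})\.[ \t]*\n?"
)
ETYM_HEADER = re.compile(r"^(=+)Etymology(?: \d+)?\1[ \t]*\n", re.MULTILINE)
ANY_HEADER = re.compile(r"^=+[^=\n]+=+[ \t]*$", re.MULTILINE)
# Assumption: the review category is added as an ordinary category link.
REVIEW_CATEGORY = "[[Category:Arabic entries with etymologies]]"


def move_root_out_of_etymology(section: str) -> str:
    """Apply the four steps to one Arabic section (hypothetical helper)."""
    header = ETYM_HEADER.search(section)
    if header is None:
        return section
    # The etymology body runs from its header to the next header, or to the end.
    next_header = ANY_HEADER.search(section, header.end())
    body_end = next_header.start() if next_header else len(section)
    body = section[header.end():body_end]

    found = ROOT_SENTENCE.search(body)
    if found is None:
        return section
    root_call = found.group(1)
    # Steps 1-2: delete the sentence, keeping the template call for later.
    body = (body[:found.start()] + body[found.end():]).strip()

    before, after = section[:header.start()], section[body_end:]
    if not body:
        # Step 3: the section is now empty, so the header goes too.
        return f"{before}{root_call}\n{after}"
    # Step 4: something remains, so keep it and flag it for human review.
    return f"{before}{root_call}\n{header.group(0)}{body}\n{REVIEW_CATEGORY}\n\n{after}"
```

Anything that does not match one of the two wordings is returned unchanged, so unusual etymologies would stay untouched for manual handling.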

Note. Many Hebrew entries also have an etymology section “From the root א־ב־ג ”; see e.g. בלש. --Lambiam 11:04, 26 December 2019 (UTC)[reply]

@M. I. Wright I agree that this whole business of listing roots as etymologies of Arabic words is rather kludgy and not linguistically sound. I think the problem is that there is no proper treatment of Arabic etymology (or Semitic etymology in general?) along the lines of e.g. Vasmer for Russian, so it's hard in general to know whether a given verb was derived from the noun, or vice versa, or both derived from some earlier Proto-Semitic (or whatever) form, and which words are early borrowings/calques/etc. (late borrowings are usually more obvious). Note that we do derive many Indo-European words back to PIE roots, which would be comparable to what's being done for Arabic except that it would use Proto-Semitic three-consonant roots instead of Arabic three-consonant roots. I am willing to help with the bot jobs. We'd need to spell out carefully exactly how we will proceed and make sure there's general agreement among the various Arabic editors. Benwing2 (talk) 16:26, 26 December 2019 (UTC)[reply]

Nahuatl (`nah`): convert etymology-only or delete?[edit]

At WT:LT#Nahuatl it says that both the macrolanguage and its subdivisions are treated as languages, but Nahuatl doesn't have any lemmas; it seems that most former entries have been converted to Classical Nahuatl (nci). It is the only macrolanguage with no entries besides Sahaptin, which has only two lemmas in its subvarieties, compared to thousands of lemmas in the Nahuatl subvarieties. However, there are many terms derived from Nahuatl and several translation requests. Is there any benefit to keeping Nahuatl as a full-fledged language? Note that there was a discussion in 2008 that decided to keep Nahuatl, but this was when it had entries unsorted into varieties. Julia ☺ ☆ 02:23, 26 December 2019 (UTC)[reply]

The intention, following this 2018 discussion (which links to an earlier discussion), has been to remove it from being a full language as soon as existing uses in translations tables and the like are cleaned up so that making the code into either an etymology-only code or a family code does not break a lot of entries (as it would at the moment). It's a matter of having the time and expertise to clean up such uses. One idea to expedite the process would be to convert e.g. the translations to t-checks using the code "und", i.e. * Nahuatl: {{t-check|und|cuīcayōtl}}. Then the code could be retired and there would not be the risk of people adding new entries, translations, etc with the generic code. - -sche (discuss) 03:00, 26 December 2019 (UTC)[reply]

@Lvovmauro, -sche: What do we do though with the terms in New Classical Nahuatl like Teutontlālpan? “Simulation of Classical Nahuatl” or not, do we want to ignore it completely? It would be unfair as compared to Latin though. Occurring in printed texts the terms are “attestation according to our standards”, unlike @Chuck Entz says, but the question is as what – whether for something not treated at all?

This needs to be solved because it is hard to retire codes if there is something left over; I mean when I found “Nahuatl: Teutontlālpan” in the translation-table of Germany I had no good feeling about what to do because the term does exist and also it didn’t exist. Classify all with the label “neologism”? I would be for giving the language a name (regardless of whether the name is attested: we seem to be first with the terms New-Classical Nahuatl and Neo-Classical Nahuatl here) to solve our little problem and see how the thing develops in the future, for I am not opposed to this literary language any more than to modern Latin poems about computers – you won’t believe what one can attest in Latin, and why not, but I digress –, it’s not a conlang. (A similar question arises with Classical Syriac, which has compositions if not today then in the nineteenth century, but perhaps the lack of vowels in the script obscures whether a text is supposed to be the old language or a modern dialect, and there is a continuum with Assyrian Neo-Aramaic of which Neo-Syriac is allegedly a synonym. We removed the language “Syriac” two years ago because people only misused it for terms from the original, Classical Syriac.) Fay Freak (talk) 01:23, 6 January 2020 (UTC)[reply]

I don't see any impediment to removing the code. If we view such modern use of Classical Nahuatl (despite its in-this-case-unfortunate name) as being includable like Latin, and the coinages meet attestation requirements, then they can be included with {{qualifier}}s and {{label}}s like Tela Totius Terrae. Whereas, if we consider such coinages an unincludable conlang like we do modern Gothic neologisms, the words can be excluded entirely. Marrovi's comment in the RFV suggests "Neo-Classical Nahuatl" may be more like Latin. - -sche (discuss) 02:01, 6 January 2020 (UTC)[reply]

Using the built-in site search function it looks like 592 translations into nah remain (down from about a thousand in the past). I hope/intend to convert those translations to * Nahuatl: {{t-check|und|foobar}} and etymology-only-ify the language code nah sometime soon, as proposed above. - -sche (discuss) 00:21, 17 January 2020 (UTC)[reply]
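For illustration, the proposed conversion of generic-Nahuatl translations could look something like the sketch below. It is a guess at the shape of such a run, not the script actually used: the function name is hypothetical, and it assumes the translations are written with {{t}} or {{t+}} and that both should simply become {{t-check}} with the code "und".

```python
import re

# Assumption: generic-Nahuatl translations look like "* Nahuatl: {{t|nah|...}}"
# or "{{t+|nah|...}}". Codes such as "nci" are not matched, because the
# pattern requires the exact parameter "|nah|".
NAH_TRANSLATION = re.compile(r"\{\{t\+?\|nah\|")


def retag_nahuatl_translations(wikitext: str) -> str:
    """Rewrite {{t|nah|...}} and {{t+|nah|...}} as {{t-check|und|...}}."""
    return NAH_TRANSLATION.sub("{{t-check|und|", wikitext)
```

The "* Nahuatl:" label at the start of the line is left alone, so a reader still sees which language the unchecked translation was claimed to be in.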

Macau Pidgin Portuguese[edit]

I'd like to request the addition of Macau Pidgin Portuguese, which has not yet been assigned an ISO code. It is distinct from Macanese creole (mzs). Some sources:

Li, Michelle (2016 November) “Trade pidgins in China: Historical and grammatical relationships”, in Transactions of the Philological Society, →DOI, pages 298–314
Li, Michelle, Matthews, Stephen (2016 January) “An outline of Macau Pidgin Portuguese”, in Journal of Pidgin and Creole Languages, volume 31, number 1, →DOI, pages 141–183
Matthews, Stephen, Li, Michelle (2012) “Portuguese pidgin and Chinese Pidgin English in the Canton trade”, in Hugo C. Cardoso, Alan N. Baxter, Mário Pinharanda Nunes, editors, Ibero-Asian Creoles: Comparative Perspectives, Amsterdam/Philadelphia: John Benjamins Publishing Company, pages 263–287

--Lvovmauro (talk) 10:49, 26 December 2019 (UTC)[reply]

Pinging @Ungoliant MMDCCLXIV, who speaks Portuguese and IIRC has helped with some creoles/pidgins in the past, in case he has insight (or can read how Portuguese-language sources discuss the intelligibility or non-intelligibility of MPP vs mzs). - -sche (discuss) 22:24, 26 December 2019 (UTC)[reply]

@Lvovmauro In the absence of a response from anyone else, and because the resources I looked at, including the ones you provided above, do say this is distinct from mzs, I have added a code crp-mpp for Macau Pidgin Portuguese. If any of the data (e.g. script) needs to be changed / added-to, let me know. - -sche (discuss) 00:09, 17 January 2020 (UTC)[reply]

Hani should be added as a script, since the main source uses Chinese characters. --Lvovmauro (talk) 10:00, 19 January 2020 (UTC)[reply]

Done. - -sche (discuss) 13:36, 19 January 2020 (UTC)[reply]

Thinking of doing a bot run to reformat places using Template:place[edit]

I am thinking of doing a bot run to replace things like the following:

# {{surname|en}}
# A city in [[British Columbia]], [[Canada]]
# A village in [[Nova Scotia]], [[Canada]]
# A city in [[Arkansas]], and one of the two [[county seat]]s of {{l|en|Sebastian County}}.
# A town in [[Delaware]]
# A town in [[Florida]]
# A village in [[Illinois]]
# A city in [[Indiana]]
# A town in [[Louisiana]]
# A town in [[Maine]]
# A city in [[Minnesota]]
# A city in [[Mississippi]], and the county seat of {{l|en|Leflore County}}.
# A city in [[Missouri]]
# A village in [[Nebraska]]
# A town in [[New York]]
# A city in [[South Carolina]], and the county seat of {{l|en|Greenwood County}}.
# A city and one of two towns in [[Wisconsin]]

with:

# {{surname|en}}
# {{place|en|city|p/British Columbia|c/Canada}}
# {{place|en|village|p/Nova Scotia|c/Canada}}
# A city in [[Arkansas]], and one of the two [[county seat]]s of {{l|en|Sebastian County}}.
# {{place|en|town|s/Delaware}}
# {{place|en|town|s/Florida}}
# {{place|en|village|s/Illinois}}
# {{place|en|city|s/Indiana}}
# {{place|en|town|s/Louisiana}}
# {{place|en|town|s/Maine}}
# {{place|en|city|s/Minnesota}}
# {{place|en|city|s/Mississippi|;|county seat|co/Leflore County}}
# {{place|en|city|s/Missouri}}
# {{place|en|village|s/Nebraska}}
# {{place|en|town|s/New York}}
# {{place|en|city|s/South Carolina|;|county seat|co/Greenwood County}}
# A city and one of two towns in [[Wisconsin]]

There are two reasons to do this: (1) consistent formatting; (2) [IMO more important] proper categorization. We have categories like Category:en:Towns in Maine, USA and Category:en:Villages in Nova Scotia but they are somewhat sparsely and inconsistently populated. Doing a bot run like this will make the categories more complete. (Yes, the two categories I just named are inconsistent in their naming convention but that's another issue.) Benwing2 (talk) 03:34, 27 December 2019 (UTC)[reply]
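As a rough idea of how the simple cases in the before/after lists might be matched, here is a small Python sketch. It is not the bot's actual code: the lookup sets are deliberately incomplete, the function name and regular expression are made up for illustration, and anything it does not recognise (such as the Arkansas and Wisconsin lines) is left exactly as it was.

```python
import re

# Hypothetical, incomplete lookup sets; a real run would need every state and province.
US_STATES = {"Delaware", "Florida", "Illinois", "Indiana", "Maine", "Mississippi", "New York"}
CA_PROVINCES = {"British Columbia", "Nova Scotia"}

SIMPLE_DEFINITION = re.compile(
    r"^# An? (city|town|village) in \[\[([^\[\]|]+)\]\]"
    r"(?:, \[\[(Canada)\]\])?"
    r"(?:, and the county seat of \{\{l\|en\|([^{}|]+)\}\})?\.?$"
)


def to_place(line: str) -> str:
    """Convert one definition line to {{place}} if it fits a known simple pattern."""
    m = SIMPLE_DEFINITION.match(line)
    if m is None:
        return line  # unrecognised wording: leave for a human
    placetype, region, country, county = m.groups()
    if country and region in CA_PROVINCES:
        args = [placetype, f"p/{region}", "c/Canada"]
    elif not country and region in US_STATES:
        args = [placetype, f"s/{region}"]
    else:
        return line  # unknown region: do not guess the holonym type
    if county:
        args += [";", "county seat", f"co/{county}"]
    return "# {{place|en|" + "|".join(args) + "}}"
```

Being conservative here seems safer than clever: a line that is skipped can be converted later, whereas a wrong holonym would put the entry in the wrong category.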

Support. I'd be interested in seeing how wide a range of wording you could reliably re{{place}}. —Μετάknowledge^{discuss/deeds} 07:39, 27 December 2019 (UTC)[reply]

Strong Support. Although I still don’t understand what the difference between a town and a city is (not a distinction in at least most of continental Europe) – any tips? Fay Freak (talk) 15:51, 27 December 2019 (UTC)[reply]

It is a legal, not lexical distinction, sometimes (for example w:City status in the United Kingdom). DTLHS (talk) 16:00, 27 December 2019 (UTC)[reply]

I thought so, but why or how map that or any distinction then onto other places? Isn’t there a neutral term? Most languages/countries I know have a two-fold distinction. German Dorf against Stadt, Polish wieś against miasto, Russian дере́вня (derévnja) or село́ (seló) (originally a distinction between a settlement without and with a village church, but now arbitrarily used) against го́род (górod). The categories town and city are used arbitrarily and thus inconsistently by foreign language editors. The factual, not legal, distinction is between rural and urban life. Fay Freak (talk) 16:09, 27 December 2019 (UTC)[reply]

I Support these changes, and if they affect countries as well, I'd like to raise an issue with categorization. Currently, the place template categorizes into xx:Countries in [continent]. I think it should also categorize into xx:Countries because I find it helpful to see a complete list of countries, and because continent divisions can be subjective, as a recent Tea Room post shows. Ultimateria (talk) 15:51, 28 December 2019 (UTC)[reply]

Also, capital city doesn't seem to categorize into Category:Cities in [country]. Ultimateria (talk) 17:54, 29 December 2019 (UTC)[reply]

Another observation: When the FL place is identical to English, and t1=<term> is used, clicking on the place link in read mode will open the entry in edit mode instead of jumping to the English section. Panda10 (talk) 19:00, 29 December 2019 (UTC)[reply]

@Benwing2 Can you address these concerns since you're working with the module? At least the city issue; I don't know if anyone would disagree with the country categorization. Ultimateria (talk) 23:20, 10 January 2020 (UTC)[reply]

I think the template is a little too simplistic or formulaic for country definitions. When I had edited Mozambique to add a couple of senses, I attempted to account for the changing forms of the country in a single sentence without writing a paragraph. The modern definition of that country would have only applied since 1990. But the term Mozambique would have been used before then. -Mike (talk) 10:03, 30 December 2019 (UTC)[reply]

@Ultimateria I'll try to address these issues soon. Currently the way categorization works is it stops categorizing as soon as it finds a suitable category. I need to modify Module:place so there's the option of continuing after a category is found; then I can make 'capital city' categorize both into 'LANG:Capital cities' (as it does currently) and 'LANG:Cities in COUNTRY', and make 'country' categorize into both 'LANG:Countries in CONTINENT' and 'LANG:Countries'. Benwing2 (talk) 05:26, 13 January 2020 (UTC)[reply]
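The difference between stopping at the first suitable category and continuing can be shown with a toy model. Module:place itself is written in Lua and is certainly organised differently; the rule functions, their names, and the `stop_at_first` flag below are invented purely to illustrate the behaviour described above.

```python
from typing import Optional

# Each hypothetical rule returns a category name, or None if it does not apply.

def capital_cities(placetype: str, holonyms: dict) -> Optional[str]:
    return "Capital cities" if placetype == "capital city" else None

def cities_in_country(placetype: str, holonyms: dict) -> Optional[str]:
    if placetype in ("city", "capital city") and "country" in holonyms:
        return f"Cities in {holonyms['country']}"
    return None

def countries_in_continent(placetype: str, holonyms: dict) -> Optional[str]:
    if placetype == "country" and "continent" in holonyms:
        return f"Countries in {holonyms['continent']}"
    return None

def countries(placetype: str, holonyms: dict) -> Optional[str]:
    return "Countries" if placetype == "country" else None

RULES = [capital_cities, cities_in_country, countries_in_continent, countries]


def categorize(lang: str, placetype: str, holonyms: dict, stop_at_first: bool = False) -> list:
    """Collect categories; with stop_at_first=True, mimic the old first-match behaviour."""
    found = []
    for rule in RULES:
        name = rule(placetype, holonyms)
        if name is not None:
            found.append(f"{lang}:{name}")
            if stop_at_first:
                break
    return found
```

With `stop_at_first=True`, a capital city in France only lands in "en:Capital cities"; with the default, it also lands in "en:Cities in France", which is the double categorization being asked for.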

@Panda10 I can't duplicate your issue. I tried this in the Middle French section of France, and clicking on "France" just puts me in the English section, as I'd expect. Maybe some sort of JavaScript extension you have is causing this? Benwing2 (talk) 05:26, 13 January 2020 (UTC)[reply]

@Benwing2 It works fine now when the entire page is in read mode. But it still has this behavior when I am editing a FL section. I usually click "Show preview", then right click the English definition to open it in a new browser window to see if I used the template correctly before I save my changes. But maybe this is normal behavior. Thanks. Panda10 (talk) 18:14, 13 January 2020 (UTC)[reply]

@Ultimateria The double categorization of 'capital city' and 'country' should be implemented now. Benwing2 (talk) 07:27, 13 January 2020 (UTC)[reply]

Thank you! Now I won't hesitate to use this template. Ultimateria (talk) 17:28, 13 January 2020 (UTC)[reply]

@Ultimateria, Donnanz, Moverton I created an alternative syntax for {{place}} that's much more flexible. An example is this:

{{place|en|An Anglo-Saxon <<kingdom>> in northeastern <<c/England>> and southeastern <<c/Scotland>>.}}

You just write arbitrary text and surround the placetypes and holonyms with <<...>>, and it categorizes properly. In my view it's still preferred to use the old syntax if it makes sense, because it enforces more consistency on the results, but the new syntax can be used if it's not easy to express everything the old way. If you have better suggestions for the delimiters to use, I can implement them. I didn't want to use single <...> because that interferes with the use of HTML tags, which get inserted automatically by some templates, which I wanted to allow people to use in the text. Benwing2 (talk) 17:37, 18 January 2020 (UTC)[reply]
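To make the <<...>> convention concrete, here is one way such text could be split into display text, placetypes, and holonyms. This is a Python sketch with invented names, not the module's real parser, and it assumes that a segment containing a slash is a holonym in "type/name" form and that anything else is a placetype.

```python
import re

SEGMENT = re.compile(r"<<(.*?)>>")


def parse_place_text(text: str):
    """Return (display_text, placetypes, holonyms) for a <<...>>-annotated string."""
    placetypes = []
    holonyms = []

    def render(match):
        inner = match.group(1)
        if "/" in inner:
            # Holonym such as "c/England": remember its type, display the name only.
            kind, name = inner.split("/", 1)
            holonyms.append((kind, name))
            return name
        placetypes.append(inner)
        return inner

    display = SEGMENT.sub(render, text)
    return display, placetypes, holonyms
```

Because the doubled delimiter is matched as a unit, a single <...> such as an HTML tag elsewhere in the text passes through untouched, which matches the reason given above for not using single angle brackets.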

lb descendant of gmw-cfr? wym descendant of gmw-ecg?[edit]

I made a script to find missing |bor=1 notations in sets of descendants. This revealed several cases where lb (Luxembourgish) is listed as a daughter of gmw-cfr (Central Franconian), and several cases where wym (Vilamovian) is listed as a daughter of gmw-ecg (East Central German). In reality, both lb and wym have gmh (Middle High German) as the parent. Should they be changed? Benwing2 (talk) 04:38, 28 December 2019 (UTC)[reply]
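The kind of check involved can be sketched as follows. This is not the script mentioned above, only a guess at its core test in Python: the parent table contains just the two relationships stated in this post, and the function names are hypothetical.

```python
# Parent codes exactly as stated above; a real script would read them from
# the site's language data rather than from a hand-written table.
PARENT = {"lb": "gmh", "wym": "gmh"}


def is_ancestor(ancestor: str, lang: str) -> bool:
    """Walk up the parent chain of `lang` looking for `ancestor`."""
    seen = set()
    while lang in PARENT and lang not in seen:
        seen.add(lang)  # guards against an accidental cycle in the table
        lang = PARENT[lang]
        if lang == ancestor:
            return True
    return False


def needs_review(listed_under: str, descendant: str, marked_bor: bool) -> bool:
    """Flag a descendant listed without bor=1 under a language that is not its ancestor."""
    return not marked_bor and not is_ancestor(listed_under, descendant)
```

With this table, lb listed under gmw-cfr without |bor=1 is flagged, while lb listed under gmh is not.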

It's odd to see languages that are spoken today being referred to as parent and child to each other: Central Franconian and East Central German are descended from Middle High German as well. My impression is that Luxembourgish is within the Central Franconian dialect continuum in terms of historical development, even though it's considered to be a language in its own right. Whether that makes it a descendant is an interesting question. Chuck Entz (talk) 05:39, 28 December 2019 (UTC)[reply]

Certainly Dutch and Afrikaans are parent and child?--Prosfilaes (talk) 21:19, 28 December 2019 (UTC)[reply]

We also consider Indonesian to be a child of Malay, and creoles and pidgins to be children of their lexifier languages. —Mahāgaja · talk 18:30, 29 December 2019 (UTC)[reply]

Creoles should not be considered children of any language; see these discussions concerning Haitian Creole and the English-based creoles of Suriname. --Lambiam 20:14, 29 December 2019 (UTC)[reply]

So what to say about an Ausgleichsdialekt and an Ausgleichssprache then? Germans from multiple areas came at multiple times in trains to Romania or in the eighteenth century to Russia and hybridized new languoids. I think it’s really like in genetics. We are observing nothospecies. For reasons of practicability we refrain from calling them in the × format. Fay Freak (talk) 01:13, 30 December 2019 (UTC)[reply]

I'm not saying that having two living languages as parent and child is inherently wrong, just that one would expect some compelling historical reason for it. Here we apparently have dialect continua with some of the dialects developing into separate languages alongside the others. Granted, Malay/Indonesian seem to have come to be treated as separate languages under similar circumstances as with Central Franconian/Luxembourgish ("A language is a dialect with an army and navy"), but that doesn't explain East Central German and Vilamovian. Of course, I don't know as much as I should about these languages or their histories, so I'm open to someone enlightening me. Chuck Entz (talk) 00:47, 30 December 2019 (UTC)[reply]

What happened with Afrikaans is the founder effect, hence why there is a child of a living language. It could also have appeared by Luxembourg being a separate state but probably it hasn’t been that separate. Fay Freak (talk) 01:13, 30 December 2019 (UTC)[reply]

fixing category names referring to the US and North Macedonia[edit]

We have Category:United States of America (and corresponding language-specific categories) as well as the following types of categories that mention the US:

Two of the categories say "United States" and all the rest say "United States of America". I'd like to fix this. The path of least resistance is to standardize on "United States of America", but I suggest we should maybe standardize on "United States". This form is shorter, unambiguous and more consistent with category naming elsewhere, which prefers short-form countries where possible (see below). Note that Wikipedia uses United States, and in general prefers short forms, although there are occasional exceptions (see also below).

Another issue:

Nowhere else do we use the long form of a country's name except in Category:Republic of the Congo, Category:Democratic Republic of the Congo, Category:Czech Republic and republics inside of Russia (Category:Republic of Karelia, Category:Sakha Republic). All of these others are special cases where the word "republic" reasonably belongs there, but I see no reason why the categories referring to North Macedonia shouldn't just say Category:North Macedonia etc. We have Category:Ireland, Category:Armenia, Category:Georgia, Category:China, etc. not Category:Republic of Ireland, Category:Republic of Armenia, Category:Republic of Georgia, Category:People's Republic of China, etc. I suspect the "Republic of North Macedonia" naming is a holdover from the previous "Republic of Macedonia" categories, where the long-form naming made more sense. Benwing2 (talk) 05:51, 30 December 2019 (UTC)[reply]

- I agree with settling on "United States" and "North Macedonia" in these category names. (Incidentally, the full official name of Ireland is "Ireland". "Republic of Ireland" is not the official name, just a description that is used when necessary to distinguish the part of the island of Ireland that is a republic from the part that's a monarchy.) —Mahāgaja · talk 08:52, 30 December 2019 (UTC)[reply]

I also agree with the shorter names. Ultimateria (talk) 17:02, 31 December 2019 (UTC)[reply]

You could also shorten Category:Alaska, USA and all the others except Category:Georgia, USA (which can be confused with the country) back to what they used to be. DonnanZ (talk) 17:15, 31 December 2019 (UTC)[reply]
I do agree with the shorter titles proposed above, and as for CDPs (census-designated places), "in" instead of "of" makes sense to me. DonnanZ (talk) 17:35, 31 December 2019 (UTC)[reply]

I did both renamings. I will create a separate topic about categories like Category:Alaska, USA. Benwing2 (talk) 16:49, 1 January 2020 (UTC)[reply]