Wiktionary:Beer parlour/2020/June

Proposal for a Latin "super-category" for etymologies?

So hear me out on this one. Since there has been more effort lately to be accurate and particular when sorting words into categories by their etymology (and I'm particularly discussing inherited terms in this post), I understand why it's been separated into distinct Late Latin, Vulgar Latin, Medieval Latin, etc. categories vs. just the "Latin" one. However, isn't that plain Latin category technically just indicative of Classical Latin? And if not, could we come to a consensus or clarification on this? In the past, I'll admit I sometimes treated it as a super-category, catch-all for any kind of (inherited) Latin term, even if the Romance descendant was not directly derived from Classical Latin (which was written/attested), but came through a Late or Vulgar Latin intermediate, or wasn't even technically present or attested in Classical Latin. How should we treat these though? Technically speaking, ALL inherited Romance terms had to have come through the Vulgar, or spoken, Latin by definition, but we don't always indicate that in the etymology unless there is a distinctly different reconstructed VL. form to notate. In fact, it's even debatable whether Classical Latin could be said to be an ancestor to Romance, strictly speaking, as it was the prestige register of the language versus the one used by the masses. I suppose a similar problem exists with Prakrit and Sanskrit.

So we can use an example like the descendants of camisia, which is technically not considered part of Classical Latin and thus its descendants would not get the "inherited from Latin" category on their entries... However, I feel that it would be useful to have a category or list or compendium of all the inherited Latin lexicon of each Romance language (regardless of whether the parent term ultimately came from Late, Vulgar, or Classical Latin). I feel like this wouldn't be technically difficult to do: just put any entry using "inh" with either Vulgar, Late, or plain Latin into this category. I suppose some may nitpick about Late Latin just being a later attested form of Latin and not the ancestor of Proto-Romance, unlike Vulgar Latin, which I agree is accurate. But I'm not sure how to handle that. Also, I don't know what this category would actually end up being labeled, but I'm open to suggestions if anyone is interested in this idea.

I also want to point out that Latin entries on Wiktionary, regardless of what stage of the language they were in, are still just labeled "Latin". Word dewd544 (talk) 00:14, 3 June 2020 (UTC)[reply]

Nothing? To sum it up if that was too long, I basically want to standardize usage of Latin in etymologies to have a consistent meaning across them. And I'm wondering why terms derived from a Latin word with the Latin heading, even if it was Late Latin, don't get the category inherited from Latin? Word dewd544 (talk) 01:05, 6 June 2020 (UTC)[reply]

I definitely agree with you, although I don't have much investment in this, not being a frequent editor of Latin. It certainly seems like a logical way of organizing things to me. Andrew Sheedy (talk) 03:57, 6 June 2020 (UTC)[reply]

Perhaps part of the reason that this proposal hasn't gotten much of a response (besides being somewhat prolix) is that the problem does not seem to have much to do with the particulars of Latin, but rather that all entries in "X terms inherited from Y variety of Z" should also categorise into "X terms inherited from Z", which is probably not too controversial, but I don't have the technical know-how to fix it. Maybe @Erutuon? —Μετάknowledge^{discuss/deeds} 04:06, 6 June 2020 (UTC)[reply]

Yes, for me it was because I did not discern the point in the prolixity. I assent to the “categorization of X terms inherited from Y variety of Z into X terms inherited from Z”. I disagree with his statement that “Latin” without adjective means Classical Latin. It of course only means what it means. But I had many cases where I would rather see a category of derivations from a language without adjectives without pages being hidden because someone was more specific in the etymology templates – the current behaviour also deters editors from using etymology codes in the first place. Fay Freak (talk) 16:22, 6 June 2020 (UTC)[reply]

Yeah, this is the general behavior of the etymology templates. If you put in an etymology language code, the template adds the category for that etymology language, not the categories for the parent etymology languages (if there are any) or the category for the parent language or language family. For instance, machine is in Category:English terms derived from Doric Greek but not in Category:English terms derived from Ancient Greek, though the parent language of Doric Greek is Ancient Greek. This could be changed by editing Module:etymology. — Eru·tuon 22:22, 7 June 2020 (UTC)[reply]
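The behaviour Erutuon describes, and the change Metaknowledge suggests, can be illustrated with a minimal sketch. This is not the code of Module:etymology (which is written in Lua); the parent table, the function names and the include_parents switch below are all hypothetical and exist only to show the idea of walking up from an etymology language to its parent.

```python
# Hypothetical parent table; the real data lives in Wiktionary's language modules.
PARENTS = {
    "Doric Greek": "Ancient Greek",
    "Late Latin": "Latin",
    "Vulgar Latin": "Latin",
    "Medieval Latin": "Latin",
}

def ancestor_chain(variety):
    """Return the variety followed by each of its parents, nearest first."""
    chain = [variety]
    while chain[-1] in PARENTS:
        chain.append(PARENTS[chain[-1]])
    return chain

def categories(target_lang, relation, source_variety, include_parents):
    """Build category names such as 'English terms derived from Doric Greek'."""
    sources = ancestor_chain(source_variety) if include_parents else [source_variety]
    return [f"Category:{target_lang} terms {relation} {s}" for s in sources]

# Behaviour described in the thread: only the most specific category is added.
current = categories("English", "derived from", "Doric Greek", include_parents=False)
# Suggested behaviour: the parent language's category is added as well.
proposed = categories("English", "derived from", "Doric Greek", include_parents=True)
```

With include_parents switched on, a Romance entry using "inh" with Late Latin or Vulgar Latin would also land in the plain "inherited from Latin" category, which is the outcome the original proposal asks for.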

Formatting Chinese articles

Discussion moved to WT:Grease pit/2020/June#Formatting Chinese articles.

French Wiktionary communiqué

Dear colleague,

I am glad to inform you that the French Wiktionary has just published a communiqué. It's a synthesis of the new entries added to Wiktionary over the last 12 months. It's only in French, but you can help to translate it, if you want. I think it is important for you to know that we are trying this strategy. Strategy? Yes.

In France, the two most popular dictionary publishers issue this kind of communiqué each year, and a lot of newspapers write articles about the new entries in Larousse and Le Robert. Almost a hundred words are added by them, because their editorial choice is to pick only very few words. A scientific team has made a useful comparison of those words and their dates of inclusion in French Wiktionary. For French in French Wiktionary, it's more than 14,700 words for the past 12 months. So, for the first time, we decided to challenge the status quo and to fight on the same ground. So, we imitated the form of those communiqués and we will try to have the same impact in the press. I will let you know if it works, and if any of you have a contact in the French press, feel free to share the communiqué! Noé 09:24, 5 June 2020 (UTC)[reply]

@Noé: Very nice, good luck! I like bricodeur / bricodeuse. – Jberkel 10:21, 5 June 2020 (UTC)[reply]

Thanks! We have already had some good feedback from linguists and lexicographers, so I think it's already quite a success. Then, we'll see if the press takes the bait! bricodeur is a fusion of bricoleur (tinkerer) and codeur (coder), so... tinkoder in English?

Noé 14:58, 5 June 2020 (UTC)[reply]

French mèmification is translated to English memification[1], which is in Urban Dictionary but neither form is in this one. According to a few moments of searching the oldest dated use is by The Federalist[2] in 2015. (Is it irony that a conservative web site is leading language change?) Probably minutes instead of seconds would find older uses. Vox Sciurorum (talk) 17:37, 5 June 2020 (UTC)[reply]

We have memeification. Ultimateria (talk) 00:18, 6 June 2020 (UTC)[reply]

Added the short form (which I like better, easier to pronounce) as an alternative spelling. Vox Sciurorum (talk) 10:09, 6 June 2020 (UTC)[reply]

And added the French form, as feminine contrary to French Wiktionary. Vox Sciurorum (talk) 10:17, 6 June 2020 (UTC)[reply]

Vulgar Latin pronunciations for New Latin Entries

An IP user, 69.125.80.42, has spent the last day or so adding |vul=yes to {{la-IPA}}, including in a number of New Latin entries. Is this something we want? Chuck Entz (talk) 18:52, 5 June 2020 (UTC)[reply]

Yeah, no. Mass revert. --{{victar|talk}} 08:11, 6 June 2020 (UTC)[reply]

Do we have a gadget to allow users to "opt in" to seeing all these pronunciations for all Latin entries which have {{la-IPA}}? - -sche (discuss) 03:47, 8 June 2020 (UTC)[reply]

That would be complicated because the Vulgar Latin transcription isn't in the HTML when |vul=yes isn't present. The gadget would have to find {{la-IPA}} in the wikitext, add the parameter, expand the template, and insert the result into the page. — Eru·tuon 05:48, 8 June 2020 (UTC)[reply]
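The first two steps Erutuon lists (finding {{la-IPA}} in the wikitext and adding the parameter) can be sketched as follows. This is only an illustration in Python, not a working gadget (gadgets run as JavaScript in the browser), and the function name add_vul is hypothetical. The pattern assumes the template call contains no nested templates. The remaining steps, expanding the template and inserting the result into the page, would need a round trip to the server (for example through the MediaWiki API's expandtemplates action) and are not shown.

```python
import re

# Matches a {{la-IPA ...}} call whose arguments contain no nested templates.
LA_IPA = re.compile(r"\{\{la-IPA((?:\|[^{}|]*)*)\}\}")

def add_vul(wikitext):
    """Append |vul=yes to each {{la-IPA}} call that lacks a vul parameter."""
    def fix(match):
        args = match.group(1)
        if re.search(r"\|\s*vul\s*=", args):
            return match.group(0)  # already has the parameter; leave untouched
        return "{{la-IPA" + args + "|vul=yes}}"
    return LA_IPA.sub(fix, wikitext)
```

For example, add_vul("* {{la-IPA|camīsia}}") yields "* {{la-IPA|camīsia|vul=yes}}", while a call that already carries |vul=yes is returned unchanged.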

Should Rhymes pages link to Wikipedia articles on rappers, etc?

At Rhymes:English/ɒnɪk and some other pages, @Paul G has added links to e.g. Wikipedia pages on specific people, who would never get dictionary entries here (as well as links to Wikipedia articles on surnames instead of linking to entries here). How do other people feel about this? I know some people (DTLHS?) have proposed switching away from rhymes pages entirely and using categories, which I support (and which would preclude linking to rappers' Wikipedia articles). - -sche (discuss) 22:39, 7 June 2020 (UTC)[reply]

I would think we would want a red link for any proper noun we didn't have an entry for. Alternatively, someone could create a system (such as those built around {{taxlink}} and {{vern}}) to track such items and add them in due course. DCDuring (talk) 23:53, 7 June 2020 (UTC)[reply]

I think we should only include terms on the rhymes page if they are inclusion-worthy on Wiktionary according to our criteria. Andrew Sheedy (talk) 00:45, 8 June 2020 (UTC)[reply]

Yes, that's what I think: the links should be to our entries (including if they're redlinks) not links to Wikipedia, and we shouldn't link to entries we wouldn't include, like rapper and album names. - -sche (discuss) 03:46, 8 June 2020 (UTC)[reply]

I agree: don't include them. Equinox ◑ 03:57, 8 June 2020 (UTC)[reply]

It is common knowledge that the illest rappers used Wiktionary's rhymes pages to come up with some dopeshit phat fleek lyrics and clapbacks. Often I have no idea what they'z talmbout. However, if we want to come correct, we should probably delete them. Word is bond. whatchu think? --Huckerby980 (talk) 13:08, 8 June 2020 (UTC)[reply]

Do we need a vote? Paul G is reverting changes I made in accord with the sense of this discussion. I don't see anything wrong with redlinks, other than that it is a bit difficult to generate lists of them, especially with counts of how often the redlinked would-be entry is linked to. I wouldn't know how to expeditiously find the pedia links (w:) from the XML dumps, especially not on Rhymes pages. DCDuring (talk) 21:57, 20 June 2020 (UTC)[reply]

Paul G on your talk page protests that most surnames will be red links here: that is an argument for us to create the missing surname entries, not an argument for us to keep random encyclo-topic rhymes. Equinox ◑ 22:00, 20 June 2020 (UTC)[reply]

Seems fair. I'm glad this has been raised, actually, as this approach makes less work for me.

For now, I will continue to add Wiktionary material and common surnames, and link to Wikipedia only in rare cases. — Paul G (talk) 15:33, 27 June 2020 (UTC)[reply]
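On DCDuring's question about finding the Wikipedia (w:) links on Rhymes pages: one possible approach, sketched here under assumptions, is a regular-expression pass over the page texts. The sketch assumes the (title, wikitext) pairs have already been extracted from the XML dump by some other means, and that the links are written with the w: prefix; the function name wikipedia_links is hypothetical.

```python
import re
from collections import Counter

# Matches [[w:Target]] and [[w:Target|label]], capturing the target only.
WP_LINK = re.compile(r"\[\[\s*w\s*:\s*([^\]|#]+)", re.IGNORECASE)

def wikipedia_links(pages):
    """Count Wikipedia link targets across (title, wikitext) pairs on Rhymes pages."""
    counts = Counter()
    for title, text in pages:
        if not title.startswith("Rhymes:"):
            continue  # only the Rhymes namespace is of interest here
        for target in WP_LINK.findall(text):
            counts[target.strip()] += 1
    return counts

# Made-up sample data, not taken from any real page.
sample = [
    ("Rhymes:English/ɒnɪk", "* [[w:Example Person|Example]]\n* [[tonic]]\n* [[w:Example Person]]"),
    ("tonic", "See [[w:Tonic water]]."),
]
found = wikipedia_links(sample)
```

The resulting counter also gives the per-target counts that DCDuring mentions wanting for redlinks, so the same pass could be adapted to ordinary [[...]] links.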

WT:ATTEST does not mention how to deal with regional labels

If someone adds the label "Australian" to a word that is only used in New Zealand, is a citation needed to confirm this claim? And how many citations are needed? Three for Australian and three for New Zealand? Or a total of three citations across Australian and New Zealand? What if someone adds two for New Zealand and one for Australian? WT:ATTEST does not set any rules for regional labels, so how do we deal with a situation where someone suggests that a word from New Zealand is also used in Australia? This is to prevent people from making assumptions such as "This word is from New Zealand, but I sometimes hear it being used by my friends from Australia".

How about a more complicated situation, like a word that has four labels: "Pakistan", "India", "Middle East", "South Africa" (as an example)? Are three citations needed for each region? Or a total of three citations for the same meaning? This looks like a loophole where people can claim that a word is used in many regions without proof of attestation for each region. User talk:iambluemon 11:30, 8 June 2020 (UTC)[reply]

Dictionary-only entries for less documented languages

I remember that back in 2014 many entries in archaic Korean were deleted because they lacked citations. Back then, I thought the standard was good, because entries copied from other dictionaries were not accepted if there were no citations. However, now in 2020, it seems the treatment is different. Editors can now say that "It's found in Dictionary ???, and it is a reliable dictionary, so it is considered cited". I think this is dangerous because other dictionaries also make mistakes. Is Wiktionary copying mistakes made by other dictionaries? Is Wiktionary slowly becoming a translation hub for other dictionaries? Even the reputable OED has entries in Appendix:English dictionary-only terms. So how can we be sure that a Wiktionary entry copied from another dictionary is reliable? How can we inform people that the entry is copied from another dictionary, but that an actual citation has not yet been found? Based on current practice, would the archaic Korean entries created in 2014 still be deleted? Since only one citation is needed for less documented languages and some language editors consider dictionaries as citations, can anything be done to prevent falling into the trap of dictionary-only entries? User talk:iambluemon 11:50, 8 June 2020 (UTC)[reply]

For LDLs one citation or mention is enough. (WT:CFI adds "editors [...] should maintain a list of materials deemed appropriate" but that's often not happening.) If Archaic Korean (code?) is a separate language (at WT) and not part of Korean (ko), then indeed, one mention in a dictionary can be sufficient. As Category:Old Korean language shows, Old Korean (oko), Middle Korean (okm) and [New] Korean (ko) are accepted languages; Early New Korean or archaic New Korean aren't. --Der Zeitmeister (talk) 17:43, 14 June 2020 (UTC)[reply]

Tennyson's quotes

Just thought I'd let you know that all of the Tennyson quotes have been dated. There were 210 of them. --Huckerby980 (talk) 13:10, 8 June 2020 (UTC)[reply]

OK, so that is actually a lie. All the Tennyson quotes have now got a title and are in a template, but most of the RQ:Tennyson templates are undated. If anyone would like to help date those works, it'd be most appreciated. --Huckerby980 (talk) 13:12, 8 June 2020 (UTC)[reply]

@Huckerby980: thanks for your industry. I do want to say that you should do a bit more research and try to find out in which book a particular poem was first published. It is not terribly helpful to create one template per poem when most of them were actually published in anthologies of poems for which we already had quotation templates. — SGconlaw (talk) 15:46, 27 June 2020 (UTC)[reply]

Can we finally update the Idiomatic phrase "Snitches get Stitches" with some proper synonyms?

Hi,
I would like to get more information and understand why there is only one synonym added to the page https://en.wiktionary.org/wiki/snitches_get_stitches#Synonyms. That synonym (snitches get stitches and wind up in ditches (rare)) includes the phrase "snitches get stitches", which is obviously the entire phrase it is supposed to be synonymous with. In my opinion, a synonym shouldn't include the entire phrase it is synonymous with, or it should be under the derived terms section. Where I grew up, we used a couple of other phrases that had the same meaning as snitches get stitches. How popular do synonyms of idiomatic phrases have to be to be considered synonymous phrases? Would a synonymous phrase used mostly in the Northeast be considered, or would it simply need the distinction of (rare) next to it since it is relegated to a specific geographical region of the country/world? Should the entire country/world understand the idiomatic phrase to be common for it to be considered? I am trying to figure out the selection criteria applied when allowing edits. I would like to be clear on the rules so I can suggest other, accurate synonyms for other idiomatic phrases that may be used more heavily in certain geographic regions. The two phrases that were popular on the west coast are listed below.

Narc swim with sharks
Tattletales get coffin nails

Thank you for your feedback and I look forward to continuing to learn more on this subject.

Idiomaticrarity (talk) 01:32, 9 June 2020 (UTC)[reply]

Please have a look at our criteria for inclusion. If the use of a word or idiomatic phrase can be attested in durably archived sources (such as books or newspapers; blogs do not count) it can be included. It does not have to be particularly popular (or have to have been for terms that have fallen into disuse); just three attestations will usually suffice. We also include words and phrases that are used only regionally, as long as their use can be attested. --Lambiam 19:24, 9 June 2020 (UTC)[reply]

Homophones with different stress patterns

therefor reads: "Homophone: therefore (first syllable stressed)". Do homophones include words with a different stress pattern? --Backinstadiums (talk) 11:16, 10 June 2020 (UTC)[reply]

I would say no. --Lundgren8 (t · c) 11:51, 10 June 2020 (UTC)[reply]

I also say they are not homophones if the stress pattern is different. Vox Sciurorum (talk) 14:22, 18 June 2020 (UTC)[reply]

French 8-syllable words

Well over 90% of the listings I see for French 8-syllable Words are not 8-syllable words but 8-syllable phrases. Why is an 8-syllable phrase called an 8-syllable word? Wombash (talk) 16:13, 10 June 2020 (UTC)[reply]

Not limited to French. Apparently adding hyphenation causes the phrase to be categorized by syllable count. I think hyphenating the phrase a closed mouth gathers no feet (for example) is not useful, and classifies it as a "7-syllable word" which is bad. Hyphenate gather instead. Vox Sciurorum (talk) 22:48, 17 June 2020 (UTC)[reply]
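One way to picture Vox Sciurorum's point is a categoriser that counts the hyphenation parts but declines to label multiword terms as "words". The sketch below is purely illustrative and is not how the hyphenation template is implemented; the function name, the category format and the rule "skip anything containing a space" are assumptions made for the example.

```python
def syllable_category(lang, term, hyphenation_parts):
    """Return a syllable-count category name, or None for multiword terms."""
    if " " in term:
        # A phrase: skip it rather than call it an "N-syllable word".
        return None
    n = len(hyphenation_parts)
    return f"Category:{lang} {n}-syllable words"

# A single word is categorised by the number of hyphenation parts.
word_cat = syllable_category("English", "gather", ["gath", "er"])
# A phrase is left uncategorised, whatever its hyphenation says.
phrase_cat = syllable_category(
    "English",
    "a closed mouth gathers no feet",
    ["a", "closed", "mouth", "gath", "ers", "no", "feet"],
)
```

Under this rule the proverb would no longer appear as a "7-syllable word", while gather would still be categorised from its own hyphenation.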

Requesting permission to edit page 鷿鷉 by stripping L3 heading

I would like to remove what is on the page 鷿鷉 and replace it with the following content:

==Chinese==
{{zh-see|鸊鷉|v}}

This is because 鷿鷉 is considered an identical variant of 鸊鷉. The latter form is the one used in Chinese-language taxonomic databases such as 臺灣物種名錄 (Taipei) and 中国鸟类数据库 (Beijing).

Currently this triggers a filter that prevents the edit and asks me to request the permission. --Frigoris (talk) 15:23, 12 June 2020 (UTC)[reply]

I made the edit. -- Huhu9001 (talk) 17:48, 12 June 2020 (UTC)[reply]

Ojibwe (and other Algonquian) word-stem formation ("primary derivation") and parts of speech

Apologies if this is not the right venue for this discussion. Please help me move it to the appropriate place, if necessary.

Ojibwe - like all Algonquian languages - has a particular type of word stem formation, called in the literature "primary derivation." This is not the same as "derivation" as applied to Indo-European languages (which also exists in Ojibwe under the name "secondary derivation"). To render the concept correctly here would take some new part of speech labels and, later, some templates for proper categorization. Here is a description of the phenomenon that I have cut and pasted from an upcoming book:

Primary stems are formed from three components, none of which can stand alone as a stem: an INITIAL, which expresses the core lexical meaning of the stem, an optional noun-like MEDIAL, and a FINAL, which marks the grammatical category of the stem. (Reference, p. 13)

A good example is here - look at the etymology. It is hard to overstate the importance of primary derivation in the creation of many (most?) words in Ojibwe.

I have already reorganized the existing entries for these components (or elements) into a category with temporary part of speech labels. I propose to now relabel the parts of speech as above: initial, medial, final; as the current labels: root, affix, suffix really aren't accurate in the context of Ojibwe. So two things:

I am hoping that this relabelling is ok, and that it won't get "cleaned up" as the help pages suggest could happen.
Eventually, I would ask for some help in creating templates that can help create the etymologies and categorize the stems according to which element appears in them. This last part - stem categorization - is an integral part of navigating the Ojibwe lexicon that is really hard to render in a paper dictionary but could be very powerful in a Wiki.

Thanks for your thoughts, advice and help. SteveGat (talk) 15:27, 12 June 2020 (UTC)[reply]

Um, Eye Dialect anyone?

May I encourage anyone who supports the "broad" use of the "eye dialect" label – that is, its use for nonstandard spellings representing nonstandard pronunciations, such as anyfink for anything – to comment at Wiktionary_talk:Votes/pl-2020-04/Use_of_"eye_dialect"_label#Reconsideration_of_voting_options. The present state of this long-running saga is that I am looking for clarification from such supporters about which nonstandard spellings representing standard pronunciations they would also like to include as "eye dialect". The categories identified so far are as follows, but please feel free to identify others:

2. Nonstandard spellings that represent standard pronunciations, but imply that the speaker generally uses a nonstandard dialect, such as sed for said.

3. Nonstandard spellings that represent standard pronunciations and imply some comment on a topic, such as, in certain contexts, wimmin for women, or educashun for education.

4. Nonstandard spellings that represent standard pronunciations, but have no especial connotations w.r.t. speaker or topic, such as lite for light.

My understanding is that supporters of the "narrow" definition of "eye dialect" wish to restrict it to (2). Whether supporters of the "broad" definition wish to include any others is not clear to me. If this cannot be pinned down then I will have to be vague or make up something myself, but I would prefer the voting options to be precise and also to reflect what "broad" supporters actually want. If you have any opinion, please reply at Wiktionary_talk:Votes/pl-2020-04/Use_of_"eye_dialect"_label#Reconsideration_of_voting_options. Thanks. Mihia (talk) 22:37, 13 June 2020 (UTC)[reply]

Transliteration of Sanskrit L

We currently transliterate both the vocalic (syllabic) and retroflex Ls in Sanskrit as ḷ - presumably, we want to preserve the bijection between sounds and symbols, and should therefore reassign one or the other. The obvious moves are to either render the former as l̥, or the latter as ḻ, if we care about precedent in the literature; the former has the advantage of clearly distinguishing how we mark syllabicity from retroflexion, and in itself requires very few changes, since syllabic l is very rare, although it also implies we should render syllabic r as r̥.

@Bhagadatta, AryamanA, Mahagaja, Victar, JohnC5, RichardW57

Hölderlin2019 (talk) 07:15, 15 June 2020 (UTC)[reply]

Re-pinging @Bhagadatta, AryamanA, Victar, JohnC5, RichardW57 because you have to have your signature in the same paragraph as the ping for it to work. To the question at hand, I think we can get away with transliterating both ळ and ऌ as ḷ not only because they're both extremely rare, but also because they're in complementary distribution: syllabic /l̩/ occurs only between consonants, and retroflex nonsyllabic /ɭ/ never occurs there. —Mahāgaja · talk 07:41, 15 June 2020 (UTC)[reply]

@Hölderlin2019, Mahagaja: I am for shifting to l̥ and r̥ for the syllabic letters in Sanskrit because retroflex r is also ṛ in many Indo-Aryan language transliteration schemes, so any reduction of confusion between the two sets is good for users. Mahagaja does raise a fair point however, but at the end of the day I think disambiguating the two certainly doesn't hurt. —Aryaman^A ^{(मुझसे बात करें • योगदान)} 20:44, 15 June 2020 (UTC)[reply]

I am inclined to favor this, because it also regularizes the use of the underdot even within Sanskrit to exclusively denote retroflexion (as in ṭ ṭh ḍ ḍh ṇ ṣ), brings Sanskrit into conformity for our practice for Proto-Indo-European, Proto-Indo-Iranian, and Proto-Indo-Aryan, all of which use the underring to denote syllabicity, and also reflects the preponderance of the academic sources (that is admittedly an impressionistic judgment). Frankly, are there any good arguments against making the change/for keeping things the way they are? Hölderlin2019 (talk) 01:04, 16 June 2020 (UTC)[reply]

Are you? Oh, I actually don't support this. ṛ is far more the standard for Sanskrit transliterations than r̥ is. --{{victar|talk}} 04:06, 16 June 2020 (UTC)[reply]

I agree with Mahāgaja - there's no real issue of ambiguity, though there is at least one bad Sanskrit word whose transliteration starts aṛ-. I don't get the feeling that the circle below has displaced the dot below. Additionally, I think most English L1 speakers looking up Sanskrit aren't interested in modern Indian languages, so the difference in schemes for Sanskrit and Hindi matters little. --RichardW57 (talk) 00:02, 16 June 2020 (UTC)[reply]

Yeah, Mahagaja makes a good point. Or else we may have to change the vocalic r to ŕ so that we can change the vocalic l to ĺ. But that would be going against the convention because IAST uses a dot below the r. Also, ऌ exists in no real word in Sanskrit. -- Bhagadatta (talk) 07:46, 4 July 2020 (UTC)[reply]
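The ambiguity under discussion can be made concrete with a small check for whether a transliteration table is one-to-one. The two tables below cover only the letters mentioned in this thread; CURRENT reflects the situation described in the opening post, and PROPOSED is just the underring suggestion, not an adopted scheme. The function name collisions is hypothetical.

```python
from collections import defaultdict

# Only the letters discussed in the thread: vocalic r, vocalic l, retroflex l.
CURRENT = {"ऋ": "ṛ", "ऌ": "ḷ", "ळ": "ḷ"}
PROPOSED = {"ऋ": "r̥", "ऌ": "l̥", "ळ": "ḷ"}

def collisions(scheme):
    """Return transliterations that more than one Devanagari letter maps to."""
    by_output = defaultdict(list)
    for letter, translit in scheme.items():
        by_output[translit].append(letter)
    return {t: letters for t, letters in by_output.items() if len(letters) > 1}

current_clashes = collisions(CURRENT)    # one clash: both Ls share a symbol
proposed_clashes = collisions(PROPOSED)  # empty: every letter gets its own symbol
```

The check only shows that the current table is not invertible letter by letter. It does not capture Mahāgaja's argument that the two letters are in complementary distribution, so that context would usually resolve the clash.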

Topic categories linking to the singular instead of the plural

Some topic categories (e.g., Module:category tree/topic cat/data/Human) in their descriptions link to the singular instead of the plural by having "description = "{{{langname}}} terms related to [[(word)]]s"," instead of "description = "default"," (i.e., "[[(word)]]s" instead of "[[(words)]]") but it is not consistent. Should they link to the singular or the plural? J3133 (talk) 02:19, 16 June 2020 (UTC)[reply]

I'd say they should link to the singular, because in almost all cases that's where the definition is. If I see "English terms related to trees" and want to find out what a "tree" is, I'd want the link to take me to the page that defines the term for me, not the page that just says "plural of tree". —Mahāgaja · talk 06:27, 16 June 2020 (UTC)[reply]

I've just gone through the data modules for places, earth, and plants. Although there were a good number of instances where the link should be to a singular (lemma) entry but was to the plural form, there were also fewer, but many, instances where the plural was the lemma. Automated or mindless manual approaches are likely to be partially unsatisfactory.

There are other inconsistencies, of lesser import, such as whether a label should read switchgrass or switchgrasses (in both cases linking to switchgrass). DCDuring (talk) 11:42, 16 June 2020 (UTC)[reply]

Standard (wiki) orthography for lesser-documented languages

Some languages just don't have an orthography, for example Zou. And this means that if multiple users want to work with such a language, they will probably use different orthographies, resulting in chaos, where one lemma can be found in multiple phonetic or pseudo-phonetic alphabets. Maybe it would be better if a wiki-standard for such languages were found (for example, for Zou I have used the orthography given on omniglot, while with Khumi Chin I use the special orthography designed in the paper). Both are based on phonology, yet another researcher may use another type of orthography. So basically: maybe we could add a standard orthography (either for all languages or per language in the language information at the category) so that it is actually readable and searchable. Thadh (talk) 08:42, 16 June 2020 (UTC)[reply]

What I would recommend is creating the pages Wiktionary:About Zou and Wiktionary:About Khumi Chin and explaining there which orthography is being prioritized at Wiktionary. Attested spellings in other orthographies should also be added and marked {{alternative spelling of}}. —Mahāgaja · talk 12:06, 16 June 2020 (UTC)[reply]

I guess that solves the problem in the short term (thanks!), but wouldn't it be easier in the long term to decide upon a standard phonetically-based orthography for different groups of languages (for example, one for the unwritten Sino-Tibetan languages, and another for the unwritten Trans-New Guinea languages), so that it would be more organized? Thadh (talk) 09:28, 17 June 2020 (UTC)[reply]

No, because each language has different traditions. If we did what you suggest we run strong risk of inventing our own orthography for these languages, which isn't our job. —Mahāgaja · talk 05:38, 18 June 2020 (UTC)[reply]

Yes and no. Normalisations do happen and it's a good practice. For example, it's very hard to find comprehensive sources in Old East Slavic or Old Church Slavonic. The limited material may be written in all possible orthographies and different glyphs. One common problem is the letter ꙑ, which may be written as ы. See other commonly normalised letters in Wiktionary:About Old East Slavic. I'm sure you're familiar with some of these. Don't get me started on [[palochka]] spellings out there, which is quite a mess, LOL. Look at Arabic dialects, which are often still written in romanised chat alphabet. As a dictionary, we can use standard or standardised forms and Unicode recommendations. Come up with a standard to the best of our knowledge. Writers out there, even dictionary publishers, may be limited in their tools or knowledge, or may not care enough about the use of the right spelling. It's a completely different story with well-documented languages where "right" and "wrong" spelling is much less ambiguous. --Anatoli T. ^{(обсудить}/^вклад) 09:30, 18 June 2020 (UTC)[reply]

What I'm more concerned about is the languages which aren't written down at all. If a speaker of Namuyi wants to write something down, they won't use their own language, they'll use Chinese. That is the problem I'm facing, and that is also why I suppose most linguistic papers describing the languages come up with their own orthography based on the phonology. Thadh (talk) 10:08, 18 June 2020 (UTC)[reply]

I've also seen cases where papers on languages simply don't care about native orthography. I have a volume on Cherokee magic where the author talks about the texts being written in the Cherokee syllabary, which he then had someone read out loud so he could write it down in the world's most overdone Latin orthography. I suspect that in a lot of cases there has been writing in these languages--notes in ad-hoc scripts derived from whatever writing systems are in the area--but linguists don't care about writing, especially the ad-hoc orthographies created by amateurs.--Prosfilaes (talk) 00:30, 27 June 2020 (UTC)[reply]

Conjugation table for English verbs

What is the view about the elaborate conjugation table at run#Conjugation? Is this something that we should be aiming to include for all English verbs, or is it overkill? Mihia (talk) 00:27, 18 June 2020 (UTC)[reply]

It hasn't even got runnest, runneth and ranst. --Nueva normalidad (talk) 00:31, 18 June 2020 (UTC)[reply]

and also no thou wouldst have run. --Lambiam 19:20, 18 June 2020 (UTC)[reply]

I would just remove it. We have all the forms in the headword. SemperBlotto (talk) 09:37, 18 June 2020 (UTC)[reply]

Is there a way of discovering how widely this template is used? I don't remember seeing it at other entries ... Mihia (talk) 13:03, 18 June 2020 (UTC)[reply]

@Mihia: Special:WhatLinksHere/Template:en-conj. —Mahāgaja · talk 13:32, 18 June 2020 (UTC)[reply]

I see that lose_one's_balance#Conjugation needs some work! Mihia (talk) 20:18, 18 June 2020 (UTC) [reply]

I prefer inflections in a separate table over putting them in the headword line. —Rua (mew) 13:25, 18 June 2020 (UTC)[reply]

@Rua: Do you mean that you support a table such as at love#Conjugation (the corresponding table at run that I originally mentioned now seems to have been deleted), or are you referring to something simpler? Mihia (talk) 21:59, 29 June 2020 (UTC)[reply]

Way too much screen space for such a simple conjugation. Vox Sciurorum (talk) 14:16, 18 June 2020 (UTC)[reply]

Not necessary and not helpful for English. I've gone ahead and removed it at lose one's balance, because it's wrong. Unless we use such tables to include archaic forms (clearly marked as such), I am opposed to pretending that English has just as complex a conjugation system as Latin.... Andrew Sheedy (talk) 01:12, 19 June 2020 (UTC)[reply]

OK, maybe we will end up deleting all of these, but while it's still there, also enjoy give_a_shit#Conjugation! Mihia (talk) 18:07, 19 June 2020 (UTC)[reply]

I have been adding a separate conjugation table using {{en-conj-simple}} only if there are archaic forms (walkest, walketh, walkedst). If there is a modern variant of one of the forms (tunnelled, tunneled) I’ve generally added this to the headword. — SGconlaw (talk) 03:53, 30 June 2020 (UTC)[reply]

Nominated for deletion at Wiktionary:Requests_for_deletion/Others#Template:en-conj.

2030 Movement Brand Project

Hi fellow Wiktionarians!

A friendly, unofficial visit to inform you about an ongoing Wikimedia Foundation procedure: a plan to change the name of the foundation to something more Wikipedia-centric, such as Wikipedia Foundation.

In 2017, the Communications department of the Wikimedia Foundation (WMF) conducted research on the state of Wikimedia brands, with external companies specializing in branding and design. Their conclusion in 2018 was that a change was needed to reinforce brand visibility and avoid confusion between Wikimedia and Wikipedia. The best move suggested was to change the name to something based on the best-known brand, Wikipedia.

The community reacted with a request for comments "Should the Foundation call itself Wikipedia?". It is still ongoing. Up to today there are 43 supports and 421 opposes. A summary was published in February 2020.

The WMF team held several workshops and meetings to gather feedback and concepts that might help. They put forward "interconnection" as a key value of the movement. They also set six criteria to evaluate the options.

Finally, on June 16th, they revealed the proposed names for the Wikimedia Foundation:

Wikipedia Network Trust
Wikipedia Organization
Wikipedia Foundation

They have launched a two-week survey running until June 30th. As contributors, you are welcome to give your opinion in this survey!

To strengthen our collective answer as Wiktionarians, we can use our (almost useless but) Tremendous Wiktionary User Group, an affiliate that the Wikimedia Foundation has solicited through a dedicated survey, which I am eager to complete with the opinions of as many Wiktionarians as possible. So, the group is open if you want to join it, and you can share your opinion with other Wiktionarians. This group aims to reinforce inter-lingual cooperation between Wiktionaries: a difficult task when everyone is already busy with other tasks, but a noble goal, I think. So, if you want to build something positive rather than just be angry with the Foundation's procedure or views, you are welcome to join the club and suggest new ideas to move on! Noé 14:31, 18 June 2020 (UTC)[reply]

I'd rather they pick some umbrella name like “WikiWorld”. There is a domain wikiworld.com, but it seems to be of an inactive project. --Lambiam 19:04, 18 June 2020 (UTC)[reply]

I'd go for the name Wikipedia + Friends, which can be abbrev'd to WF. --Nueva normalidad (talk) 13:59, 19 June 2020 (UTC)[reply]

Tuvan transliteration

Why is it that Tuvan is currently the only Cyrillic-based Turkic language whose transliteration does not conform to the Common Turkic Alphabet? Seems kind of arbitrary. Most of the edits that caused this seem to have been made by an IP user in 2016, being his only changes, whereafter he did not use the account any further.

In addition to looking foreign and standing out, the transliteration itself has some problems:

ʺ, as in aʺt (/àt/), does not accurately represent the change in pitch caused by it after a vowel (would be spelled àt according to CTA, which looks much more familiar).
ä, as in män (/me̞n/), misleadingly looks like it should represent phoneme /æ/ or /ɛ/ (this was probably mistakenly taken from Kazakh, which actually does have /æ/ as ä).
ï does not need a diacritic to represent /i/ (this was also most likely taken from Kazakh, where ï and i are two distinct letters, whereas Tuvan does not have such distinction).
As far as I know, there are no Turkic languages that consistently transliterate ң as ŋ, being instead represented as ñ (see entry for ң).
ž, š and č do not conform to Turkic transliteration norms (would be j, ş and ç respectively), instead corresponding to Slavic ones (not a problem, but still heterodox).

See Kazakh, Kyrgyz and Bashkir transliterations for more. I'd advise this to be changed, if only for uniformity's sake. —Alves9 (talk) 16:19, 18 June 2020 (UTC)[reply]
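The replacements that the list above states explicitly (ʺ after a vowel becoming a grave accent, ŋ to ñ, ž to j, š to ş, č to ç) can be sketched as a small converter. This is only an illustration of the proposal, not the actual transliteration module; the function name `to_cta_style` is invented, and ä and ï are deliberately left alone because the post does not spell out their replacements:

```python
import unicodedata

# Letter-for-letter replacements named explicitly in the proposal above.
LETTER_MAP = str.maketrans({"ŋ": "ñ", "ž": "j", "š": "ş", "č": "ç"})

def to_cta_style(translit: str) -> str:
    """Convert the current Tuvan romanization toward the CTA-style one.

    Hypothetical helper: covers lowercase letters only and skips ä and ï.
    """
    # Normalize first so that e.g. z + combining caron matches "ž".
    out = unicodedata.normalize("NFC", translit).translate(LETTER_MAP)
    # U+02BA (ʺ) after a vowel becomes a combining grave accent on it.
    out = out.replace("\u02ba", "\u0300")
    # Recompose, so that a + U+0300 becomes the single character à.
    return unicodedata.normalize("NFC", out)

print(to_cta_style("a\u02bat"))
```

Uppercase forms and any context-dependent rules would need to be added before this could stand in for a real module.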

Is this basically a matter of undoing the IP's edits ([3], [4])? This seems something you can do yourself. Or is it more complicated? --Lambiam 18:48, 18 June 2020 (UTC)[reply]

The edit in question is ([5]). But yes, it's indeed as simple as undoing his edits and changing a few letters which were already transcribed wrongly. I just thought I should ask for permission first, and also list the reasons for the change. I believe it's policy to do so. —Alves9 (talk) 21:39, 18 June 2020 (UTC)[reply]

You can only undo it (using the undo function) if you also undo the IP’s next edit at the same time. The Wiktionary:Tuvan transliteration page should be made consistent with the transliteration module. --Lambiam 09:33, 19 June 2020 (UTC)[reply]

The Korean IP in question has been blocked several times for, among other things, strange edits to transliteration pages and modules for "Altaic" languages. Chuck Entz (talk) 06:58, 19 June 2020 (UTC)[reply]

Norwegian Riksmål as a subvariety of Norwegian

Should Norwegian Riksmål be added in the subvarieties module as a subvariety of Norwegian for a category with terms that are only used in Norwegian Riksmål (e.g., almen)? See also Talk:almen where a subcategory was discussed. J3133 (talk) 19:24, 18 June 2020 (UTC)[reply]

Capitalization of articles in Spanish proper names

The name of the Peruvian region la Libertad is spelled with a lower case la on the Peruvian government's apparently official list of regions[6] but with an upper case La elsewhere. Is there a Wiktionary standard spelling of Spanish proper names beginning with an article when either spelling might be found in the wild? For example, after much debate Wikipedia chose the Beatles over The Beatles. Vox Sciurorum (talk) 22:36, 18 June 2020 (UTC)[reply]

Including alternative reconstructions

I hold that alternative reconstructions, regardless of how wrong they are, should be kept on entries for both posterity and reference to theories therein. Any thoughts on that? --{{victar|talk}} 21:06, 19 June 2020 (UTC)[reply]

Aren't they already? See for example Reconstruction:Proto-Sino-Tibetan/b-ləj. In the first line, an alternative reconstruction is given, namely *blyid (Coblin, 1986). Or do you mean something else? Thadh (talk) 21:46, 19 June 2020 (UTC)[reply]

I'm guessing you didn't see my link above. --{{victar|talk}} 22:15, 19 June 2020 (UTC)[reply]

I just didn't understand the difference between the two, so I don't understand why the reconstruction was deleted. Thadh (talk) 07:38, 20 June 2020 (UTC)[reply]

Why don’t we include them, marking them as deprecated etc. where relevant? Hölderlin2019 (talk) 09:48, 20 June 2020 (UTC)[reply]

The reconstruction that was removed isn't deprecated or wrong, it's just unnecessary. Any verb form of that particular tense can optionally take the augment *h₁e-, so it just isn't necessary to list it. It isn't even really an alternative reconstruction, it's a reconstructed alternative form. —Mahāgaja · talk 12:30, 20 June 2020 (UTC)[reply]

I know, I’m responding to Victar’s comments in relation to a hypothetical explicitly wrong or obsolete reconstruction, which seems the meat of what he wants to get at. Where alternate forms canonically exist for a word, I tend to support including them, and don’t change my stance for reconstructions. Hölderlin2019 (talk) 17:17, 20 June 2020 (UTC)[reply]

I don't suppose a reconstruction can definitively (even in a hypothetical) be called wrong. "Obsolete" reconstructions don't differ from alternative reconstructions. Thadh (talk) 18:17, 20 June 2020 (UTC)[reply]

A reconstruction is basically a reference to a correspondence, so a reconstruction based on a real correspondence might be correct for all practical purposes even though the form of the reconstruction is based on wrong interpretation of the correspondence. For example, the older literature is full of *a and *o that the modern understanding considers *e + a laryngeal. This is more a difference in notation than an error. A reconstruction based on an invalid correspondence, however, is definitely wrong. For instance reconstruction of a Proto-Germanic term based on common borrowings where the alleged reflexes don't reflect sound changes such as the High-German Consonant shift would be wrong.

Either way, I think we should look at the way taxonomists treat synonymy. Taxonomic names are based on the rule of precedence: a taxonomist publishes a description of the taxon, preferably referring to one or more specimens known as types. If anyone later publishes a description that turns out to belong to the same taxon, that name is known as a synonym for the original name. Because these descriptions are spread out in a scattered body of publications, the literature is full of these synonyms. In many cases, the original publication is so obscure that a synonym may be accepted as correct for centuries and be used in all the important reference works of the period. Or a taxonomist may wrongly think the specimens they're working with match the description for a particular name. Then there are all the cases where a species is described in one genus and later taxonomists decide that it belongs to another one, and the original name is modified to reflect that.

The way that taxonomists deal with this is including a list of synonyms in reference works. The names in a synonymy section are all presumed to be incorrect, but they're listed so readers know how to interpret references to those synonyms. They may be annotated to explain why they're wrong, though the simple fact they were published later than the original name is often explanation enough.

I think it would be a good idea to include incorrect or obsolete reconstructions that appear in the literature- as the equivalent of misspellings and nonstandard/proscribed forms in mainspace. The main difference is that reconstructions are etymological constructs rather than part of the language, so the standard is of accuracy and consensus of the latest authoritative sources rather than usage. In other words, it's perfectly okay to be prescriptive rather than descriptive in how we label them. Chuck Entz (talk) 19:41, 20 June 2020 (UTC)[reply]

@Mahagaja: You know, my mistake with that example -- I absolutely agree with it being removed. Here, this is a better example.

(For those wondering, my first example is not an alternative reconstruction, but an identical reconstruction, just with the aorist prefix that Greek and Indo-Iranian share added to it. I sillily failed to notice that when pulling up an example.)

--{{victar|talk}} 19:37, 20 June 2020 (UTC)[reply]

@Victar: aortic? Hölderlin2019 (talk) 05:03, 22 June 2020 (UTC)[reply]

The forms given there were listed as alternative forms, meaning that we posit these synchronically all existing side by side, just like alternative forms today exist side by side. If that were the case, then these alternative forms must, just like any other reconstruction, have descendants. And these don't appear to have any, so I removed them just like I would any other reconstruction lacking descendants. —Rua (mew) 11:20, 23 June 2020 (UTC)[reply]

Rua, you frequently reconstruct whole paradigms automatically regardless of attestation. 109.41.0.27 19:18, 8 July 2020 (UTC)[reply]

Amendments to Appendix:English pronunciation

Would someone familiar with English phonology please review the recent edits made to Appendix:English pronunciation? I'm not sufficiently qualified. However, I reverted some recent attempts to alter /ɹ/ to /r/. — SGconlaw (talk) 11:07, 20 June 2020 (UTC)[reply]

The problem with that page is it has too many cooks. Everyone has their own idea about the best way to represent English in IPA, and everyone wants the page to reflect their own views. As a result, the page winds up a chaotic mess. I could go in there and make it internally consistent according to what I think is the best way to represent English, but in a few hours or days or weeks someone else will come along and add their own preferences. —Mahāgaja · talk 12:25, 20 June 2020 (UTC)[reply]

Should the page be made editable by template editors and administrators only? — SGconlaw (talk) 13:13, 20 June 2020 (UTC)[reply]

I don't think so. The edits aren't vandalism, they're just a different interpretation. And anyway, even among template editors and administrators there are probably more than enough differences of opinion. —Mahāgaja · talk 16:17, 20 June 2020 (UTC)[reply]

OK, then. Well, maybe help to keep an eye on the page from time to time. — SGconlaw (talk) 16:24, 20 June 2020 (UTC)[reply]

Switching to Latin letters in the transliteration of Sogdian, Parthian and others

See Talk:βγpwr (@Vahagn Petrosyan) where "switching to Latin gamma and Latin delta in the transliteration of Sogdian, Parthian and others because mixing scripts is never a good thing" was discussed in February 2014; it was said that there is no Latin beta. Latin beta and other letters (Latin chi, Latin omega) were added to Unicode in June 2015 and are now in default fonts. J3133 (talk) 17:21, 20 June 2020 (UTC)[reply]

Why do we have Latin-alphabet entries for all these Old Iranian languages in the first place? Don't the native alphabets exist in Unicode? The Latin-alphabet entries should just be {{romanization of}} and the meat of the entry at the form in the native alphabet. —Mahāgaja · talk 19:33, 20 June 2020 (UTC)[reply]

@J3133, Mahagaja: Yeah, those entries were created donkey's years ago. I'll move them over to native-script entries. --{{victar|talk}} 20:56, 20 June 2020 (UTC)[reply]

@Victar: The transliteration (between the brackets) still uses a mixture of Greek and Latin letters (the talk page mentioned transliteration). J3133 (talk) 21:40, 20 June 2020 (UTC)[reply]

Yep, lots of transliteration schemas and that's the one we're currently using. --{{victar|talk}} 21:42, 20 June 2020 (UTC)[reply]

Inconsistency with WOTD and FWOTD templates

The FWOTD template on future FWOTDs is hidden but the WOTD template on future WOTDs is shown. The template on future FWOTDs was hidden on 26 January 2015 (revision). It should be decided whether to show both or hide both for consistency. J3133 (talk) 04:19, 21 June 2020 (UTC)[reply]

My preference is for the date to be shown. It makes it possible for editors to immediately see if an entry has been set as a future WOTD/FWOTD, and so shouldn’t be renominated. — SGconlaw (talk) 04:32, 25 June 2020 (UTC)[reply]

What's our current canonical reference on what does and does not belong in the "See also" section?

Since the early 2000s the See also section has been for looser semantic relationships than synonyms, derived terms, and related words.

For instance, about a decade ago a new contributor wanted to disconnect airship, balloon, barrage balloon, blimp, dirigible, hot air balloon, and Zeppelin because they're different, utterly missing the point that the section is not for synonyms.

Just now a relatively new contributor is disconnecting See also links between 等於／等于 (děngyú) "to equal, to be tantamount to" and 平等 (píngděng) "equal, equality", and 面具 (miànjù) "mask" from 口罩 (kǒuzhào) "mask (surgical etc.)".

These are the sorts of words we've been linking for many years: foreign words that translate to the same or similar words in English, words which are easily confused, words which small dictionaries don't clearly distinguish, and many other kinds of words that from the English-speaking perspective belong to related concepts.

These are the ones I notice because I recently linked them. I don't know if this is an ongoing or widespread reinterpretation of the "See also" section.

But I'm not sure if we ever wrote this up. If we did, what do we say about it? Are we keeping as it's been for a long time or have we recently changed it? — hippietrail (talk) 07:51, 21 June 2020 (UTC)[reply]

If the contents had a specific fixed meaning (like synonyms, hyponyms, coordinate terms) we would usually put them under those headings instead. "See also" can be anything that is somehow relevant but doesn't fit other headings. I occasionally use it to link to Wikipedia topics that might be confused with the normal term, e.g. at word art. Equinox ◑ 17:32, 22 June 2020 (UTC)[reply]

There has been no formal change. I think WT:ELE is suitably unclear about what does belong there. It is a residual, garbage-can heading. DCDuring (talk) 20:30, 22 June 2020 (UTC)[reply]

Spellings for Miyako and other generally unwritten Japonic languages

@Kwékwlos, MiguelX413 I feel like spelling things the way Japanese does it (with kana and kanji) poses multiple problems for the Ryukyuan languages.

All but two modern Japonic languages (Japanese and Okinawan) are generally unwritten. Even Okinawan itself lacks a consistent system, due to the language getting phased out of public, official use in the 17th century before a standardized form could come into use. Thus writing kanji + kana with the unwritten languages is misrepresentative. Most academic texts in the Western world romanize them phonemically instead.
Several kanji pages are covered in memory-related module errors, making them unusable for making more Ryukyuan language entries on them.
The phonotactics of some Ryukyuan languages (especially Miyako) are very poorly suited for kana, with syllabic consonants everywhere.

mellohi! (僕の乖離) 21:50, 21 June 2020 (UTC)[reply]

Glottal stops in Latin script Old Khmer

How should we be handling glottal stops in the onsets of Old Khmer syllables?--RichardW57 (talk) 03:25, 22 June 2020 (UTC)[reply]

There are perhaps five categories to consider:

Word initial onset consisting of just the glottal stop, and written with a vowel letter and no dependent vowel. (I'm counting អា as a vowel letter.)
Word initial onset consisting of អ, followed by a vowel letter. This category may be empty, depending on how late Old Khmer extends.
Cluster of two consonants, with glottal stop as the first element. I have seen this written with ^a.
Cluster of two consonants, with glottal stop as the second element. I have seen the glottal stop written as a hyphen, e.g. in ph-em for ផ្ឯម៑ (“'sweet'”). A subsidiary question is how we distinguish that spelling from ~~ប្អេហ៑~~ប្អេហ. Or don't we?
Fixed typo - no viriam according to SEAlang; the pasted image in the footnote in the paper (BEFEO LIX IMA 14) looked vertically overtrimmed. SEAlang spells it "pʼeha".--RichardW57 (talk) 10:47, 22 June 2020 (UTC)[reply]
Single intervocalic glottal stop. Some apparently write that without any character; possibly they would use a hyphen to distinguish it from a diphthong.

Or perhaps we just document all the major Roman script spellings.--RichardW57 (talk) 03:25, 22 June 2020 (UTC)[reply]

What sort of apostrophe should we use? I thought Wiktionary had a strong preference for the ASCII apostrophe. A lot of red-links in Wiktionary use U+02BC modifier letter apostrophe, but of course papers are quite likely to use U+2019 RIGHT SINGLE QUOTATION MARK, which one gets from 'smart quotes'. Unicode started out recommending U+02BC for English contractions, but soon switched to U+2019. Or do we have to allow both? They're both out there.--RichardW57 (talk) 03:25, 22 June 2020 (UTC)[reply]

We should use U+02BC whenever an apostrophe-like thing is used as a letter rather than as a punctuation mark. When it stands for a sound like /ʔ/, as it does here, it's a letter. Granted, other websites might not use the various characters correctly, but we still should. —Mahāgaja · talk 05:13, 22 June 2020 (UTC)[reply]
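The letter-versus-punctuation distinction is visible in the Unicode general categories of the three characters mentioned, and switching a transliteration string over to U+02BC is a one-line mapping. A minimal sketch, assuming the string is a transliteration in which every apostrophe-like character stands for a glottal stop; the function name `to_letter_apostrophe` is made up here:

```python
import unicodedata

MODIFIER_APOSTROPHE = "\u02bc"  # MODIFIER LETTER APOSTROPHE

# ASCII apostrophe and right single quotation mark both map to U+02BC.
APOSTROPHE_MAP = str.maketrans({
    "\u0027": MODIFIER_APOSTROPHE,
    "\u2019": MODIFIER_APOSTROPHE,
})

def to_letter_apostrophe(translit: str) -> str:
    """Replace punctuation apostrophes with the letter U+02BC.

    Hypothetical helper; only safe where the apostrophe is a letter.
    """
    return translit.translate(APOSTROPHE_MAP)

# Show each character's general category (letter vs. punctuation).
for ch in ("\u0027", "\u2019", "\u02bc"):
    print(f"U+{ord(ch):04X}", unicodedata.category(ch))

print(to_letter_apostrophe("p'eha"))
```

Running this on ordinary English text would be wrong, since contractions and quotations use the apostrophe as punctuation.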

That's a good point. I will change to use U+02BC in translit modules also. --Octahedron80 (talk) 06:40, 22 June 2020 (UTC)[reply]

Just to clarify as a general point, what do we do about English "'o'er" for "hotter"? The second apostrophe represents a glottal stop in most speech. (There are variants where even the glottal stop is dropped.) --RichardW57 (talk) 10:28, 22 June 2020 (UTC)[reply]

I'd say that 'o'er (“hotter”) uses punctuation apostrophes to indicate deleted letters both at the beginning and in the middle of the word. It's sort of coincidence that the second incidence corresponds to a glottal stop in pronunciation. I don't think English speakers perceive the second apostrophe there as a letter standing for /ʔ/, though such use may well be the reason why the curly apostrophe came to be used to symbolize a glottal stop in transliterations of other languages. —Mahāgaja · talk 10:42, 22 June 2020 (UTC)[reply]

Incidentally, if it's decided to use a superscript "a" in the transliteration of Old Khmer, we should use ᵃ U+1D43 modifier letter small a, not <sup>a</sup> or {{sup|a}}. —Mahāgaja · talk 11:24, 22 June 2020 (UTC)[reply]
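As an illustration of that substitution, the two markup forms named above can be swapped for the single character U+1D43 with a regular expression. This is a sketch only; the function name is invented, and it handles just the letter "a" in exactly these two spellings:

```python
import re

MODIFIER_SMALL_A = "\u1d43"  # MODIFIER LETTER SMALL A

# Matches <sup>a</sup> or {{sup|a}} exactly as written in the comment above.
SUP_A = re.compile(r"<sup>a</sup>|\{\{sup\|a\}\}")

def use_modifier_a(text: str) -> str:
    """Replace superscript-a markup with U+1D43 (hypothetical helper)."""
    return SUP_A.sub(MODIFIER_SMALL_A, text)

print(use_modifier_a("x<sup>a</sup>y and x{{sup|a}}y"))
```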

I'm not proposing it for transliteration from the Indic script, merely for Old Khmer in the Latin script. Thus, it would normally appear as an alternative form. --RichardW57 (talk) 12:00, 22 June 2020 (UTC)[reply]

I appreciate we'd have to use U+1D43 in entries. Unfortunately, it comes out horribly small in my default font, so I used the mark-up for legibility. --RichardW57 (talk) 12:00, 22 June 2020 (UTC)[reply]

Having settled on U+02BC for the apostrophe, there are a lot of transliterations out there that use the normal Indic rules. Unlike modern Khmer, Old Khmer seems to have predominantly used vowel letters rather than a glottal stop consonant, so what are we to do about words where we can take the syllable initial glottal stop as implicit? We don't seem to be recording implicit final glottal stops - or are they a more recent feature of Khmer? --RichardW57 (talk) 12:00, 22 June 2020 (UTC)[reply]

There seems to be a convention, used by Jenner and also the Library of Congress system, that the glottal stop is written in transliteration (the latter by 'q') if the glottal stop letter is written, but no glottal stop is written if an independent vowel is used. What is very confusing is that the vowel letters 'a' and 'ā' are treated as containing the glottal stop letter. Still, this is a usable convention. It would be good to check if some transliterations look very different simply because they treat the effective vowel letters អ and អា as vowel letters. (I am completely ignoring the deprecated Unicode characters U+17A3 and U+17A4.) --RichardW57 (talk) 09:15, 24 June 2020 (UTC)[reply]

I've also come across the interpunct used to indicate viriam. What are we to do about that? Perhaps it depends on the style of the transcription. "p-eh" v. "pʼeha" for the same occurrence bothers me, and may be why some at least feel they should make the use of viriam explicit. Perhaps we can use the convention that aksharas implicitly have 'a', regardless of any Old Khmer reading rules, and treat Latin script exceptions as historical anomalies. --RichardW57 (talk) 12:00, 22 June 2020 (UTC)[reply]

Dating excerpts and anthologies

I'm obtaining most of my quotation material from books, journal papers and the like that explicitly quote much older material. However, when I follow the interfaces for {{quote-book}} and {{quote-journal}}, the only date I can enter is the date of the book or journal. Just as we separated the language of the work as a whole and the language of the excerpts, can we separate these dates? There may also be the issue of material whose spelling has deliberately changed since it was first composed; at the very least, there may even be a change of script. --RichardW57 (talk) 11:06, 22 June 2020 (UTC)[reply]

@RichardW57: you can use |date= or |year= for the original date, and |year_published= for the date of the book or journal that you are referring to. Wherever possible, try to locate the original work and quote from that instead, as you may find the passage has been altered in the secondary work. — SGconlaw (talk) 11:33, 22 June 2020 (UTC)[reply]
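To make the split between the two dates concrete, here is a small sketch that assembles a {{quote-book}} call with |year= for the original text and |year_published= for the book being cited. The helper name `build_quote_book` and the year values are placeholders for illustration only, and any further parameters passed in are the caller's assumption rather than something confirmed in this thread:

```python
def build_quote_book(year: str, year_published: str, **extra: str) -> str:
    """Return wikitext for a {{quote-book}} call (hypothetical helper).

    year           -- date of the original text being quoted
    year_published -- date of the book or journal actually consulted
    extra          -- any other named parameters, passed through as given
    """
    params = {"year": year, "year_published": year_published, **extra}
    body = "".join(f"|{name}={value}" for name, value in params.items())
    return "{{quote-book" + body + "}}"

# Placeholder dates, not taken from any real citation.
print(build_quote_book("1611", "1998"))
```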

Even if the original is not durably archived? --RichardW57 (talk) 16:34, 22 June 2020 (UTC)[reply]

Can you give an example? (Which part of my comment is your question directed at?) — SGconlaw (talk) 17:37, 22 June 2020 (UTC)[reply]

Images of manuscripts. Printing a book can make part of the manuscript durably archived. (I am not sure of the status of vanity publications.) The manuscript itself is not durably archived, so is not valid for quotations unless an LDL exemption allows it. I am not so sure about the durability of some self-styled *archives* of images of manuscripts. In some cases, such a secondary publication may be the nearest a manuscript has to a durable archive. --RichardW57 (talk) 18:18, 22 June 2020 (UTC)[reply]

My comment was prompted by the prospect of having to visit Khmer inscriptions to validate their descriptions' published contents. Not all inscriptions photograph easily, and retouching is not unknown in biology. --RichardW57 (talk) 18:18, 22 June 2020 (UTC)[reply]

Right. If there is no reliable database of inscriptions, then quote a secondary work using the method I mentioned earlier. My comment about trying to quote from the original work if possible was in reference to early published works (that is, if you can locate the original edition at Google Books or the Internet Archive, for example, cite to that in preference to a later reprint). — SGconlaw (talk) 22:44, 22 June 2020 (UTC)[reply]

Why wouldn’t a manuscript be durably archived? After all it’s located in an archive – it’s its job to preserve it and hold it accessible. The hypothetical possibility of archival material getting lost/misplaced does not neutralize the character of durability a document has. Also current and upcoming digital libraries of digitized manuscripts are an extra layer of security. That’s durability understood as an institutional guarantee. Fay Freak (talk) 18:47, 23 June 2020 (UTC)[reply]

Don't we (almost?) always get access to the content of an underlying manuscript through a printed work? Do we ever go to the original manuscript to authenticate the transcription? Don't such printed works constitute the most well-distributed means of accessing the lexical content of the underlying manuscript? We may on occasion use Commons images, which I think we consider to be durably archived. DCDuring (talk) 19:28, 23 June 2020 (UTC)[reply]

I've been citing pictures of manuscripts (and occasionally of printed work) in publications, but transcription raises another level of issues. I thought it would be a fairly soft job to use the quotes from Jenner's Old Khmer dictionary to improve the quality of our entries, but different authors use different schemes, and it's not obvious to me that the transcriptions from Old Khmer to Roman script are reversible. For later Khmer, the schemes I've seen definitely become irreversible. Unless I'm being thick, getting legible pictures of the inscriptions is expensive or difficult. --RichardW57 (talk) 20:41, 23 June 2020 (UTC)[reply]

@Fay Freak: Not all manuscripts are in archives (but does that protect them from fire and flood?), and electronic archives depend on how copies are kept - backups aren't always maintained. And I'm not sure the risk of no longer supported formats has yet gone away. The loss of material in Yahoo groups prior to their effective closure comes to mind. Were Yahoo worse than all these archives? --RichardW57 (talk) 20:41, 23 June 2020 (UTC)[reply]

Backups aren’t maintained? If one has such an expensive project like digitizing old material one makes damn sure to keep some positions for backup media, after all the effort. Cloud providers can do it too after all and the public administration apparatus is more voracious in money use after all. And recent archive buildings are bomb-proof. The meme one of Cologne is an exception and even in that one one recovered most after it got buried in water. Nobody defined “durable” as “having been in circulation”. A deed stored in the provincial library in the sticks is quotable though verifying it may require visiting that same archive; this is confirmed by the usual practice in historical sciences (as there is no reason why Wiktionary editors may not use as many sources if they are making a secondary source). Apparently one has not cared about accessibility when laying down the criteria of inclusion, otherwise one would write “accessibility” and not “durability”, though one might argue that more or less nobody understood the CFI here. They added ill-defined formal criteria to somehow encompass by them the actually desired material limits for content they’d like to omit, that is for the bad boys, and not against the things you would like to do, @RichardW57, else one only reads things into the rules about which there are no regulations. They were about artificial content, not someone trying to get to what is actually there. If it’s on me, one could save all these argy-bargies and replace them with a flexible “a word is includable if it can be found in regular use” – it is bad to deny things in front of the eyes. Fay Freak (talk) 01:48, 24 June 2020 (UTC)[reply]

You're assuming that archives are well-funded and well-run. Backup systems can fail, just as 'uninterruptible power supplies' are known to fail. One cause of failure is rapid increase in the size of the content. Backup equipment can become obsolete. --RichardW57 (talk) 09:54, 24 June 2020 (UTC)[reply]

You are probably correct about the formal CFI being deficient. And the timescales for an RfV challenge are too short for some material. --RichardW57 (talk) 09:54, 24 June 2020 (UTC)[reply]

Using "#*" or "#:" with requests for quotation

Should "#*" or "#:" be used with requests for quotation (using Template:rfquotek)? I looked at the documentation but did not find anything about this. Searching for uses of the template with ~~"#*"~~ "#:" shows ~~8,882~~ 9,837 results and "#*" shows ~~225~~ 433 results (numbers changed after allowing no spaces after "#*" and "#:" and multiple "#" in the search). J3133 (talk) 05:20, 23 June 2020 (UTC)[reply]
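A count of this kind can be approximated offline with a regular expression that allows several "#" characters and optional spaces before the template, as described above. This is a sketch of the idea, not the search query that was actually run; the sample wikitext and the function name `count_prefixes` are made up:

```python
import re
from collections import Counter

# One or more "#", then "*" or ":", optional spaces, then {{rfquotek
RFQUOTEK_LINE = re.compile(r"^#+([*:])\s*\{\{rfquotek\b", re.MULTILINE)

def count_prefixes(wikitext: str) -> Counter:
    """Count rfquotek requests by the list marker they sit under."""
    return Counter(m.group(1) for m in RFQUOTEK_LINE.finditer(wikitext))

# Invented sample entry text for illustration.
sample = "\n".join([
    "# A definition.",
    "#: {{rfquotek|en|Shakespeare}}",
    "## A subsense.",
    "##*{{rfquotek|en|Milton}}",
    "#* {{quote-book|en|year=1900}}",
])

counts = count_prefixes(sample)
print("under #: =", counts[":"], "| under #* =", counts["*"])
```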

@J3133: I don’t think it really matters one way or another, but if one prefers to be consistent, I think it would make sense to use “#*” since that is the markup for quotations. — SGconlaw (talk) 05:35, 23 June 2020 (UTC)[reply]

@Sgconlaw See changed text; I wrote it wrong. J3133 (talk) 05:45, 23 June 2020 (UTC)[reply]

Using "#*" isn't a good idea because then it will automatically be collapsed just like quotes are, so users will expect to see a quote when they click "Expand" only to be disappointed by seeing merely a request for quotation. I'd use "#:" for that reason. —Mahāgaja · talk 05:53, 23 June 2020 (UTC)[reply]

Good point. I guess I haven’t noticed because I have quotations expanded by default. — SGconlaw (talk) 08:52, 23 June 2020 (UTC)[reply]

It should be #* because that creates a nested unordered list which is what you want. Conversely, #: creates a definition list and quotations are not definitions. —Justin (koavf)❤T☮C☺M☯ 07:37, 23 June 2020 (UTC)[reply]

@Koavf: Usage examples aren't definitions either, and they use "#:". —Mahāgaja · talk 09:30, 23 June 2020 (UTC)[reply]

@Mahagaja: Those are also incorrect: dls shouldn't be used to mark up dialogue, nor should they be used to change the indentation of text on a page. If we have a name–value pair ("thing:definition", "role:person", "phonenumber:867309", "uri:https://example.example/", etc.), then a definition list is appropriate. If we are giving a bunch of explanatory notes or instances of something ("There are a lot of desserts: cake, pie, sorbet...", etc.), then it's not. And to be clear, those name:values can have more than one value, as per the instructions. —Justin (koavf)❤T☮C☺M☯ 09:46, 23 June 2020 (UTC)[reply]
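The semantic difference under discussion comes from how each wikitext list marker maps to HTML: "#" opens an ordered list, "*" an unordered list, and ":" a description list item. A simplified sketch of that mapping for a single line prefix follows; it ignores how the real parser merges consecutive lines into one container, and the function name is invented:

```python
# Each marker maps to (container element, item element).
WIKI_LIST_TAGS = {
    "#": ("ol", "li"),
    "*": ("ul", "li"),
    ":": ("dl", "dd"),
    ";": ("dl", "dt"),
}

def html_nesting(prefix: str) -> list:
    """Return the nested HTML elements implied by a list prefix.

    Simplified illustration of one line in isolation.
    """
    tags = []
    for marker in prefix:
        container, item = WIKI_LIST_TAGS[marker]
        tags += [container, item]
    return tags

print("#*", html_nesting("#*"))
print("#:", html_nesting("#:"))
```

So "#*" nests an unordered list inside the numbered definition, while "#:" nests a description list there, which is the markup being objected to.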

@Koavf: I'm not savvy enough to fully understand your answer, but in fact we do use "#:" on usage examples, whether we should or not. If doing so is poor markup, is there an easily available alternative we could realistically consider using instead? —Mahāgaja · talk 10:13, 23 June 2020 (UTC)[reply]

@Mahagaja: Hey, sorry if I was too terse--I'm happy to discuss at length and I am very passionate about proper semantics and accessibility but not everyone is so I try to not do deep dives on general forums. I guess the question is what are you trying to accomplish? Do you just want to have a usage example floating underneath the definition? If so, then "#*" and "#**" and "#***" etc. will do that just fine. I'd recommend taking a quick look at mw:Help:Wikitext_examples#Organizing_your_writing for some inspiration. Again, happy to discuss at length (tho I need to sleep now). —Justin (koavf)❤T☮C☺M☯ 10:32, 23 June 2020 (UTC)[reply]

@Koavf: It isn't just me. Our official policy on Example sentences says they should "be indented using the “#:” command placed at the start of the line", so if that's poor semantics and accessibility we have to hold a vote to change the policy, but of course we would have to have a concrete suggestion as to what we're going to change it to. At the moment, "#*" is for quotations, not example sentences, and it triggers an automatic collapse (for anyone who doesn't have that turned off, I guess; I don't know how to turn it off). —Mahāgaja · talk 10:45, 23 June 2020 (UTC)[reply]

@Mahagaja: Oh sure. To be clear, I did try to not boil the ocean by bringing up the much larger issue of incorrect semantics across the entire dictionary project. Right now, we're talking about a relatively manageable problem of 10,000 instances of a template rather than a huge structural issue. —Justin (koavf)❤T☮C☺M☯ 10:49, 23 June 2020 (UTC)[reply]

#: was probably chosen for usage examples to distinguish them from quotes (#*), to implement the quote hiding logic. Making this more semantically correct is difficult: you would need to create one nested list for examples, and one for quotes. Or just one list, with extra semantic meaning attached via templates. JavaScript can then be used to show/hide items based on class names, not element names. – Jberkel 12:32, 23 June 2020 (UTC)[reply]

Usage examples do not suffer the waste of space consequent to our current standard of extremely verbose metainformation in book citations. Usage examples are supposed to help users see how the word is used in context. Arguably, they are of more immediate importance to users than attestation. Thus, it is arguable that the usage examples should be more visible to users than attestation. That argues for #: for usage examples. Mahagaja's point about users being disappointed to find no attestation when #* is used is a good one, but I'd guess that most users don't look at the attestation. Those who do might be just the ones to respond to a request for citations. DCDuring (talk) 14:38, 23 June 2020 (UTC)[reply]

My point wasn't purely hypothetical. It has happened to me personally that I've clicked "Expand" to see the quotations only to see nothing but a template telling me there aren't any, and I've gotten very annoyed at that. It makes me feel tricked. —Mahāgaja · talk 15:27, 23 June 2020 (UTC)[reply]

I didn't think it was hypothetical, but I think we might have to tolerate that annoyance on the hypothesis that we actually have a large number of users other than us who don't care so much about attestation and, so, rarely click on the quotes show-hide control. Of course, it would be nice to have some support for that hypothesis, but it apparently is not part of wiki-culture to get data on "who" our users are other than ourselves.

OTOH, I have nothing against any conspicuous display of requests. They serve to demonstrate the fact that we remain a work in progress. Such displays may encourage new contributors and also serve as a caution to passive users. DCDuring (talk) 15:43, 23 June 2020 (UTC)[reply]

Exactly, as long as we have people who continue to add non-templated usage examples and quotations this is an unfixable problem. DTLHS (talk) 17:00, 23 June 2020 (UTC)[reply]

I am quite dismayed to learn that #: does not behave as expected or advertised. Given documentation I've read both here and on Wikipedia in years past, the colon at the start of a line in wikicode is an indentation marker, used to indent either before a bullet or number, as in :# or :*, or after a bullet or number, as in #: or *:. See also w:Help:Talk_pages#Indentation, where the official documentation appears to recommend this:

E.g., if you are replying to something in a complicated discussion that starts with #:::*, just copy-paste that and add a :, resulting in #:::*: in front of your reply (or use #:::** if you feel it is necessary for your reply to begin with a bullet point).

Official wikitext documentation is recommending that we use the colon for indentation. The visible on-screen rendering reinforces this idea. And there is no mention of this producing definition-list markup.

I suspect the bigger issue is not that we at Wiktionary should not be using #:, but rather that the MediaWiki engine itself should not be producing <dl><dd>...</dd></dl> for #:. This output is particularly confusing considering that there is a separate wikitext recommendation specifically for description or definition lists, described at both w:Help:Wikitext#Description_lists and w:Wikipedia:Manual_of_Style/Lists#Description_(definition,_association)_lists:

; Term
: Definition1
: Definition2
: Definition3
: Definition4

@Justin, if this misuse of <dl><dd>...</dd></dl> presents semantic processing and usability issues, I think perhaps this may be worth filing as a bug report. Curious if others think likewise. ‑‑ Eiríkr Útlendi │^{Tala við mig} 23:01, 23 June 2020 (UTC)[reply]
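To make the point about the generated markup concrete, here is a minimal sketch of how each list-prefix character maps to HTML elements. The mapping itself reflects MediaWiki's output as described in this thread; the function name `html_nesting` is hypothetical, and the sketch deliberately ignores how the real parser merges adjacent lines into one list.

```python
# Wikitext list-prefix characters and the HTML (container, item) elements
# that MediaWiki emits for them.
PREFIX_TO_HTML = {
    "#": ("ol", "li"),
    "*": ("ul", "li"),
    ":": ("dl", "dd"),
    ";": ("dl", "dt"),
}

def html_nesting(line):
    """Return the (container, item) pairs implied by a line's list prefix."""
    nesting = []
    for char in line:
        if char not in PREFIX_TO_HTML:
            break  # the prefix ends at the first non-marker character
        nesting.append(PREFIX_TO_HTML[char])
    return nesting

print(html_nesting("#: a usage example"))  # ordered list, then a description list
print(html_nesting("#* a quotation"))      # ordered list, then an unordered list
```

So "#:" nests a `<dl><dd>` inside the numbered definition item, while "#*" nests a `<ul><li>`, which is the distinction the accessibility argument turns on.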

@Eirikr: The good news is that since the content here is so machine-readable, it's actually not that high of overhead to implement. I've had this as a back-burner idea for awhile but only brought it up as it's pretty germane to this narrow request about the best practices for this template. I'm happy to organize a vote around this but I'll need to wrap my head around it to ensure that I'm not doing something stupid. I think this can all be fixed on-wiki and won't require Gerrit patches and bug reports externally. Thanks for giving your feedback. — This unsigned comment was added by Koavf (talk • contribs) at 23:19, 23 June 2020 (UTC).[reply]

FWIW, I noticed that Wikipedia similarly produces <dl><dd>...</dd></dl> for #:. So any fix here at Wiktionary will presumably still leave semantic and usability issues for the other Wiki sites. Due to how we use our wikitext, that might not matter so much for the others. ‑‑ Eiríkr Útlendi │^{Tala við mig} 23:26, 23 June 2020 (UTC)[reply]

Regarding the use of : for indentation, mw:Help:Formatting remarks in a few places that "This workaround may harm accessibility" and "The usage of #: and *: for breaking a line within an item may also harm accessibility". Presumably because screen readers interpret the definition lists semantically, and not as indents. – Jberkel 06:45, 27 June 2020 (UTC)[reply]

Proto-Turkic and Common Turkic

I wanted to broach an issue with Proto-Turkic. Right now Proto-Turkic (including Bulgar) and Common Turkic (excluding Bulgar) are one and the same -- same headers, same language code. This is annoying for a few reasons.

Their reconstructions can differ, so we can find ourselves reconstructing Common Turkic and just writing what the Proto-Turkic form would be in the etymology or under alternative forms.
Late borrowings into Common Turkic are less true to the facts when you say they were borrowed into Proto-Turkic.
Having a single Bulgar entry next to a Common Turkic entry with everything thrown under that makes columns impossible and just generally very ugly looking. The alternative is to ignore Common Turkic altogether.

So like we did with Proto-Germanic and Proto-West Germanic, I petition that we split Proto-Turkic and Common Turkic. Personally, I hate the name Common Turkic. It's out of step with academia, which dumped the descriptive "common" for the prefix "proto-" decades ago. If it was up to me, I would have Proto-Bulgaro-Turkic and Proto-Turkic, respectively, but I'm curious what others' thoughts are. @Crom daba, Allahverdi Verdizade --{{victar|talk}} 16:05, 23 June 2020 (UTC)[reply]

a) I support splitting Common Turkic and Proto-Turkic. b) I decisively oppose the idea of replacing the widely/exclusively accepted nomenclature featuring Proto-Turkic and Common Turkic respectively by anything else, and by doing so creating confusion and decreasing the usability of Wiktionary.

c) I further suggest treating all Turkic terms which lack evidence for being reconstructed on the Proto-Turkic level as Common Turkic. In practice, it means that items which cannot be found in Chuvash (or as an external loan from a Bulgar language into, for instance, Hungarian) and which cannot in some other way be traced to Proto-Turkic (e.g. lambdaism or rhotacism in Yakut) be converted to Common Turkic. Allahverdi Verdizade (talk) 16:42, 23 June 2020 (UTC)[reply]

That ugly “Having a single Bulgar entry next to a Common Turkic entry with everything thrown under that makes columns impossible” I have now seen for the first time; this order is unnecessary and has generally been successfully avoided hitherto. I am not against splitting Common Turkic and Proto-Turkic completely. In the end it is only having in the titles and category names what one actually means. I had not seen the issue with the nomenclature until you mentioned it; yeah “Common Turkic” could imply “also Bulgar and Chuvash” like “common to all Turkic”. I don’t know about alternative names. “Proto-Bulgaro-Turkic” can’t be it as Bulghar is also Turkic. Fay Freak (talk) 18:39, 23 June 2020 (UTC)[reply]

Splitting Proto-Turkic and Common Turkic would be fine. "Bulgharo-Turkic" is not completely unprecedented but it would be old-fashioned. I support using Chuvash, Hungarian, Mongolian when it exhibits archaisms and internal evidence (rhotacism) for separating Proto-Turkic and Common Turkic. Crom daba (talk) 23:16, 23 June 2020 (UTC)[reply]

Yeah, but “Bulgaro-Turkic” is another name of the Oghur Turkic branch, or perhaps only Bulghar, judging from the few examples I can find digitally. About the names of what we call Proto-Turkic and Common Turkic one is disingenuous so far. Fay Freak (talk) 01:07, 24 June 2020 (UTC)[reply]

I've seen Bulgaro-Turkic referred to as both, but if Proto-Turkic and Proto-Common Turkic is the consensus, so be it. --{{victar|talk}} 15:41, 24 June 2020 (UTC)[reply]

I oppose this for the same reason that Proto-Baltic was merged into Proto-Balto-Slavic, and Proto-Finno-Ugric was merged into Proto-Uralic. —Rua (mew) 18:15, 24 June 2020 (UTC)[reply]

So you're saying that Common Turkic is itself an invalid taxon? mellohi! (僕の乖離) 00:20, 25 June 2020 (UTC)[reply]

No. —Rua (mew) 12:33, 27 June 2020 (UTC)[reply]

Mobile main page special casing will be disabled

My attention has been drawn to phab:T254287, but I am not tech-savvy enough to know what to do about it. Some of our administrators who understand this sort of thing, please take a look! —Mahāgaja · talk 16:47, 23 June 2020 (UTC)[reply]

There's a short thread in the Grease Pit. Vox Sciurorum (talk) 22:42, 23 June 2020 (UTC)[reply]

purity

If you're missing the 1990s Internet may I recommend User:Equinox/Purity test. I scored 40. Equinox ◑ 02:00, 24 June 2020 (UTC)[reply]

Almost a perfect score. 47. I'm just missing out on a Meronym. --Nueva normalidad (talk) 15:43, 25 June 2020 (UTC)[reply]

And the score becomes 48. I am pure as driven snow, bitches.

Entries with obsolete typography

Apparently there was never clear consensus on whether entries with obsolete typography ("u" instead of "v" and vice versa and "i" instead of "j") should be added, resulting in entries being sent to RFD separately at different times (links to discussion pages:

vp (English, RFD, 2011, kept for no consensus or weak consensus to keep),
vpon (English, RFV, 2012, passed),
dies Iouis (Latin, RFD, 2014, kept),
euery (English, RFD, 2014, kept),
uacuus (Latin, RFD, 2016, no consensus for deletion),
auec (Middle French, RFD/RFV, 2016, no clear consensus to delete),
giuen (English, RFD, 2016, kept),
deiuos (Old Latin, RFV, 2020, deleted)).

Also related is

cõtempt (English, RFD, 2015, no consensus to delete).

I did not find anything about this in Wiktionary:English entry guidelines#Orthography. Multiple users have written different opinions about whether the entries should be added. Instead of sending individual entries to RFD there should be consensus on whether they should be added. J3133 (talk) 18:04, 24 June 2020 (UTC)[reply]
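For readers unfamiliar with the pattern behind the English entries listed above, the following is a minimal sketch of the positional u/v convention of early printed English ("v" at the start of a word, "u" elsewhere, regardless of sound). The function name `early_modern_spelling` is hypothetical, and the sketch covers only lowercase u/v: it does not handle i/j, the Latin cases such as uacuus, or abbreviation marks as in cõtempt.

```python
def early_modern_spelling(word):
    """Apply the positional u/v convention: "v" word-initially, "u" elsewhere."""
    result = []
    for index, char in enumerate(word):
        if char in "uv":
            # Position, not pronunciation, decides the letter shape.
            result.append("v" if index == 0 else "u")
        else:
            result.append(char)
    return "".join(result)

for modern in ("up", "upon", "every", "given"):
    print(modern, "->", early_modern_spelling(modern))
```

Because the obsolete forms are derivable by rule in this way, the question of whether each deserves an entry is a policy matter rather than something individual RFDs can settle consistently.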

Using the multiple images template

Should Template:multiple images be preferred to no template for images in entries? Compare nursery, pad, queen (images used with the multiple images template) and black, corona, white (no template). J3133 (talk) 20:14, 24 June 2020 (UTC)[reply]

Also related is how the used senses in images should be written ("[(number)]" or "(sense (number))") and whether the entry term should be in bold. J3133 (talk) 20:25, 24 June 2020 (UTC)[reply]

The main difference between using {{multiple images}} and not using it is that in the latter case there is more white space between images, which means they will extend slightly further down the page. Using the template results in a more compact layout.

My preference is for using the format "(sense 1)" because it is clearer. A number in brackets may be mistaken for a footnote number or an external link. On occasion it will be necessary to qualify further, for example, "(etymology 1, sense 1)" or "(noun sense 1)", so just having the sense number in brackets with no words may be insufficient. I see no particular purpose in making the entry term bold in an image caption. — SGconlaw (talk) 20:36, 24 June 2020 (UTC)[reply]

If that template is finally fixed to use relative sizes instead of pixel dimensions you can use it as much as you want for all I care. So far no good template for images exists though Wiktionary likes to templatize about everything; there wouldn’t be much advantage from it though. Fay Freak (talk) 21:09, 24 June 2020 (UTC)[reply]

What constitutes usage for Latin script Old Khmer?

Old Khmer has been declared a dual script language, and almost all the citations of it I've seen in etymologies on Wiktionary use the Latin script.

The issue arises because I'm seeing different conventions in use in the publications of inscriptions. The actual Khmer (or whatever) text is not always shown; a PDF floating around the Internet may have unreadable illustrations, or they may be in separate, less accessible documents. So, do the transliterations count as uses of the words they contain, or are they mere mentions? Or are they even less than that? --RichardW57 (talk) 08:00, 25 June 2020 (UTC)[reply]

In a grammar book that uses only the Latin script, would cited examples count as usage? --RichardW57 (talk) 08:00, 25 June 2020 (UTC)[reply]

Do made up examples in grammar books count as usage if the author has a good command of the language (e.g. Jespersen writing about English in English)? I imagine there may be restrictions on using them as evidence for the word or construct being demonstrated - this restriction would be especially relevant for quoting Pali grammarians writing in Pali. --RichardW57 (talk) 08:00, 25 June 2020 (UTC)[reply]

Should I assume that someone perusing some Old Khmer text would have the savvy to recognise "qnak" as an alternative spelling of "ʼnak"? They correspond to the same Khmer script form. My feeling is that there should be a brief entry recording the former as an alternative spelling of the latter. I intend to have a gloss something like, "alternative spelling of ʼnak, Latin script form of អ្នក៑". (Note to self: This example comes from K.320N-2 in NIC II/III.) Quite what a quote for this alternative spelling should look like, I'm still unsure. And I do have candidate misspellings, such as "ʼe" for initial vowel written in Khmer script with an independent vowel, rather than a combination of glottal stop letter and dependent vowel. --RichardW57 (talk) 08:00, 25 June 2020 (UTC)[reply]

@RichardW57: The transliteration counts as a use unless you have some reason to suspect that the transliterator has misread the original text. Made-up examples do not count for well documented languages like English, but they can theoretically be used for languages with limited documentation (see WT:WDL for more on this). As for your alternative transliterations, those should be handled like Egyptian anx. —Μετάknowledge^{discuss/deeds} 22:53, 28 June 2020 (UTC)[reply]

If LOC has a copy, it's durably archived, isn't it?

As we consider Usenet groups to be "durably archived" by Google, shouldn't we also consider (or clarify) that anything LOC has a copy of is also durably archived as well as other archives? For example, het Nederlands Instituut voor Beeld en Geluid has archived copies of loads of Dutch TV and radio shows that otherwise may not be considered "durably archived" as they were only broadcast and perhaps available on streaming services. Some would have had a release on DVD (permanently recorded media), but a regular news broadcast or most game shows obviously wouldn't be.

You can request material from both of these. It's not free, it may even be cumbersome, but that shouldn't matter, right? At the Nederlands Instituut voor Beeld en Geluid, you can request material on location:

2020, Collectie Beeld en Geluid‎^[7]:

Je kunt een deel van onze collectie online bekijken en beluisteren op in.beeldengeluid.nl. Hier vind je de items waarvoor we toestemming hebben om ze online te publiceren. Ander materiaal kun je wel doorzoeken, maar alleen bekijken bij onze klantenservice in Hilversum of door het aan te vragen als download voor privégebruik.

You can watch and listen to part of our collection online at in.beeldengeluid.nl. Here you will find the items that we are allowed to publish online. You can search other material, but only watch it at our customer service in Hilversum or by requesting it as a download for personal use.

Some material can be watched online for free, most other material can either be watched for free on location in Hilversum or by requesting a copy, which will probably cost money. Still, durably archived. Alexis Jazz (talk) 11:08, 25 June 2020 (UTC)[reply]

Relying on expensive-to-inspect sources leaves us subject to fraud. Online sources that are also backed up by print (public libraries) or another durable archiving mechanism are better for that reason. When sound archives are searchable as text and sound, audio archives will be more useful. Too bad it's not going to happen soon for English materials. DCDuring (talk) 13:56, 25 June 2020 (UTC)[reply]

@DCDuring Realistically, for things like TV-shows, you can access them on streaming services (which may or may not be geo-restricted or require a subscription that many people have, like Netflix) or, if you don't mind, torrent them. LOC simply makes it conform with the "durably archived" requirement. For example, The Daily Show hasn't been released on DVD, but LOC has it. Nobody is actually going to request it from LOC, there are easier ways to watch The Daily Show, but those don't count as "durably archived". As for the fraud issue, that's an issue regardless of what we do, also on Wikipedia. Few are going to verify a source that is hidden behind a paywall or only available as a physical book. Alexis Jazz (talk) 15:04, 25 June 2020 (UTC)[reply]

How do you find the section of the citation provided that has the relevant content in the context in which it was spoken?

Are you concerned about the bias in providing such citations? When the OED was created, the editors invited submission of usage examples. They got plenty, but found that they lacked examples of relatively common uses of common words. In modern cognitive psychology it is referred to as availability bias. It is already apparent from the audio citations that we have that this bias is at work in the selection of audio citations.

Neither consideration is a reason not to use audio sources, but some more technology and some institutional change is needed to support online search of audio. Proliferation of (searchable) audio files might also allow us to provide evidence-based pronunciations. DCDuring (talk) 15:31, 25 June 2020 (UTC)[reply]

The opportunity for "attestation fraud" is reduced if the source is available online. At Wiktionary attestation is abundantly available online for most terms. Wikipedia has a more serious problem because many high-quality contemporary sources are copyrighted and are either unscanned or exist online only behind paywalls. "Fraud" is sometimes simply error or advocacy coupled with stubbornness. DCDuring (talk) 15:41, 25 June 2020 (UTC)[reply]

@DCDuring errr that's a lot of things. "How do you find the section of the citation provided that has the relevant content in the context in which it was spoken?"

Use the "time" parameter of {{quote-av}}?

"At Wiktionary attestation is abundantly available online for most terms."

But I like rare words and slang.

Media will continue to move to formats that aren't in permanently recorded formats. Websites, social media, streaming, downloads. If we count media that was archived by an institution as "durably archived", we can claw some of that back. Alexis Jazz (talk) 16:23, 25 June 2020 (UTC)[reply]

It’s not Google who durably archives. Google is a terrible company that particularly frequently abandons services. It’s the decentralization that makes it durable. Circulation: If a book has been distributed, it cannot be “depublished”. Therefore also streaming services publish durably, as even if they take copy protection measures and remove things from users’ libraries if content is played on someone’s computer then one can capture it. It does not happen with websites like that that they are “in circulation”. With music and movies there are collectors if they come onto the market. Not every Soundcloud rapper produces durable content, but if the streaming services have the music it is as easily and durably accessible as a typical book at any time in the future. Fay Freak (talk) 17:29, 25 June 2020 (UTC)[reply]

Compounds in German and related languages

German has a reputation for sticking words together without spaces in between where English would use hyphens or spaces. Are there any guidelines for when such a compound deserves its own definition, if the compound is little or no more than the sum of its parts? Vox Sciurorum (talk) 21:34, 26 June 2020 (UTC)[reply]

I think Swedish, Sanskrit and Pali all beat it at this game. They too need guidelines. For Sanskrit and Pali one can partly use the lemming test - if the big dictionaries list a compound, then it is not unreasonable to create an entry for it in Wiktionary. Oddly, I don't recall any sum-of-parts challenges for Sanskrit or Pali. --RichardW57 (talk) 23:46, 26 June 2020 (UTC)[reply]

We ought to be getting issues with incorporating languages. I don't know why we don't. --RichardW57 (talk) 23:46, 26 June 2020 (UTC)[reply]

(I'm just guessing here, but I think it's probably because fewer of us are familiar enough with these languages to identify compounds that might be SOP. ‑‑ Eiríkr Útlendi │^{Tala við mig} 00:00, 27 June 2020 (UTC))[reply]

I'd go with we have a negligible number of entries in those languages. One page claims "Only a few such languages have more than 100k speakers: Nahuatl (1.5m speakers), Navajo (170k speakers), and Cree (110k speakers)." We have 8,000 entries in Navajo and 4,000 in Classical Nahuatl. With almost 750,000 entries in English and 49 languages with more than 10,000, those just don't come up much.--Prosfilaes (talk) 00:40, 27 June 2020 (UTC)[reply]

The guideline for German is WT:ATTEST. German is a WDL, so a compound needs to be verifiable through either "(1) clearly widespread use, or (2) use in permanently recorded media, conveying meaning, in at least three independent instances spanning at least a year". Since German compounds are written together, they are considered single words and therefore not subject to deletion for being SOP. —Mahāgaja · talk 05:47, 27 June 2020 (UTC)[reply]
I figured that was the letter of the law, but is it good guidance for editors? I can create Körperlänge "body length" because it appears three times in apparently durably archived publications, but should I? That is one of many sum of parts compounds in the scientific paper I am looking at now. "Außenrand ballonförmig" "Vorderrand" "Vordertibia" and more just on the one page. I think I will create Aristaglied because we have a corresponding single word in English, aristomere. On the other hand, Apikalborsten translates to a two word sum of parts compound "apical bristles". Maybe I'll use that as my personal guidance: if the English version is not a literal translation of the two words with a space between, it's worth a page. Vox Sciurorum (talk) 12:16, 27 June 2020 (UTC)[reply]
@Vox Sciurorum: They're all worth a page by our current rules. After all, a German learner may not always be able to tell how to break up a compound, or whether a compound is to be interpreted literally or whether Wiktionary simply hasn't covered it yet. Now, they may not be worth your time to add, but that's a personal decision. —Μετάknowledge^{discuss/deeds} 22:48, 28 June 2020 (UTC)[reply]
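The numeric half of the attestation criterion quoted above ("at least three independent instances spanning at least a year") can be sketched as a simple check. The function name `meets_attestation_span`, its parameters, and the reading of "a year" as 365 days are assumptions for illustration; whether the citations are independent and in permanently recorded media still has to be judged by an editor.

```python
from datetime import date

def meets_attestation_span(citation_dates, minimum_count=3, minimum_days=365):
    """Check for enough citations whose dates span at least a year.

    Independence and durability of the sources are NOT checked here.
    """
    if len(citation_dates) < minimum_count:
        return False
    span = max(citation_dates) - min(citation_dates)
    return span.days >= minimum_days

# Hypothetical citation dates for a compound such as Körperlänge.
hypothetical_dates = [date(1901, 5, 1), date(1902, 1, 1), date(1950, 1, 1)]
print(meets_attestation_span(hypothetical_dates))
```

A compound that passes this check and the independence test is includable under the current rules, whatever one thinks of its sum-of-parts character.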

The same problem exists in a limited way in English, e.g. with solid compounds formed with "-like". This can be added to virtually any noun in a purely SOP manner, yet we have no means of treating it as SOP. Mihia (talk) 21:41, 29 June 2020 (UTC)[reply]

Chinese "idiom" POS

Chinese idioms, or chengyu, or xiehouyu, are not POS. POS describes the word's syntactic function in a sentence. "Idiom" or "chengyu" fails to show the syntax, thus it is not a POS. I know some editors are eager to show they have learned a lot of chengyu or something, but the POS header is really not a good place for you to write it. They are nouns, verbs, phrases, proverbs, etc. 恨国党非蠢即坏 (talk) 05:30, 27 June 2020 (UTC)[reply]

Reworking the Beer parlour

I do not know whether this was suggested before. Request pages (RFC, RFD, RFV) have sections ordered by month in one page and they are kept in the page until they are solved or closed. The Beer parlour, compared, leaves behind unsolved discussions, by moving them to an archive page when the next month starts. I instead suggest a similar system to that of the request pages, having unsolved discussions on a page ordered by month and moving them to a page of closed discussions when they are solved or closed. Not all RFC discussions are solved within one month and likewise, discussions in the Beer parlour should also not be expected to be solved within one month. J3133 (talk) 08:44, 27 June 2020 (UTC)[reply]

You're conflating fundamentally different venues. At RFV and RFD, a decision must always be reached: the entry lives or dies. The BP, on the other hand, is similar to its sister pages, the Tea room, Grease pit, Information desk, and Etymology scriptorium. Some posts are about problems that are then fixed, or ideas that are then executed, but many can't, won't, or shouldn't be closed. Some discussions end, but stay around in order to let people know that something is going on, and some are merely notifications that serve to announce something impending. Your proposal would require a great deal more work on the part of the editors just to make the BP less usable. —Μετάknowledge^{discuss/deeds} 22:46, 28 June 2020 (UTC)[reply]

Two images in the character info template for emoji

Emoji have two presentation sequences, to display them in text presentation or emoji presentation. They do not always look the same. Unicode has two characters for specifying to display them in either a text presentation or an emoji presentation. See Wikipedia:Emoji § Emoji versus text presentation and Emoji Presentation Sequences, v13.0 on the Unicode site. Compare the current image of ☃ ("SNOWMAN") on Wiktionary and the emoji presentation (the images are on the right side). I suggest having two images, one of the text presentation and one of the emoji presentation, in the character info template, both next to each other. J3133 (talk) 08:44, 27 June 2020 (UTC)[reply]
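As a small illustration of the two presentation sequences: Unicode's VARIATION SELECTOR-15 (U+FE0E) requests text presentation and VARIATION SELECTOR-16 (U+FE0F) requests emoji presentation. The function name `presentation_sequences` is hypothetical; how the result is actually drawn depends on the fonts on each system.

```python
TEXT_SELECTOR = "\uFE0E"   # VARIATION SELECTOR-15: request text presentation
EMOJI_SELECTOR = "\uFE0F"  # VARIATION SELECTOR-16: request emoji presentation

def presentation_sequences(character):
    """Return the (text, emoji) presentation sequences for one base character."""
    return character + TEXT_SELECTOR, character + EMOJI_SELECTOR

text_form, emoji_form = presentation_sequences("\u2603")  # U+2603 SNOWMAN
print([f"U+{ord(c):04X}" for c in text_form])
print([f"U+{ord(c):04X}" for c in emoji_form])
```

The two strings differ only in the invisible trailing selector, which is why two separate images are needed to show the difference reliably.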

This is an old request, but, @Kwamikagami, as you currently work on emoji pages, would you agree to this change? Perhaps {{emojibox}} could also be added to {{character info}}. J3133 (talk) 09:23, 26 February 2023 (UTC)[reply]

@J3133 I had previously requested that emojibox be integrated into character_info, though that was primarily because of alignment issues. (Emojibox creates a lot of blank space on the page.) I don't know if we'd want two images as the primary representation of the character, but I think having emojibox at the bottom of character_info would be worthwhile. kwami (talk) 10:58, 26 February 2023 (UTC)[reply]

@Kwamikagami: I have finally tried implementing this; see the sandbox version on the right side. J3133 (talk) 10:46, 13 August 2024 (UTC)[reply]

@J3133 That seems to work quite well! I've tested a few characters, and the only problem I've found is that some (e.g. ⛵) display two emoji variants. But that's something that should be addressed in the Unicode_data/images lists. That will take some time, but no rush.

Where are the second (emoji) images being pulled from? The ones I've seen all look good, but we'll want to change or update some of them. Presumably in something like a parallel Unicode_data/emoji list.

For some emoji characters we're missing either text or emoji images. Perhaps if we only have an emoji variant, we can move that to the emoji list, which would be the only one displayed, but there's the complication of roll-out to other-language wiktionaries if they only have the text-variant list. Ideally we should migrate the Unicode_data/images lists to Wikidata, so we don't need to maintain 30 local copies. That's another good suggestion that I haven't seen a comment on for a while. kwami (talk) 20:04, 13 August 2024 (UTC)[reply]

PS. Do we really need the note that "Character’s appearance may be different on each system"? That is true for all characters, even ASCII, and the reader will see their OS default variants just under the images. Personally, I'd say only "Text style is forced with ⟨︎⟩ and emoji style with ⟨️⟩." kwami (talk) 20:29, 13 August 2024 (UTC)[reply]

@Kwamikagami: I had copied the note from {{emojibox}}; now trimmed to only the text you have suggested. The emoji-style modules are Module:Unicode data/images/002/emoji and Module:Unicode data/images/003/emoji (i.e., the text-style modules suffixed with “/emoji”). The sandbox version is now the current one. J3133 (talk) 01:02, 14 August 2024 (UTC)[reply]

Great! I'll start removing the now-redundant emojiboxes. kwami (talk) 01:05, 14 August 2024 (UTC)[reply]

@J3133: For ㊗, we have an image in the emoji list but not in the text list, and nothing displays in the character box. Could you fix so that the emoji img will display even without the other? (I don't want to mess with your code.)

Possibly it could combine the cells for the images into one, so there isn't a blank cell, but still keep both text and emoji displays of the user's font underneath. I don't know if that would be worth the effort. kwami (talk) 01:32, 14 August 2024 (UTC)[reply]

@Kwamikagami:

Done. I have also added the modules (renamed to Module:Unicode data/emoji images/002 and Module:Unicode data/emoji images/003) to the table at Module:Unicode data per your request. J3133 (talk) 01:55, 14 August 2024 (UTC)[reply]
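A sketch of how the pair of image modules for a character might be located, using the renamed module titles mentioned above. It assumes the three-digit suffix is the code point divided by 0x1000, written in hexadecimal (so U+2603 falls in "002" and U+3297 in "003"); that rule and the function name `image_module_titles` are assumptions inferred from the names in this thread, not a description of the actual Lua code.

```python
def image_module_titles(character):
    """Return the assumed (text-style, emoji-style) image module titles."""
    # Assumption: modules are paged in blocks of 0x1000 code points.
    page = f"{ord(character) // 0x1000:03X}"
    return (
        f"Module:Unicode data/images/{page}",
        f"Module:Unicode data/emoji images/{page}",
    )

print(image_module_titles("\u2603"))  # U+2603 SNOWMAN
print(image_module_titles("\u3297"))  # U+3297, the character discussed above
```

A lookup that tries both titles and displays whichever image exists would cover the case where only the emoji image is available.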

Perfect!

I'll ping you if I come across any other bugs. kwami (talk) 01:58, 14 August 2024 (UTC)[reply]

@Kwamikagami: Unicode appendices now also show both images (example). J3133 (talk) 03:09, 14 August 2024 (UTC)[reply]

That's helpful. Thanks. kwami (talk) 03:25, 14 August 2024 (UTC)[reply]

Once we get up into the supplementary plane (1Fxxx), are those all emoji-only, so we don't need to worry about text variants? I understand that Unicode is no longer accepting dual-use characters, but don't know where the cut-off is. kwami (talk) 03:40, 14 August 2024 (UTC)[reply]

@J3133: dedicated emojis like 🍅 also display with distinct forms in the emojibox. They're unlikely to have parallel images at Commons, but it might still be worthwhile to move them to a 1Fxxx emoji list, so that the character info box will show both variants as text under the single image. kwami (talk) 04:43, 14 August 2024 (UTC)[reply]

@Kwamikagami: I have made the boxes for all characters from U+1F170 to U+1FAFF (from 🅰 (U+1F170 NEGATIVE SQUARED LATIN CAPITAL LETTER A) to currently 🫸 (U+1FAF8 RIGHTWARDS PUSHING HAND); the cut-off is 🬀 (U+1FB00 BLOCK SEXTANT-1), the first character of the Symbols for Legacy Computing block) to show both styles automatically. Should the Alchemical Symbols, Geometric Shapes Extended, and Supplemental Arrows-C blocks (U+1F700–U+1F8FF) be excluded? Also possibly the Ornamental Dingbats block (U+1F650–U+1F67F) and the first characters of the Supplemental Symbols and Pictographs block. J3133 (talk) 05:44, 14 August 2024 (UTC)[reply]
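The range rule described above can be sketched in Python as follows. This is only an illustration of the proposal as stated in this comment, not the actual code in Module:character info, and the names are hypothetical. The exclusion list covers only the blocks whose ranges are given explicitly; the "first characters of the Supplemental Symbols and Pictographs block" are left out because no range is stated. Note that the thread later settles on following List of emojis rather than a pure range check.

```python
# Range for which both styles are shown automatically, per the comment above.
BOTH_STYLES_START = 0x1F170  # NEGATIVE SQUARED LATIN CAPITAL LETTER A
BOTH_STYLES_END = 0x1FAFF    # inclusive; U+1FB00 starts Symbols for Legacy Computing

# Blocks the comment asks about excluding (proposal only, not a decision).
PROPOSED_EXCLUSIONS = [
    (0x1F650, 0x1F67F),  # Ornamental Dingbats
    (0x1F700, 0x1F8FF),  # Alchemical Symbols, Geometric Shapes Extended, Supplemental Arrows-C
]


def shows_both_styles(code_point: int, exclusions=PROPOSED_EXCLUSIONS) -> bool:
    """Return True if the code point is in the automatic range and not excluded."""
    if not (BOTH_STYLES_START <= code_point <= BOTH_STYLES_END):
        return False
    return not any(low <= code_point <= high for low, high in exclusions)


if __name__ == "__main__":
    for cp in (0x1F170, 0x1F345, 0x1F700, 0x1FB00):
        print(f"U+{cp:04X}", shows_both_styles(cp))
```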

@J3133: AFAICT, the characters at List of emojis should be included (except perhaps for punctuation marks, math operators, the copyright sign and the like, where IMO it would seem rather silly). But i.a. the alchemical symbols don't have emoji variants. In fact, some of the more recently accepted ones were specifically encoded as not being emoji, because Unicode doesn't want to encourage dual use. So for example at ⌛, the top two boxes should have variants, but not the bottom one that currently does. Similarly at ☠, AFAICT it's the white skull at top that has the emoji variant, not the black skull at bottom. For the basic-alphabet enclosed alphanumerics, it's only the ones for blood types that have emoji variants, not e.g. 🅲. (Like for planetary symbols, where it's only Mars and Venus because of their use for male and female. That block looks good.) kwami (talk) 07:14, 14 August 2024 (UTC)[reply]

P.S. I don't see your edits where you "made the boxes", so I don't know how to contribute. kwami (talk) 07:22, 14 August 2024 (UTC)[reply]

@Kwamikagami: All characters at List of emojis now have both styles displayed. J3133 (talk) 08:46, 14 August 2024 (UTC)[reply]

@J3133: Looks good.

In the BMP, only ⏩ is missing. Checked about 1/3 of the SMP; good so far. kwami (talk) 10:43, 14 August 2024 (UTC)[reply]

@Kwamikagami: Fixed. J3133 (talk) 10:48, 14 August 2024 (UTC)[reply]

@J3133: If something like that needs adjustment, and you're not around, where would I go to do that? kwami (talk) 10:52, 14 August 2024 (UTC)[reply]

@Kwamikagami: See Module:character info, line 149. J3133 (talk) 11:00, 14 August 2024 (UTC)[reply]

Thanks. kwami (talk) 11:05, 14 August 2024 (UTC)[reply]

ThisWordDoesNotExist.com

Anyone else seen this? [8] It generates semi-plausible words and definitions by mashing things together according to a machine-learning algorithm. One result I got was "nonconcluded", which actually is a word, and it almost produced a correct definition for it too. Equinox ◑ 21:42, 28 June 2020 (UTC)[reply]

@Equinox This is inspired by [9], which uses generative adversarial networks. Benwing2 (talk) 00:56, 29 June 2020 (UTC)[reply]

Just added a new word "uncoinability" but it failed to give a sensible definition. Meh. It defined it as "greater quantity, quantity, or quality of property considered equivalent owing to a greater degree of the originality greater than it contains".

It is fun though thinking of new nonexisting words xD. Dixtosa (talk) 12:00, 29 June 2020 (UTC)[reply]

Just added "brute-foreseeable" - (of a cryptographical problem) able to have an easy brute-force algorithm predicted. Oh I am liking it :D. Dixtosa (talk) 12:07, 29 June 2020 (UTC)[reply]

Category:Ukrainian affectionate terms vs. Category:Ukrainian endearing nouns

@Atitarev I just noticed that both these categories exist. They convey the same idea and should be unified. What name should we use, and should it include the part of speech? I originally added the support for "Foo endearing nouns" on the analogy of "Foo diminutive nouns"; in Slavic languages, endearing terms are common and heavily overlap with diminutives. Benwing2 (talk) 00:47, 29 June 2020 (UTC)[reply]

@Benwing2: I would merge with endearing terms, also for Russian. Not all diminutives are endearing terms but many endearing terms are also diminutives. --Anatoli T. (обсудить/вклад) 00:51, 29 June 2020 (UTC)[reply]

@Atitarev What should the name of the resulting merged category be? I'd like to fix this for all languages. Benwing2 (talk) 00:53, 29 June 2020 (UTC)[reply]

@Benwing2: Sorry, I meant to say use "endearing" categories. --Anatoli T. (обсудить/вклад) 00:57, 29 June 2020 (UTC)[reply]

"Endearing terms" sounds weird...like the term itself is endearing. I would call them "terms of endearment", and if further specification is necessary, "terms of endearment (nouns)". Andrew Sheedy (talk) 03:07, 10 July 2020 (UTC)[reply]

Vote on eye dialect label

This has now had more than enough time for comments. I wonder whether an Admin could now put it live. (Or can I just do this myself?) Mihia (talk) 11:39, 29 June 2020 (UTC)[reply]

@Mihia: The vote is live. (You could do it yourself, no need to be an admin for that.) P U C – 18:43, 1 July 2020 (UTC)[reply]

Archive all the URLs of a website to the Wayback Machine (https://web.archive.org/) so that it can always be used for references

The Oxford English Dictionary website offers the entries of the dictionary online, and they can be reached simply by incrementing the number at the end of the URL: https://www.oed.com/oed2/00000001, https://www.oed.com/oed2/00000002 ... the last URL being https://www.oed.com/oed2/00291601.

Surprisingly enough, https://web.archive.org has not yet archived them all, and I'd love to know how I can do so or request it. That archive could then be used for references.

I've found a possible solution for Linux, but how can it be implemented on Windows 7? https://webapps.stackexchange.com/questions/115369/how-to-archive-the-whole-website --Backinstadiums (talk) 15:10, 29 June 2020 (UTC)[reply]

Archiveteam probably knows how. IDK if they work with websites that will not disappear soon, though. —Suzukaze-c (talk) 21:37, 29 June 2020 (UTC)[reply]
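For what it's worth, the URL pattern described in the question is easy to enumerate on any platform with Python, Windows included. The sketch below only builds the list of entry URLs and the matching Wayback Machine "Save Page Now" request URLs; it makes no network requests. The `https://web.archive.org/save/` prefix is assumed to be the form Save Page Now accepts, and the function names are hypothetical. Anyone actually submitting some 290,000 requests would need to check the rate limits and terms of both sites first.

```python
# Values taken from the question above.
OED_BASE = "https://www.oed.com/oed2/"
LAST_ENTRY = 291601

# Assumption: Save Page Now accepts requests of the form <prefix><target URL>.
WAYBACK_SAVE_PREFIX = "https://web.archive.org/save/"


def entry_urls(first=1, last=LAST_ENTRY):
    """Yield entry URLs with the number zero-padded to eight digits."""
    for n in range(first, last + 1):
        yield f"{OED_BASE}{n:08d}"


def save_request_urls(first=1, last=LAST_ENTRY):
    """Yield the Save Page Now request URL for each entry URL."""
    for url in entry_urls(first, last):
        yield WAYBACK_SAVE_PREFIX + url


if __name__ == "__main__":
    # Print only the first three as a sanity check; nothing is fetched.
    for url in save_request_urls(1, 3):
        print(url)
```

The output could be saved to a text file and fed to whatever download tool is available, in small batches.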

Wiktionary:Criteria for inclusion/Editnotice

What is this page? Ultimateria (talk) 17:23, 29 June 2020 (UTC)[reply]

Originally meant to be a template replacing {{policy}}, transcluded from WT:CFI. I don't see how it would have been an improvement, and as far as I can tell it has never been deployed. I think it can safely be deleted. --Lambiam 16:06, 1 July 2020 (UTC)[reply]

Is there a procedure for getting this useless page deleted without having a vote? --Lambiam 17:04, 17 July 2020 (UTC)[reply]

Nominate it at Wiktionary:Requests for deletion/Others. Chuck Entz (talk) 17:51, 17 July 2020 (UTC)[reply]

Thanks, so done: see Wiktionary:Requests for deletion/Others#Wiktionary:Criteria for inclusion/Editnotice. --Lambiam 21:04, 17 July 2020 (UTC)[reply]