Wiktionary:Beer parlour/2016/May

Snowclones

Is there any particular reason why the snowclones are in appendices? I would have thought they fit the main namespace nicely, except the "X" and "Y" in the page titles are a little odd.

For reference, here are all the current snowclone pages:

Related terms

Also I created 1 page for a Portuguese snowclone today (by moving it from homem com H maiúsculo, which I had created years ago):

Appendix:Snowclones/X com Y maiúsculo

Note: I edited all the snowclone pages to make them use the normal entry layout. For example, many had a "Origin" section which I renamed to "Etymology". Example diff: link. --Daniel Carrero (talk) 04:40, 2 May 2016 (UTC)[reply]

It's precisely because of the X's (and Y's and Z's) that they are in the appendix. --Wiki Tiki 89 17:00, 2 May 2016 (UTC)[reply]

Who would look up I am (something), hear me (do something)? I would like to see how this could be made into something that was demonstrably useful. If not, the Appendix is fine. DCDuring TALK 17:27, 2 May 2016 (UTC)[reply]

If @Wikitiki89's comment is correct, snowclones are in appendices because they differ from the usual naming format. Aren't they normal entries otherwise, with a semantic, idiomatic meaning that is not understood just from their component words? Then, they should probably be named using the format Snowclones/I am X, hear me Y, like we have Unsupported titles/Less than three. That is, we should keep them in the main namespace as normal entries, I believe. --Daniel Carrero (talk) 18:52, 19 October 2016 (UTC)[reply]

The main namespace technically does not support subpages, and even if it did, such a naming would imply that "I am X, hear me Y" is a subtopic of the word "Snowclones", which is not the case. What we do it for unsupported titles is probably not the best practice. There are a lot of quasi-lexical things that don't have usable lemma forms, such as the null plural indefinite determiner in English and other Germanic languages, or word patterns in Semitic languages (we have appendix entries for these in Hebrew, such as מִקְטָל and קֹטֶל, and wouldn't it be nice to have pages for Arabic plural patterns such as أَفْعَال (ʔafʕāl) and فُعُول (fuʕūl)?). This is what the appendix is for really, so we don't have to cram these things awkwardly into the main namespace. --Wiki Tiki 89 19:02, 19 October 2016 (UTC)[reply]

I understand that there are a lot of quasi-lexical things that don't have usable lemma forms. I think it's like how I created Appendix:Capital letter for a certain concept that does not fit in the main namespace. I disagree with this: "What we do it for unsupported titles is probably not the best practice." I mean, it's the best of the two possibilities that have been discussed, as far as I'm aware: between 1) having Appendix:Unsupported titles/Less than three and 2) having Unsupported titles/Less than three, the latter is the best we have. I'm open to other suggestions, I suppose.

Wouldn't it be preferable to move snowclones into the main namespace under names that don't use subpages at all, like I am X, hear me Y? It's not like the X and Y are too cryptic to understand. I might be mistaken, but as far as I remember, all of the current snowclone entries in all languages have usage examples and/or quotations, so their uses should be clear. Nobody is going to actually say in running text: "I am X, hear me Y!!!" --Daniel Carrero (talk) 19:12, 19 October 2016 (UTC)[reply]

But who would think to search for "I am X, hear me Y"? Anyway, the real question is what problem are you trying to solve by moving these from the appendix to the main namespace? What is the disadvantage of them being in an appendix? --Wiki Tiki 89 19:32, 19 October 2016 (UTC)[reply]

I don't care all that much, because our coverage of snowclones is poor. Maybe if we had a few hundred snowclones in multiple languages, I'd feel more inclined to work on the proposal of moving them all to the mainspace. But, to answer your question, it bothers me that by placing them in appendices, snowclones are treated as second-class citizens when they are by definition common, well-known phrases. If search does not work so well, they can link to each other. And their existence in the mainspace would hopefully encourage other editors to increase our coverage of snowclones. --Daniel Carrero (talk) 19:53, 19 October 2016 (UTC)[reply]

Actually, I think I'm going to create a vote for this, eventually. A RFM won't do, because this is a suggested change in the naming system of snowclones. --Daniel Carrero (talk) 00:30, 21 October 2016 (UTC)[reply]

Definitions

Previous discussion: User talk:Romanophile#Latin

Can we add definitions to forms of various words? This would allow definitions of words to be more easily accessible, especially when the internet is slower. Bman1230 (talk)

We already give definitions to forms of words. For example, chairs is defined as the plural of chair. —CodeCa t 18:54, 2 May 2016 (UTC)[reply]

I mean include some info on derived terms page. Generally if someone knows what a chair is, they don't need to look up the definition for chairs. Maybe something like this. (Imagine better formatting) Bman1230 (talk)

That would duplicate all the information on every form-of page, and it would be a nightmare to maintain. —CodeCa t 21:39, 2 May 2016 (UTC)[reply]

I was thinking maybe there would be some way to load info from the other page, it would probably have to be changed by wiktionary. Bman1230 (talk)

It would be "easy" enough to substitute plural forms for the singular forms in the definitions, but one would also have to make sure that there was number agreement with any verbs, pronouns, or other nouns that required it. Either AI or extensive tagging would be required, I think. DCDuring TALK 23:35, 2 May 2016 (UTC)[reply]

To me it seems like in the majority of cases "An item of furniture used to sit on or in comprising a seat, legs, back, and sometimes arm rests, for use by one person." is a more useful definition than "Plural of chair" despite the disagreement of number. Bman1230 (talk)

Would a mouseover system like this work? Bman1230 (talk)

Noun

chairs

plural of chair

Not as text. Someone would have to enter the text, and any edit to the entry for chair would make the two entries different. I suppose you could transclude the entry at chair into the title tag, but there are limits to how much you can put in a title tag (I only see up to the first half of item 6 in your example). There are also all kinds of complications such as multiple etymologies (wound and winded are past tense forms of different words spelled as wind, and wounded is the past tense of wound, but not of the past tense of wind) that would mean you would have to set things up carefully in the main entry so that only the relevant part would be transcluded, which would be subject to getting fouled up whenever someone rearranges the main entry.

The Achilles' heel of anything you could come up with is the inability to predict or control what is done to either or both entries after you've set everything up- people add, delete, rearrange and otherwise mess with just about every aspect of entries all the time. Multiply that times thousands and thousands of main entries, and it quickly becomes impossible to maintain. Chuck Entz (talk) 02:15, 3 May 2016 (UTC)[reply]

Appendix:Repetition

FYI, I created Appendix:Repetition. It feels to me that this is a concept with verifiable semantic value, like Appendix:Capital letter. That's why I formatted it like an entry, too. --Daniel Carrero (talk) 19:47, 3 May 2016 (UTC)[reply]

In linguistics, reduplication is considered a kind of affix, but its meanings are language-specific, not translingual. In Indonesian, for example, reduplication is used to form plurals. In some languages, it's used to indicate a diminutive; in others, a wide variety or a large number. In older Indo-European languages (and Proto-Indo-European itself) reduplication of the initial consonant or consonant cluster of a verb root is used to form the present stem of some verbs as well as the perfect stem of most verbs that have a perfect stem. I like the idea of having an entry for the reduplication morpheme, but I doubt it should have a Translingual section. —Aɴɢʀ (talk) 21:11, 3 May 2016 (UTC)[reply]

Also, don't confuse reduplication with lengthening. Daaaaaniel is an example of lengthening of the vowel, which in writing is indicated by repeating a letter. --Wiki Tiki 89 01:59, 4 May 2016 (UTC)[reply]

Disclaimer: This is just a first draft, I wouldn't mind having a separate language section for plural-forming use in Indonesian that @Angr mentioned, or explaining any better the difference between lengthening of the vowel and actual reduplication that @Wikitiki89 mentioned. --Daniel Carrero (talk) 03:37, 4 May 2016 (UTC)[reply]

Tagging entries missing a headword template

There are still some entries without a headword template that linger around here and there. I would like to tag these for cleanup, using the template {{rfc-head}}. For the bot, I'll use a relatively simple heuristic to determine if a part-of-speech section is missing a template. If the header is not immediately followed, at the start of the next line, by any template, then insert the cleanup template at the start of that line. This may result in false negatives (entries without a headword template that aren't tagged) but it shouldn't give any false positives that I can think of.

To determine which headers are part-of-speech headers, I'll use a precompiled list of all the headers I've come across in a dump, and (painstakingly) split between POS and non-POS headers. If a POS header is used erroneously, like say a Verb header where it's used as something other than a POS header, it will also be tagged as missing a template (false positive), but since we are presumably going to fix the tagged entries manually, the erroneous usage will be noticed during this process.

The bot run will not actually fix any entries, but it should be relatively easy to fix some of the entries with a bot, once they are tagged. —CodeCa t 21:07, 5 May 2016 (UTC)[reply]
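A minimal sketch of the heuristic described above, in Python. Everything named here is an assumption for illustration: `POS_HEADERS` stands in for the precompiled header list, `tag_missing_headword` is a hypothetical helper, and the tag is placed on its own line directly under the header rather than prepended to the following line. It is not the actual bot code.

```python
import re

# Hypothetical stand-in for the precompiled list of part-of-speech headers;
# the real list would be split by hand from a dump, as described above.
POS_HEADERS = {"Noun", "Verb", "Adjective", "Adverb", "Proper noun"}

# Matches wikitext headers such as "===Noun===" and captures the header text.
HEADER_RE = re.compile(r"^(={2,6})\s*(.+?)\s*\1\s*$")


def tag_missing_headword(wikitext: str, tag: str = "{{rfc-head}}") -> str:
    """Insert `tag` under every POS header not directly followed by a template."""
    lines = wikitext.split("\n")
    out = []
    for i, line in enumerate(lines):
        out.append(line)
        match = HEADER_RE.match(line)
        if not match or match.group(2) not in POS_HEADERS:
            continue
        next_line = lines[i + 1] if i + 1 < len(lines) else ""
        # The heuristic: any template at the start of the next line counts.
        if not next_line.startswith("{{"):
            out.append(tag)
    return "\n".join(out)
```

Because the inserted tag is itself a template at the start of the line, running the function a second time on its own output changes nothing.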

It wouldn't surprise me to find a large number of entries with a file ("[[File:" or "[[Image:") on the line right after the POS header, before the headword template. I consider this suboptimal, but it'd be a false positive in your work, I think. Other than that, this sounds like a good idea. Many other entries put {{wikipedia}} after the POS header, before the headword template, but this won't affect the script you describe. - -sche (discuss) 05:31, 6 May 2016 (UTC)[reply]

On a related note, there are a large number of Latvian adjectives with two headword-template lines but only one POS header, as described here, complete with proposed bot fix. - -sche (discuss) 05:31, 6 May 2016 (UTC)[reply]

Similar careful logic could be used to revert the erroneous mass insertion of {{l}} with inappropriate language codes. DCDuring TALK 10:57, 6 May 2016 (UTC)[reply]

I oppose such tagging since they are fairly easy to identify for anyone being serious about making the list shorter. I generally oppose tagging without fixing, especially in huge volumes. --Dan Polansky (talk) 08:19, 7 May 2016 (UTC)[reply]

If done, I suggest it is done "quietly", e.g. adding entries to a category but not having red warning text showing up in the entry. Equinox ◑ 11:25, 7 May 2016 (UTC)[reply]

I fully support the proposal, with the suggestion that it'd be nice if the script automatically recognized [[File: and [[Image: if possible. --Daniel Carrero (talk) 23:26, 7 May 2016 (UTC)[reply]
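One way the suggested [[File: / [[Image: recognition might look, as a rough Python sketch. The function name `has_headword_after` and the choice to skip any number of consecutive file lines are assumptions, not part of the proposal as written.

```python
# Prefixes of lines that should be skipped when looking for the headword line.
FILE_PREFIXES = ("[[File:", "[[Image:")


def has_headword_after(lines, header_index):
    """Return True if the first non-file line after the header starts with a template."""
    for line in lines[header_index + 1:]:
        if line.startswith(FILE_PREFIXES):
            continue  # an image placed before the headword line is ignored
        return line.startswith("{{")
    return False  # header is the last line, or only file lines follow it
```

A {{wikipedia}} box placed before the headword line still counts as a template here, which matches the observation that such entries would not be tagged.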

vCat

vCat is a tool on wmflabs that can graphically display category structures. More specifically it can show all ancestor categories (that is, parents and parents' parents and so on) or all descendants. We could have two links for the generated parents and children images on category pages.

Look what it generates for ka:Sports - descendants, Georgian language - ancestors.

I do not have a very strong opinion about this but others may have. --Giorgi Eufshi (talk) 09:45, 6 May 2016 (UTC)[reply]

Gothic words that are attested only in Runic inscriptions

Was wondering whether a Runic or Gothic script entry should be created for Gothic words that are attested only in Runic inscriptions. There are not many words of this sort which have one mostly undisputed reading, but there certainly are some. When I added ᚱᚨᚾᛃᚨ (ranja) a while ago I had not yet found any others of this sort, so I went ahead and added the entry in Runic script, the word only being attested in that script. However, recently I came across 𐌷𐌰𐌹𐌻𐌰𐌲𐍃 (hailags), which is also attested only in one Runic inscription (Wulfila, interestingly, prefers 𐍅𐌴𐌹𐌷𐍃 (weihs) to mean holy - if anyone knows why, let me know!) but was added by another user in the Gothic script. Which are preferable here, Gothic or Runic lemmata? Would be interested to hear your thoughts. — Kleio (t · c) 17:38, 11 May 2016 (UTC)[reply]

Anything that's attested can be added, so there's no preference. However, Gothic doesn't currently have Runic listed as one of its scripts, so autodetection won't work. —CodeCa t 17:47, 11 May 2016 (UTC)[reply]

It does now. --Wiki Tiki 89 17:58, 11 May 2016 (UTC)[reply]

Good call! — Kleio (t · c) 18:15, 11 May 2016 (UTC)[reply]
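To illustrate why the script list matters for autodetection, here is a small Python sketch. It assumes detection works by checking a term's characters against the Unicode blocks of the scripts listed for the language; the table `SCRIPT_RANGES` and the function `detect_script` are hypothetical and only loosely modelled on that idea, not taken from the actual module code. The block ranges themselves (Gothic U+10330 to U+1034F, Runic U+16A0 to U+16FF) are the standard Unicode ones.

```python
# Unicode block ranges keyed by ISO 15924 script code.
SCRIPT_RANGES = {
    "Goth": (0x10330, 0x1034F),  # Gothic block
    "Runr": (0x16A0, 0x16FF),    # Runic block
}


def detect_script(text, allowed):
    """Return the first script in `allowed` that has a character in `text`, else None."""
    for code in allowed:
        low, high = SCRIPT_RANGES[code]
        if any(low <= ord(ch) <= high for ch in text):
            return code
    return None
```

With only Goth listed, `detect_script("ᚱᚨᚾᛃᚨ", ["Goth"])` finds nothing; once Runr is added to the list, the same term is detected as Runic.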

But if there is no preference, could you not end up with two lemma entries, or, well, situations like this, where it is inconsistent and feels messy? Seems to me it is best to settle for one lemma (imo for the attested script), with, if the attestation is only in a rare script like Runic, a redirect in the Gothic script à la the romanization of redirects we have for most regular Gothic entries. — Kleio (t · c) 18:15, 11 May 2016 (UTC)[reply]

When there are multiple lemmas that represent the same basic word, we call them alternative forms. So we can call the runic lemma an alternative form of the gothic lemma. —CodeCa t 18:26, 11 May 2016 (UTC)[reply]

But the Gothic script forms are not attested in these listed cases. Should there not, in any case, be a consistent approach to what form in these cases (got words only attested in Runic) should be the main entry, and which should be considered the alternative form? Because if I understand you correctly, whoever creates the lemma for this kind of Runic-only Gothic word would decide whether to make the main entry a Runic or Gothic script one, and the other script would then be considered the alternative form. We would, then, have 𐍂𐌰𐌽𐌾𐌰 (ranja) as an alternative form of the lemma ᚱᚨᚾᛃᚨ (ranja), and ᚺᚨᛁᛚᚨᚷᛊ (hailags) as an alternative form of the lemma 𐌷𐌰𐌹𐌻𐌰𐌲𐍃 (hailags) - a completely opposite way of dealing with this, despite them being both attested only in Runic. I might be a bit OCD about this, but it just seems messy to leave that up to an arbitrary decision by the editor, instead of having a consistent approach. — Kleio (t · c) 18:44, 11 May 2016 (UTC)[reply]

Chiming in from the sidelines, I agree that messy and inconsistent is undesirable. My 2p would be to have all Gothic lemmata in the Latin script, with notes in the etymologies to indicate if a given term is only attested in Runic. Gothic entries in the Runic script would all be soft redirects to the Latin-script entries, much as we have for pinyin entries for Chinese, or romaji entries for Japanese, etc. We have a possible analogous precedent in the handling of Pali entries, which historically were not written in the Latin script until relatively recently, but for which (I think) all EN WT lemmata are given in the Latin script. ‑‑ Eiríkr Útlendi │^{Tala við mig} 18:55, 11 May 2016 (UTC)[reply]

Why should we do it for this one non-Latin language and not for all others? Korn [kʰũːɘ̃n] (talk) 19:45, 11 May 2016 (UTC)[reply]

Admittedly, most contemporary students of Gothic and indeed virtually all books published on Gothic use the Latin script, unlike for example Ancient Greek where the Greek alphabet is consistently used. See also this discussion and the past votes that are linked there. Generally speaking though, the issue of Latin script usage, romanization and so forth seems to be a bit of a mess. — Kleio (t · c) 20:01, 11 May 2016 (UTC)[reply]
@Korn [kʰũːɘ̃n] -- if, by "this", you mean "we should use the Latin script for all lemmata, for all languages", that presents some real organizational difficulties for some languages. For Japanese, have a look at [[せい]] (sei) -- the sheer number of homophones is overwhelming, and our entry doesn't even include all of the entries applicable to this phonetic rendering. Were we to move all of the applicable lemmata to a single heading under [[sei#Japanese]], we would have a substantial challenge in organizing and presenting all of this information in a useful fashion. This is a big part of why Japan has not retired kanji, the borrowed Chinese characters used in writing: these provide much-needed disambiguation. Written and spoken Japanese can be quite different in terms of style and vocabulary, largely because of the writing system. ‑‑ Eiríkr Útlendi │^{Tala við mig} 01:02, 12 May 2016 (UTC)[reply]
We should record all languages in the language that they are printed in. That is, for Gothic, the Latin script. I see no reason we should be messing around with manuscript traditions at all.--Prosfilaes (talk) 08:08, 12 May 2016 (UTC)[reply]

Because that is what the Goths wrote their language in and because we rely on primary sources where we can rather than copies from non-native speakers who take editorial liberty. And I'm not saying we should use Latin for all lemmas, I'm saying when something gets special treatment it needs a two-way justification for why it is done in this of all cases and why it is not done in the others. Korn [kʰũːɘ̃n] (talk) 13:33, 12 May 2016 (UTC)[reply]

The Gothic script is what Bishop Ulfilas wrote Gothic in, at least. We should rely on printed copies because printed copies are normalized script-wise and can be reliably transcribed and searched for, whereas Pepys' diary as-is can't, and because printed copies are what the users of Wiktionary are going to be using, not the manuscripts.--Prosfilaes (talk) 07:51, 13 May 2016 (UTC)[reply]

Ad absurdum: If non-native speakers of Russian started using exclusively romanised versions of Russian books, would that affect your stance on where to lemmatise Russian? (Assuming that it wouldn't, that would be the point for giving the argument for the exception that I asked for.) Korn [kʰũːɘ̃n] (talk) 19:20, 14 May 2016 (UTC)[reply]

ps.: Shorthand diary writing is not meant for consumption by others. I hope we can all agree that our decisions on writing should be made on the basis of those texts which were actually meant to be read by people. Korn [kʰũːɘ̃n] (talk) 19:22, 14 May 2016 (UTC)[reply]

pps.: We have romanisations. The argument to be made is why they should be lemmas rather than links to Gothic as was used by Goths. Korn [kʰũːɘ̃n] (talk) 08:22, 15 May 2016 (UTC)[reply]

If the speakers of Russian started using exclusively Latin script for reading Russian, we should record Russian in the Latin script. If Russian in Cyrillic script was found only in museums, and most universities had only examples of Russian in Latin script, then yes, we should record Russian in the Latin script. All of the speakers of Gothic who might use Wiktionary, including all the non-existent native speakers, use Latin to write it.
I think editorial liberty is a red herring. Latin transcription of Gothic is letter for letter. Whether we cite Gothic in Latin script or Gothic script makes no difference to the accuracy of the original. Whether or not a text is meant to be read by other people doesn't change the fact that bringing Pepys' Diary in a Latin script changes it more than transliterating any Gothic work.--Prosfilaes (talk) 06:59, 17 May 2016 (UTC)[reply]

French capital letters with diacritics

Is diff correct? If so, the usage note should be changed or removed from all entries; there's no sense changing just one entry. - -sche (discuss) 22:14, 11 May 2016 (UTC)[reply]

The usage note should be kept, but modified to provide more information. I don't know the details, but I'm pretty sure the practice varies by country and that the common practice in France of dropping accents on capitals originated from typewriters not having keys for the accented capitals. I wouldn't be surprised if the Academy recommends keeping the accents. --Wiki Tiki 89 22:32, 11 May 2016 (UTC)[reply]

It is a general typographic rule to keep the diacritics on capital letters. Some bad newspaper title errors arose from the lack of diacritics, e.g. « UN POLICIER TUE »: is it tue ("A policeman kills") or tué ("A policeman killed")? It only remains a common issue because current French keyboards can't write capitalized letters with diacritics easily (at least on Windows), and some people can't be bothered to learn how to use diacritics properly. — Dakdada 09:25, 12 May 2016 (UTC)[reply]
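The ambiguity in the headline example can be reproduced in a few lines of Python. This is only an illustration: `strip_accents` is a hypothetical helper that imitates dropping diacritics on capitals by removing combining marks after Unicode decomposition.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Remove combining marks, imitating capitals set without diacritics."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# With diacritics kept, the two forms stay distinct in capitals.
kept = ("tue".upper(), "tué".upper())

# With diacritics dropped, both collapse to the same string.
dropped = (strip_accents("tue".upper()), strip_accents("tué".upper()))
```

The pair in `kept` holds two different strings, while both items in `dropped` are identical, which is exactly why the headline cannot be read unambiguously.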

I'll also add that when I started taking French in middle school, our French teacher actually taught us that we were supposed to drop the diacritic on capital letters (although, I personally found it illogical and refused to comply). --Wiki Tiki 89 15:02, 12 May 2016 (UTC)[reply]

@Wikitiki89 My own high school French teacher had been telling students that diacritics were optional on capitals. I didn't believe it either. Hillcrest98 (talk) 23:33, 15 May 2016 (UTC)[reply]

I've read a novel or two in French that did not contain a single accent on a capital letter, so it's certainly optional. The only diacritic that isn't is the cedilla, and even that gets dropped sometimes. Andrew Sheedy (talk) 04:40, 16 May 2016 (UTC)[reply]

Before the age of computers and Unicode fonts, it was usual practice that French Canadian retained diacritics on all-caps, while European French preferred caps without diacritics. I have not kept up with trends since 2000. —Stephen ^(Talk) 09:17, 22 May 2016 (UTC)[reply]

Not only French? The same was said for a long time about Spanish, but I checked the RAE recommendation: only acronyms and initialisms (siglas) don't take the graphic accent (see point 7, at the end). Sobreira (talk) 09:31, 27 May 2016 (UTC)[reply]

Creating standards for GML

Over here, During pointed out that I shouldn't just decide to put the lemma of Middle Low German words on their attested rather than normalised form without discussion. To my knowledge, I'm the only current editor of Middle Low German, so discussion honestly didn't occur to me, despite my Wiktionariandom. So here it is, go forth and discuss.
The marking of umlauts happens in the early MLG period with ø, y and slashed u (see also this question), as well as digraphs, but for the longest part of the period is so overwhelmingly absent that the leading authority of the 19th century (Lübben) was stoutly convinced that umlaut didn't occur in the language. The next standard work on the language (Lasch) does prove him wrong, but points out that "ü is hardly/likely not, ö rarely to be taken as an umlaut" ("ü ist wohl kaum..."), and the examples she gives for ö are actually spelled oe, without superscript. So following our conventions for Latin, I figured to change the lemma from e.g. vögen to vogen. Korn [kʰũːɘ̃n] (talk) 23:15, 11 May 2016 (UTC)[reply]

Just to make it clear: you would have the lemma entry be at [[vogen]], but with the inflection line and conjugation table on that page showing vögen. Would [[vögen]] be an alternative form entry for vogen? DCDuring TALK 00:42, 12 May 2016 (UTC)[reply]

You understood me correctly, yes. The circumflex and the trema are a modern scholarly annotation for clarity, standardly applied like macron to Latin texts. I would say that anything attestable can be an alternative form, anything unattested can not. From what I understand from the grammars (I don't have access to corpora myself), the rendition of an umlaut as Ö and Ü was generally unknown in the period, though, and can be expected to be unattestable. Korn [kʰũːɘ̃n] (talk) 13:30, 12 May 2016 (UTC)[reply]
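The arrangement discussed here (page title without diacritics, displayed form with them, as with Latin macrons) can be sketched in Python as follows. The function `entry_name` and the set `STRIPPED_MARKS` are assumptions made for illustration; which marks to strip for Middle Low German is precisely the question under discussion.

```python
import unicodedata

# Assumption: the trema (U+0308) and circumflex (U+0302) are treated as
# scholarly annotation and removed from page titles.
STRIPPED_MARKS = {"\u0308", "\u0302"}


def entry_name(display_form: str) -> str:
    """Map a normalised display form to the page title it would be lemmatised at."""
    decomposed = unicodedata.normalize("NFD", display_form)
    kept = "".join(ch for ch in decomposed if ch not in STRIPPED_MARKS)
    return unicodedata.normalize("NFC", kept)
```

Under these assumptions, the inflection line would show vögen while `entry_name("vögen")` gives the page title vogen.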

Hence your analogizing to how we handle Latin macrons etc. rather than to how we handle German tremas, which is what I thought the analogy would be. If I actually knew anything substantive in this area, I probably would not have resorted to procedure. I suppose a discussion, preferably with more than just the two of us participating, my contribution being minimal, would be good for Wiktionary talk:About Middle Low German to memorialize the decision. DCDuring TALK 15:09, 12 May 2016 (UTC)[reply]

I support the lemmas being the non-diacriticized forms. I wish we would do this with Ancient Greek as well. --Wiki Tiki 89 15:15, 12 May 2016 (UTC)[reply]

It's this way too in many cases for Middle English, where "u" ( = /y/), is nowadays oftentimes written "ü" to clarify the pronunciation, but "ü" was never actually used in Middle English orthography (tmk). Middle English "u" could also represent /u/. I would support the same for gml. Leasnam (talk) 15:39, 13 May 2016 (UTC)[reply]

I am copying this debate to Wiktionary talk:About Middle Low German. Please give further input there. Korn [kʰũːɘ̃n] (talk) 19:27, 14 May 2016 (UTC)[reply]

Standard layout of adjective tables?

For inflection tables of languages with cases, our normal practice is for the singular to appear in one column, then the plural in another column to the right. For languages with a dual, like Slovene, there are three columns. But when a language also has gendered adjectives, there are two dimensions to the table: number and gender, giving a total of 9 combinations in the case of dual languages, and 6 for most others. There are different ways to lay out adjective inflection tables in this case:

Put them all into one row, singular in all genders first, then plural in all genders. This is what we use for Latin, Russian, Polish, German and it tends to get rather wide especially in the Latin case. Not really good for mobile.
Put them all into one row, masculine in all numbers first, then feminine in all numbers, etc. This is apparently what we use for Proto-Germanic. The downside to this layout is that it doesn't keep singular and plural forms together, which often resemble each other in different genders.
Have number distinguished by row, and gender by column. This is used for Serbo-Croatian and Slovene.
Have gender distinguished by row, and number by column. This is essentially three noun inflection tables stacked on top of each other, so it's more consistent in that way. However, for languages with two numbers, you end up with 3 rows and 2 columns, which gives a rather tall table without using the width much. This is definitely the best layout for mobile though, for this reason.

My question is whether there is a preferred layout for these. Specifically, what should be used for the Proto-Indo-European adjective table I intend to develop? PIE has three numbers and three genders, so it would be either a 3x3 table or a table with 9 (!) columns. —CodeCa t 20:54, 13 May 2016 (UTC)[reply]
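The two single-row orderings in the list above differ only in which dimension varies fastest. A short Python sketch makes the column counts concrete; the function names and the label format are made up for illustration.

```python
from itertools import product

NUMBERS = ["singular", "dual", "plural"]
GENDERS = ["masculine", "feminine", "neuter"]


def columns_number_major(numbers, genders):
    """All genders of the singular first, then all genders of the next number."""
    return [f"{gender} {number}" for number, gender in product(numbers, genders)]


def columns_gender_major(numbers, genders):
    """All numbers of the masculine first, then all numbers of the next gender."""
    return [f"{gender} {number}" for gender, number in product(genders, numbers)]
```

With three numbers and three genders either function yields 9 column labels; dropping the dual leaves 6. The row-and-column layouts use the same two lists as the axes of a grid instead of flattening them.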

I wonder if there's a way to create tables like these with some CSS magic so that the ultimate layout is determined by the browser based on window size. Anyway, I think without space considerations, having cases on one side and everything else on the other makes the most sense to me, because each number/gender combination can sort of be interpreted as its own sub-lemma with its own declension. --Wiki Tiki 89 21:20, 13 May 2016 (UTC)[reply]

But would the columns be the numbers, or the genders? —CodeCa t 21:29, 13 May 2016 (UTC)[reply]

To clarify, if screen space is not a problem, then I would prefer if cases were rows and numbers and genders were columns. If that makes the table too wide, then you can make the numbers sort of independent tables like for Serbo-Croatian and Slovene. --Wiki Tiki 89 21:38, 13 May 2016 (UTC)[reply]

So you think the columns should be the genders, then? —CodeCa t 22:04, 13 May 2016 (UTC)[reply]

Not necessarily. I think the columns should be gender/number combinations. How those combinations are arranged depends on the particular set of combinations that a language has. --Wiki Tiki 89 22:08, 13 May 2016 (UTC)[reply]

There's 9 combinations for PIE... —CodeCa t 22:27, 13 May 2016 (UTC)[reply]

First of all, there should be no standard layout, since different languages have different considerations, and there are differences in lexicographic traditions and educational standards. The community of editors for a specific language should be allowed to decide on the layout that's best for their languages. Consider also that, once you leave the more archaic post-Anatolian Indo-European languages, you're not going to be able to use any of this consistently across languages.

Now, as far as Indo-European, you have no strong lexicographic traditions to deal with, so you can start from scratch. I think your sorting should be gender first, subdivided by number:

The PIE genders are more like separate declension classes, coming as they do from different origins.
Morphologically, gender morphemes (if you can call them that) tend to occur between the root and the number endings, which makes them more basic- you have a derived stem, to which all the number endings are added.
Semantically, gender is more closely tied to the identity of the referent: numbers can be changed by combining referents in groups, but each of those referents will still have the same gender. That makes gender a more basic category: when you're talking about something you will use the same gender for it in all the different forms you use to talk about it, so you will want to have all the forms for that gender in the same place (of course, something feminine can become grammatically masculine by being grouped with something masculine, but it's still intrinsically feminine).

Of course there are also aspects where number is more basic, and there might be some way to minimize or hide the dual columns if they're next to each other, which would save space in the basic display (the dual number is rather secondary, in many ways, and has mostly disappeared in the daughter languages).

As for horizontal vs vertical arrangements: I wonder if there's any way to get the groups of columns to wrap as a block instead of by line? That is, the neuter singular, dual and plural, with all of their cases, would be a separate table that would move below the tables for the masculine and feminine with their respective numbers and cases if the page wasn't wide enough, and the feminine table would move between the masculine and the neuter if the page was only wide enough for one gender block. If you could do that, you would have both the horizontal arrangement for wide screens and the vertical arrangement for narrow screens. In the vertical configuration, it would be somewhat like the arrangement at Sanskrit विशाल (viśāla), to give an arbitrary example. Chuck Entz (talk) 03:05, 14 May 2016 (UTC)[reply]

I cast my lot with "let the editors decide". Korn [kʰũːɘ̃n] (talk) 19:46, 14 May 2016 (UTC)[reply]

I am an editor. I'm asking others to decide. :p —CodeCa t 19:54, 14 May 2016 (UTC)[reply]

Wir sind das Editor! Wir sind das Editor! - Rephrasing my stance: I have no problem with a non-uniform layout across different languages and thus think discussion of the respective tables should take place in the communities of those actually working on the language. Coming from a Germanic language, for Germanic languages I prefer this way: With a uniform plural it should be case by row, rest by column. That is male/neuter/female/plural × cases ordered NAGD. Languages with gendered plural should follow the same pattern but have two separate tables for singular/plural. Korn [kʰũːɘ̃n] (talk) 08:38, 15 May 2016 (UTC)[reply]

The most commonly spoken languages (e.g. German) have a single plural form for all genders, so they need 4 columns, not 9. How many languages need more than 4 columns? And how many words do we currently cover in such languages? Are the weird cases few or many? -- LA2 (talk) 19:35, 15 May 2016 (UTC)[reply]

Many Slavic languages preserve distinct plural forms for each of the genders. Slovene is an extreme case because it has dual forms too, so it needs 9 columns (see dober), but Serbo-Croatian is much more widely spoken and needs 6 columns (dobar). The two Baltic languages have no more neuter gender, but the plurals of the two genders remain distinct, so there are 4 columns (geras). Icelandic, a Germanic language, does not have full syncretism of the genders in plural, so 6 columns remain necessary (góður). —CodeCat 20:36, 15 May 2016 (UTC)[reply]
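As a rough illustration of the column counts mentioned above, here is a minimal Python sketch. It assumes each column is one distinct gender form within one grammatical number; the dictionary and function names are hypothetical and are not part of any existing template or module, and "Lithuanian" stands in for the Baltic example geras.

```python
# Number of distinct gender columns per grammatical number, as described
# in the comments above. All names here are illustrative assumptions.
LAYOUTS = {
    "German": {"singular": 3, "plural": 1},            # one plural for all genders
    "Slovene": {"singular": 3, "dual": 3, "plural": 3},
    "Serbo-Croatian": {"singular": 3, "plural": 3},
    "Lithuanian": {"singular": 2, "plural": 2},        # no neuter gender
    "Icelandic": {"singular": 3, "plural": 3},
}


def column_count(layout):
    """Return the total number of table columns for one language layout."""
    return sum(layout.values())


for language, layout in LAYOUTS.items():
    print(language, column_count(layout))
```

The point of the sketch is only that the width of the table follows from how many gender distinctions survive in each number, which is why a single layout is hard to apply across all languages.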

Appendix:Possessive

I edited the entry my to replace the small list of 4 senses with a link to the more complete list of senses at Appendix:Possessive. See diff. What do you think?

First, I moved User:Msh210/English possessives to Appendix:Possessive and edited the page further. Previous discussions: Wiktionary:Tea room/2011/April#English possessives, Wiktionary:Beer parlour/2016/January#English possessives. What do you think, @msh210?

I'd like to do the same for other entries at some point, like your, etc. --Daniel Carrero (talk) 02:17, 14 May 2016 (UTC)[reply]

The Appendix should be a supplement not a replacement. I'd simply revert the elimination of the definitions and add a reference to the Appendix under See also. DCDuring TALK 11:57, 14 May 2016 (UTC)[reply]

Whoa. "A possessive used before a noun" doesn't distinguish my from other possessives like your and her. Equinox ◑ 11:42, 15 May 2016 (UTC)[reply]

I reverted my edit. --Daniel Carrero (talk) 18:26, 16 May 2016 (UTC)[reply]

Surprising Homographs?

As I was explaining our redirection policy earlier, I used my favorite example of a word that has the same spelling as an English word but is completely unrelated to it: Indonesian air, which means water. I thought it might be fun to come up with a list of these to use in our documentation. I found a few others to start the list. Can anyone think of more?

air (“water”) (Indonesian)
ball (“organ”) (Irish)
beach (“bee”) (Irish)
bean (“woman”) (Irish)
fear (“man”) (Irish)
here (“testicle”) (Hungarian)
millet (“nation”) (Turkish)
take (“bamboo”) (Japanese, romanization of たけ)
pint (“penis”) (Low German)
teach (“house”) (Irish)

Different capitalization and place names:

Gift (“poison”) (German)
Lizard (“peninsula in Cornwall”) (English)
Mist (“manure”) (German)
Speck (“bacon”) (German)
Split (“city in Croatia”) (Serbo-Croatian, English)
Sexmoan

Feel free to edit/add to my list. Thanks! Chuck Entz (talk) 05:41, 16 May 2016 (UTC)[reply]

ball (“organ”)
bean (“woman”)
teach (“house”)

(All Irish)

Gift (“poison”)
Mist (“manure”)

(Both German) --Catsidhe ^{(verba, facta)} 06:28, 16 May 2016 (UTC)[reply]

I've added those to my list, with the German ones in a separate list for different capitalizations. @Korn: Thanks for pint. There are traces of the same word in English: see cuckoopint. Chuck Entz (talk) 12:46, 16 May 2016 (UTC)[reply]

These are the sort of thing we're looking for at Wiktionary:Foreign Word of the Day/Focus weeks#False friends. It's been a while since we had a focus week for false friends. The last one I remember is the week starting with Wiktionary:Foreign Word of the Day/2014/February#11. —Aɴɢʀ (talk) 07:17, 16 May 2016 (UTC)[reply]

Wiktionary:Foreign Word of the Day/2013/April#2, Wiktionary:Foreign Word of the Day/2013/July#15, Wiktionary:Foreign Word of the Day/2013/October#26, Wiktionary:Foreign Word of the Day/2014/April#5, Wiktionary:Foreign Word of the Day/2014/September#2, Wiktionary:Foreign Word of the Day/2015/April#3, Wiktionary:Foreign Word of the Day/2015/November#1. — Ungoliant ^(falai) 14:10, 16 May 2016 (UTC)[reply]

Also one of my personal favourites: Mirandese you (“I”) (coupled with Danish I (“you”)). — Ungoliant ^(falai) 14:11, 16 May 2016 (UTC)[reply]

I think Gift and Speck deserve a category of their own, since they are in fact exact cognates with the English homographs, whose meanings diverged quite far. --Wiki Tiki 89 14:37, 16 May 2016 (UTC)[reply]

कट (kaṭ) /kəʈ/ sounds like "cut" and has the same meaning. —Aryamanarora ^{(मुझसे बात करो)} 18:18, 23 May 2016 (UTC)[reply]

OK, my gift (not Gift!) to User:Chuck Entz: a chain reaction of false friends....

meaning	Galician
"hand"	man
"man"	home

As a result, when the first Zara Home shops opened in the company's country of origin, (some) people thought that they would be the men's department.

And you may wonder... how to render English "home" into Galician? casa. Galilove... Sobreira (talk) 09:47, 27 May 2016 (UTC)[reply]

This reminds me, although these are homophones and not exactly homographs, of this saying in English about Hebrew: [aˈni] is [mi], [mi] is [hu], [hu] is [hi], and [hi] is [ʃi]. --Wiki Tiki 89 14:55, 27 May 2016 (UTC)[reply]

Wiktionary:About Akkadian

I just created Wiktionary:About Akkadian, but I do not actually know very much about our practices for Akkadian. Can someone who works with Akkadian help fill in the information? --Wiki Tiki 89 15:41, 16 May 2016 (UTC)[reply]

@ObsequiousNewt, JohnC5, DerekWinters, Angr: Pinging people who expressed some knowledge of cuneiform in a recent discussion on Hittite lemmas. --Wiki Tiki 89 19:40, 16 May 2016 (UTC)[reply]

I'd love to comment, but the discussion above was never satisfactorily resolved, and most of my concerns with my various proposals were never actually addressed. Since the orthography is similar here, my comments apply also. —ObsequiousNewt (εἴρηκα|πεποίηκα) 15:36, 19 May 2016 (UTC)[reply]

As before, I would like to know what to do with determinatives (are they part of the lemma or not?). I don't believe they should be because they have no phonetic realization. Should we have a vote about cuneiform lemmatization? —John C5 18:39, 19 May 2016 (UTC)[reply]

I believe forms with determinatives should be alternative forms, unless the forms without the determinatives are much less common (OTOH for Ancient Egyptian, I would say determinatives should be part of the lemmas, since they were essentially required for most phonetically spelled words). Whether they have a phonetic realization or not is irrelevant. But anyway, there are a lot more basic things to decide first, such as which dialect to use. I'm biased towards Old Babylonian, because I'm going through Huehnergard's grammar, but most of our transliterations seem to be in later dialects without the final -m. Also, it seems that most of our entries are for logograms, whereas logograms were usually less common than phonetic spellings for most words. --Wiki Tiki 89 18:55, 19 May 2016 (UTC)[reply]

I think this discussion would benefit by looking at the example of the Han characters used in Chinese, Japanese, Korean and Vietnamese: they are, like the cuneiforms, a very long-lived mixed logographic/phonemic system adopted and used by unrelated languages. The Chinese and Japanese editors, especially, have had a great deal of experience working with variations on some of these very issues.

While you won't find purely semantic independent characters to correspond to determinatives, the vast majority of the characters contain some combination of recognizable semantic and phonemic elements. A very transparent example is 他 (originally both he and she), and 她 (tā), which was created by replacing the semantic element meaning "person" with one meaning "woman".

As for Sumerograms and other variations in the ways that cuneiforms can be interpreted: in Japanese there are usually a number of readings for any given character, which are classified into a series of etymologically-based named types: the on readings are borrowed from Chinese, with several subdivisions for the topolect and/or period of Chinese the word was borrowed from, and the kun readings are native Japanese.

Of course, with cuneiforms we don't have a corresponding body of native scholarship nor the knowledge of modern speakers to draw from, so they're a lot messier. Chuck Entz (talk) 03:13, 20 May 2016 (UTC)[reply]

A new "Welcome" dialog

Hello everyone. This is a heads-up about a change which has just been announced in Tech News: Add the "welcome" dialog (with button to switch) to the wikitext editor.

In a nutshell, later this week this will provide a one-time "Welcome" message in the wikitext editor which explains that anyone can edit, and every improvement helps. The user can then start editing in the wikitext editor right away, or switch to the visual editor. (This is the equivalent of an already existing welcome message for visual editor users, which suggests the option to switch to the wikitext editor. If you have already seen this dialog in the visual editor, you will not see the new one in the wikitext editor.)

I want to make sure that, although users will see this dialog only once, they can read it in their language as much as possible. Please read the instructions if you can help with that.

I also want to underline that the dialog does not change in any way the current site-wide configuration of the visual editor. Nothing changes permanently for users who chose to hide the visual editor in their Preferences or for those who don't use it anyway, or for wikis where it's still a Beta Feature, or for wikis where certain groups of users don't get the visual editor tab, etc.
There is a slight chance that you will see a few more questions than usual about the visual editor. Please refer people to the documentation or to the feedback page, and feel free to ping me if you have questions too!
Finally, I want to acknowledge that, while not everyone will see that dialog, many of you will; if you're reading this you are likely not the intended recipients of that one-time dialog, so you may be confused or annoyed by it, and if this is the case, I'm truly sorry about that. This message should also save you from having to explain the same thing over and over again: just point to this section. Please feel free to cross-post this message at other venues on this wiki if you think it will help keep users from feeling caught by surprise by this change.

If you want to learn more, please see https://phabricator.wikimedia.org/T133800; if you have feedback or think you need to report a bug with the dialog, you can post in that task (or at mediawiki.org if you prefer).

Thanks for your attention and happy editing, Elitre (WMF) 16:47, 16 May 2016 (UTC)[reply]

Would it really have been impossible or hard to switch this off for registered and logged-in users? DCDuring TALK 17:16, 16 May 2016 (UTC)[reply]

The task says so. I'm also here for a reminder—this wiki features a Single Edit Tab system; if you're not sure you know or remember how that works, you can read the guide (which details, among other things, how to switch between editors from the buttons on the toolbar); you can change your editing settings at any time, by the way. (I had also written a very quick intro to the visual editor, in case anyone is interested). Best, --Elitre (WMF) (talk) 14:36, 17 May 2016 (UTC)[reply]

Looking for someone to help with FWOTDs

Hey folks. Due to personal and health issues, I’m unable to spend as much time on Wiktionary as I’d like to. Sooner or later, I won’t have time to keep track of foreign words of the day consistently anymore. For this reason, I really need someone who can share the burden of maintaining the project. While technically anyone who wants to set FWOTDs is free to do so, within the limits of the guidelines, if you want my sanction as an “official” maintainer, you must meet the following criteria:

know how to find your way through linguistic literature: half our featured words are pilfered from various articles published in god-forsaken periodicals and magazines, and you will have to be able to find this stuff if we are to prevent FWOTD from being a rotation of major western European languages;
have common sense: if you think it would be funny to feature penis or something like that, you’re out. FWOTD is serious motherfuckin’ business!
willingness to take the blame: why do you think I’m asking for help anyway? I need someone who can be officially blamed for not noticing my mistakes!

Note that maintaining FWOTD involves a lot more than just picking words to feature and updating the templates. Spontaneous nominations are only enough for 10-20% of all featured words, despite my bias towards choosing them; the rest is words that I find or create myself, and this takes a lot of time. I also have to create and upload images for words with unusual scripts, and keep an eye out for vandalism on featured words during their day of featuring.

If anyone is interested, reply here and I will send you detailed instructions. — Ungoliant ^(falai) 02:54, 20 May 2016 (UTC)[reply]

Those of you who remember the original FWOTD vote will know that I have slacked off to an embarrassing degree, and I feel very guilty about this. If someone who really wants to work on this and has the requisite skills speaks up, I would be happy to let them do it, but I should take up this burden to make up for leaving so much work on Ungoliant's plate. I do have enough time, and I knew how to run it (although I may have forgotten some of the details by now). —Μετάknowledge^{discuss/deeds} 03:58, 20 May 2016 (UTC)[reply]

I'm happy to help out as time permits, but I definitely don't want to be the primary person responsible for FWOTD. —Aɴɢʀ (talk) 14:25, 20 May 2016 (UTC)[reply]

Likewise, could for example help to clean up nominated articles, do research etc. Thanks Ungoliant for your work on this, FWOTD is one of the things I enjoy most here, both as a reader and contributor. – Jberkel (talk) 10:02, 21 May 2016 (UTC)[reply]

Thank you so much guys! I think you’re already familiar with the technical and regulatory aspects, so I’ll just mention the unwritten rules that I try to follow:

hard limit: no more than 2 FWOTDs in the same language per month;
soft limit: no more than 1 FWOTD in the same language per month (I’ve only been able to pull this off a few times);
prefer featuring other people’s nominations over your own;
keep FWOTDs that are in the same language, or in chronological variants (e.g. Spanish and Old Spanish), somewhat far apart;
check the history page of entries; if the information was added by someone who you’re not sure is trustworthy, try to check the references and citations to see if they’re accurate;
no more than one focus week per month (I’ve only had this option once though);
wait for at least a day before featuring words that are posted in the nominations; (sometimes I had to ignore this rule because there were no other options that wouldn’t break the hard limit);
add {{was fwotd}} to the page immediately after featuring the word. You are going to forget it otherwise, have no doubt about it;

Thanks again. — Ungoliant ^(falai) 16:16, 21 May 2016 (UTC)[reply]
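The hard and soft per-language limits in the list above could be checked mechanically. The following is only a minimal Python sketch under stated assumptions: a month's schedule is represented as a plain list of language names, and the function and constant names are hypothetical, not part of any existing FWOTD tooling.

```python
from collections import Counter

HARD_LIMIT = 2  # no more than 2 FWOTDs in the same language per month
SOFT_LIMIT = 1  # preferably no more than 1


def check_month(languages):
    """Return (hard_violations, soft_violations) for one month's schedule.

    `languages` is a list with one language name per featured word.
    """
    counts = Counter(languages)
    hard = sorted(lang for lang, n in counts.items() if n > HARD_LIMIT)
    soft = sorted(
        lang for lang, n in counts.items() if SOFT_LIMIT < n <= HARD_LIMIT
    )
    return hard, soft


schedule = ["Spanish", "Spanish", "Spanish", "Irish", "Irish", "Turkish"]
hard, soft = check_month(schedule)
print("breaks hard limit:", hard)
print("breaks soft limit only:", soft)
```

A check like this covers only the counting rules; the other items (spacing of related languages, checking sources, waiting a day after nomination) still need human judgment.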

Thanks, Ungoliant. I'll start tidying up and setting words. @Angr, Jberkel: Please feel free to add words as you wish, or just nominate more if you prefer. I'm fine with being responsible, as long as you guys help out! —Μετάknowledge^{discuss/deeds} 16:52, 21 May 2016 (UTC)[reply]

Stress positioning in Estonian IPA

I changed something in küll only to notice there's the same thing in tool#Estonian. Am I correct that [t'oːl] implies that there is a syllable break between /t/ and /oː/, and hence that this IPA practice is wrong? Korn [kʰũːɘ̃n] (talk) 09:35, 21 May 2016 (UTC)[reply]

There is no syllable break, but the difficulty for the template (and humans) is knowing where the syllables are broken up. —CodeCat 18:09, 21 May 2016 (UTC)[reply]

The stress sign isn't there by accident, it gets triggered by the actual input, which was "k`üll". So I think someone wanted that. Korn [kʰũːɘ̃n] (talk) 18:11, 21 May 2016 (UTC)[reply]

That's the pronunciation format used by ÕS, specifically to avoid having to be specific about where the syllable breaks are. The backtick ` indicates an overlong syllable, and is placed before the vowel. Why do we need to know syllable breaks to indicate stress in IPA? The vowel is the nucleus of the syllable, that's where stress should be placed. —CodeCat 18:12, 21 May 2016 (UTC)[reply]

See the documentation of {{et-IPA}} for the notation, btw. It's a simplification of what ÕS uses. —CodeCat 18:18, 21 May 2016 (UTC)[reply]

We don't need to know it, but in IPA, the character ˈ does not imply a long vowel but primary stress. So what we're currently displaying is not an overlong vowel but a long vowel carrying a primary stress which starts after the initial consonant. So what ends up being displayed to the user doesn't make sense. Korn [kʰũːɘ̃n] (talk) 21:59, 21 May 2016 (UTC)[reply]

The overlength is not displayed in IPA because it's not obvious how. It's not just the vowel that gets lengthened, but the syllable coda as well. Even consonant clusters can be lengthened, though I don't know exactly what that entails phonetically. In any case, the feature is suprasegmental, it exists not on the phoneme level but on the syllable level. That said, overlength is always accompanied by stress, so it's always ok to assume that a syllable indicated as overlong is stressed. That's what the template does. —CodeCat 01:24, 22 May 2016 (UTC)[reply]

Estonian has initial stress anyway, regardless of initial syllable weight, so that still ends up being uninformative.

A simple way to notate overlength would be doubled length marks for long vowels and geminates (e.g. küll [külːː], tool [toːːlʲ]); it's only clusters that are more of a problem. --Tropylium (talk) 19:41, 13 June 2016 (UTC)[reply]

Estonian doesn't always have initial stress, loanwords can have noninitial stress, so it needs to be indicated. —CodeCat 20:27, 13 June 2016 (UTC)[reply]

OK, rephrasing: Estonian always has initial stress in native vocabulary, so this is not directly related to overlength; so marking overlength as stress does not actually provide any sensible information. --Tropylium (talk) 03:02, 18 June 2016 (UTC)[reply]
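The two notational points raised in this thread (placing ˈ at the start of the syllable rather than before the vowel, and writing overlength with doubled length marks) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how {{et-IPA}} works: the function names are hypothetical, the first function operates on the backtick input notation rather than on real IPA, and it only handles the easy case where the marked syllable is the first one.

```python
VOWELS = set("aeiouõäöü")


def move_stress_to_onset(notation):
    """Replace the backtick with ˈ, placed at the syllable onset if known."""
    marker = notation.find("`")
    if marker == -1:
        return notation  # nothing marked in the input
    before, after = notation[:marker], notation[marker + 1:]
    if not any(ch in VOWELS for ch in before.lower()):
        # Only consonants precede the marker, so this is the first syllable
        # and the stress mark belongs at the very start of the word.
        return "ˈ" + before + after
    # Non-initial syllable: finding the onset needs real syllabification,
    # which this sketch does not attempt; leave the mark before the vowel.
    return before + "ˈ" + after


def mark_overlength(ipa):
    """Double the first length mark, as in the proposal for tool and küll."""
    return ipa.replace("ː", "ːː", 1)


print(move_stress_to_onset("k`üll"))
print(mark_overlength("toːlʲ"))
```

The fallback branch shows why the syllable-break question matters: for non-initial overlong syllables the correct position of ˈ cannot be found without knowing where the syllable starts, and overlong consonant clusters are not covered by the doubling rule at all.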

New logo 2

I created Wiktionary:Votes/2016-05/New logo 2, to start in a week. It proposes a derivative of the tile logo for the English Wiktionary logo. A rationale is at Wiktionary talk:Votes/2016-05/New logo 2#Rationale. Let us postpone the start of the vote if required by discussion. --Dan Polansky (talk) 08:01, 22 May 2016 (UTC)[reply]

Merge all Prakrits

I think all the Prakrits should be merged into a single language for organizational purposes; do we really need ~5 languages all with the same entry and meaning at 𑀅𑀕𑁆𑀕𑀺 (aggi)? For all intents and purposes, the Prakrits are just dialects. —Aryamanarora ^{(मुझसे बात करो)} 18:51, 23 May 2016 (UTC)[reply]

We'd need more information to decide this. How different are they? Mutual intelligibility? —CodeCat 19:12, 23 May 2016 (UTC)[reply]

[1] (see bottom of page 8, top of page 9) – they are mutually intelligible, but learning a little Sanskrit greatly helped communication. They were similar enough to be used interchangeably in the same works; see Dramatic Prakrits. Of course, there were minor orthographical differences in inflection, but we can settle on Maharashtri Prakrit as a standard (it's the best documented) and build off of it. —Aryamanarora ^{(मुझसे बात करो)} 19:28, 23 May 2016 (UTC)[reply]

A good analogy is Vulgar Latin, spoken by the common people and thus having many dialects and varying spelling systems. —Aryamanarora ^{(मुझसे बात करो)} 19:31, 23 May 2016 (UTC)[reply]

How do other sources handle it? I'm reminded of the situation with Ancient Greek, where there are sometimes quite striking differences between dialects (Doric -onti vs Attic -ousi(n)). But for Ancient Greek, Attic is mostly the standard form, except in a few cases (τέσσαρες (téssares), which is apparently not the form of any dialect?). —CodeCat 20:11, 23 May 2016 (UTC)[reply]

Maybe not the form of any older dialect, but it is the Koine form (it's in both LXX and NT). —Aɴɢʀ (talk) 21:11, 23 May 2016 (UTC)[reply]

@CodeCat: Most dictionaries and grammars focus on Maharashtri Prakrit and detail the Dramatic Prakrits second, and often exclude the lesser Prakrits. We can use {{lb}} to differentiate between dialects. —Aryamanarora ^{(मुझसे बात करो)} 23:31, 23 May 2016 (UTC)[reply]

I don't really have an opinion, but I presume that @-sche would probably like to be made aware of this discussion. —Μετάknowledge^{discuss/deeds} 04:06, 24 May 2016 (UTC)[reply]

Thanks for the ping. I'm more knowledgeable of the other kind of Indian language than this kind. I'm intrigued that Wikipedia's article on w:Prakrit says Ardhamagadhi is the definitive Prakrit, but the literature supports Aryamanarora's statement that it is rather "Maharashtri, which [...] with orthodox Jain scholars generally, is Prakrit proper" (Ramananda Chatterjee, 1927, in The Modern Review, volume 41), "Maharashtri [is] considered the Prakrit par excellence" (Thomas R. Trautmann, 2006, Languages and Nations: The Dravidian Proof in Colonial Madras, →ISBN). Does the Wikipedia article need to be corrected?
A. C. Woolner (in his 1986 Introduction to Prakrit) says "it may be understood that the different Prakrits were mutually intelligible among the educated"; G. C. Pande (1990, Foundations of Indian Culture) says "the Prakrits were mutually intelligible". - -sche (discuss) 07:27, 24 May 2016 (UTC)[reply]

@-sche: I think I understand the discrepancy now; Maharashtri is the main Prakrit of Jainism, Ardhamagadhi is for Hinduism, and Pali is for Buddhism. (Yes, Pali is a Prakrit, but is considered a separate language for sectarian reasons). —Aryamanarora ^{(मुझसे बात करो)} 13:46, 24 May 2016 (UTC)[reply]

@CodeCat, -sche, Metaknowledge So is this a yes? —Aryamanarora ^{(मुझसे बात करो)} 00:14, 25 May 2016 (UTC)[reply]

Yes, merge the ones which have "Prakrit" in their names once we decide on a code. Should Pali also be merged, in your view? Authorities have traditionally treated Pali differently from the Prakrits, but for non-linguistic reasons, as you note. - -sche (discuss) 02:49, 25 May 2016 (UTC)[reply]

We should leave Pali separate; there are too many entries and Pali has some of its own developments that set it apart from the rest of the Prakrits (multiple scripts, strong East Asian Buddhist influences, etc). —Aryamanarora ^{(मुझसे बात करो)} 16:37, 25 May 2016 (UTC)[reply]

Also, we could use pra as a language code; it is the collective code in the ISO standard for all Prakrits. —Aryamanarora ^{(मुझसे बात करो)} 13:50, 24 May 2016 (UTC)[reply]

I'll point out that currently the Prakrit languages are acting as the ancestor languages for several different branches of the Indo-Aryan family (seen here if you scroll way down). We've had some issues in the past of people trying to say words are inherited from Sanskrit when Sanskrit has no direct descendants. If we do merge them, we definitely should have etymology only languages for them. —John C5 14:48, 24 May 2016 (UTC)[reply]

@JohnC5: Um, (Vedic) Sanskrit is the direct ancestor of all the Indo-Aryan languages; Classical Sanskrit seems to be what you're talking about. Anyway, we definitely need the current codes to remain intact, as many entries reference certain Prakrits (CAT:Hindi terms derived from Sauraseni Prakrit). —Aryamanarora ^{(मुझसे बात करो)} 00:14, 25 May 2016 (UTC)[reply]

Sorry, yes, Vedic is apparently what I meant. —John C5 00:29, 25 May 2016 (UTC)[reply]

Is pra a family code? If so, we shouldn't reuse it as a language. —CodeCat 00:43, 25 May 2016 (UTC)[reply]

pra is both an ISO-639-5 family code and an ISO-639-2 language code. If we merge the Prakrits, do we still need it as a family code? If not, we could use it as a language code, like nah. Otherwise, how about "inc-pra"? - -sche (discuss) 02:49, 25 May 2016 (UTC)[reply]

Both of them would work for me, but pra is shorter, and a family code wouldn't be needed if we merged all of the Prakrits. —Aryamanarora ^{(मुझसे बात करो)} 16:37, 25 May 2016 (UTC)[reply]

We already have Proto-Indo-Aryan inc-pro for general ancestor needs (and which is marginally distinguishable from Vedic in a few features); I am not sure how much benefit there is in maintaining further ancestor stages? Ardhamagadhi as the ancestor of Eastern IA (Assamese et al.) and Maharashtri as the ancestor of Southern IA (Marathi et al.) is probably at least defensible, but my understanding is that there's not a whole lot of consensus on the genetic classification of the New IA varieties, including also the exact definition of the Eastern and Southern groups. --Tropylium (talk) 08:23, 27 May 2016 (UTC)[reply]

@Aryamanarora I would like to weigh in on the matter and say that the Prakrits should not be merged all together. They are independent languages with different grammars, even though they are very similar. The old Sanskrit plays that often incorporated all the Prakrits did so because they knew that their audience was of the class that would have knowledge of the various languages and their differences. There is a reason that these prakrits have been named separately and given individual grammatical treatises by the various Indian grammarians. And, to argue against merging them over mutual intelligibility, Scots is kept as a separate language despite extremely high levels of intelligibility with English. DerekWinters (talk) 21:06, 25 May 2016 (UTC)[reply]

@DerekWinters Their grammars are not that different; they have the same cases, numbers, genders, and inflections. The only differences are spelling, e.g. third-person singular indicative in verbs is marked by -aï in Maharashtri but with -adhi in Sauraseni. They are more similar to each other than the Ancient Greek dialects. ~~There were no "old Sanskrit plays"; the plays were all Prakrit (see Dramatic Prakrits), but~~ certain characters spoke different dialects. Finally, it would make entries so much easier if we merged all of them; do we really need 5-6 entries with the same meaning at "aggi" and "hattha"? —Aryamanarora ^{(मुझसे बात करो)} 22:41, 25 May 2016 (UTC)[reply]

Also, Prakrit was a vernacular; the people who spoke Sanskrit (Brahmins) simply ignored it as a lower-class language; they would have little knowledge of it. —Aryamanarora ^{(मुझसे बात करो)} 22:43, 25 May 2016 (UTC)[reply]

@Aryamanarora Sorry I meant the old Indian plays (but also, do look at Sanskrit drama, I believe the Mṛcchakatika is quite famous). And also one could say the same about the cases and numbers and all regarding Avadhi, Braj Bhasha, Kannauji, Hindi, etc. yet they are certainly separate languages. And again, we maintain Scots as separate regardless of its similarities to English. And it's not really a valid reason to say it would make the editor's job easier, because nothing is required of the editor. If you wish, all you need add are the Maharashtri prakrit ones, and someone else someday will add the others. But I do maintain that they are indeed separate languages. DerekWinters (talk) 00:51, 26 May 2016 (UTC)[reply]

Also I do believe that Magadhi was quite divergent from the other two Dramatic ones. DerekWinters (talk) 00:52, 26 May 2016 (UTC)[reply]

Also, regarding the Brahmins and the prakrits being a vernacular, they were thus spoken by the people, which would include a lot, if not all of the Brahmins. Classical Sanskrit was a very artificial register and during the prakrit era was most certainly only taught as a second language. Also, there are numerous grammars on the prakrits by native grammarians, so they certainly were not ignored. DerekWinters (talk) 00:57, 26 May 2016 (UTC)[reply]

@DerekWinters All your points are very good, and I realize some of my claims were false. However, I still think we should merge them. This is analogous to the situation of the Ancient Greek dialects, where many dialects diverge from the traditional form (Attic Greek) but ultimately we classify them as one language. Our entries are very well organized as a result; see τέσσαρες (téssares), which is what I think a good unified Prakrit entry would look like. Yes, Magadhi diverges quite a bit, and Gandhari uses a different script, and Elu somehow made it to Sri Lanka. However, they all have such similar characteristics that they would be decently comprehensible among monolingual Prakrit speakers. See the text example at Magadhi Prakrit#Pali and Ardhamāgadhī; even though Pali is a wholly earlier and more divergent stage from Prakrit, the two texts are nicely comparable.

Also, the inflection is more than just similar; it is often the same: See this grammar comparing Sauraseni and Maharashtri declensions of "putta" (son, < Sanskrit पुत्र (putra)). —Aryamanarora ^{(मुझसे बात करो)} 01:56, 26 May 2016 (UTC)[reply]

@Aryamanarora I see where you are coming from with the inflections, but I believe this may be something like the unification of Chinese. Written, they seem similar (although I would argue that the grammars are much more divergent for the Chinese languages), but spoken, a monolingual speaker of one prakrit would have difficulties understanding the speech of a monolingual of another prakrit. I personally believe this is grounds enough to keep them separate, but I understand if the community doesn't agree. But I must caution, if we are to have entries for unified prakrit, we should have inflection tables for all the varieties attested, and we are likely to have citations for the various varieties. Furthermore, with the phonetic differences among certain words, I believe this would lead to very cluttered and messy entries. I believe that all of this information could be better handled in individual entries. DerekWinters (talk) 02:51, 26 May 2016 (UTC)[reply]

@DerekWinters While there would be some difficulty in comprehension, I doubt a monolingual Prakrit speaker wouldn't be able to at least understand the gist of another Prakrit. Literature agrees with me; see -sche's references above. We should definitely make inflection tables for all the Prakrits; we have enough information to do so. The phonetic differences aren't too bad. Mainly, there's a little bit of consonant dropping and sibilant mergers between Prakrits, but IMHO it isn't so bad. I can make some inflection tables right now if needed. —Aryamanarora ^{(मुझसे बात करो)} 22:26, 27 May 2016 (UTC)[reply]

@Aryamanarora I agree that there are similarities, but we also maintain such differences in several languages here, like Portuguese, Galician, and Fala; Spanish, Asturian, Leonese, and Extremaduran; Persian and Tajik; German and Yiddish, etc. You could definitely argue that they are individual languages, but one could also argue that they are simply dialects of one larger language. And you are correct, a monolingual prakrit speaker would probably understand somewhat another prakrit, especially among the educated, but I do not think that is a fair metric, as the educated would have learned Sanskrit, enabling them significant comprehension of any of its immediate daughter languages. But, regardless, we have no way of truly knowing, and as such I think we should maintain the separation that has been held by the writers of the prakrits. They viewed them as separate, and I believe for decent enough reason. DerekWinters (talk) 02:21, 28 May 2016 (UTC)[reply]

@DerekWinters Primary sources aren't always reliable for language distinction; look at modern day Serbo-Croatian, Romanian-Moldovan, Hindi-Urdu, etc. You're right though, we really have no way of knowing. I'll stick with the status quo for Prakrit, and continue to treat them as separate languages. —Aryamanarora ^{(मुझसे बात करो)} 15:22, 28 May 2016 (UTC)[reply]

Template:nonlemma

Since the proposal at Wiktionary:Beer parlour/2016/March#Etymology section for non-lemmas was inconclusive, I've instead created this template to place in non-lemma etymology sections. The displayed text may need improvement, feel free to propose or make changes. —CodeCat 20:23, 23 May 2016 (UTC)[reply]

Perhaps it could say something less jargony like "See etymology on main entry/entries." rather than just "Non-lemma forms." (which wouldn't mean much to most readers) Pengo (talk) 10:45, 24 May 2016 (UTC)[reply]

I wholeheartedly concur with Pengo. I like their phrasing as well. —Μετάknowledge^{discuss/deeds} 07:45, 26 May 2016 (UTC)[reply]

I think you could say "See etymology on main entry." as a user has only one in mind. Why wouldn't there be a link to the appropriate L2 section or even the appropriate Etymology section? Presumably there is a language parameter in the template. DCDuring TALK 10:53, 26 May 2016 (UTC)[reply]

Rename Category:Fictional abilities to Category:Metaphysical abilities

The title states one half of the proposal. The other is to move it to Category:Parapsychology. --Lo Ximiendo (talk) 21:22, 26 May 2016 (UTC)[reply]

Parapsychology is a real (pseudo)science that investigates actual events; that term would not be applied to deliberately fictional superpowers like those in comic books. Equinox ◑ 21:55, 26 May 2016 (UTC)[reply]

By the way, "metaphysical" means "beyond physical". And performing a metaphysical ability, such as telepathy, IS a paranormal activity. --Lo Ximiendo (talk) 21:59, 26 May 2016 (UTC)[reply]

One of the best places to hide things you don't want people to take seriously is in fiction. --Lo Ximiendo (talk) 22:00, 26 May 2016 (UTC)[reply]

@Equinox: Posted a belated reply. --Lo Ximiendo (talk) 10:26, 27 May 2016 (UTC)[reply]

Initialisms of proper nouns that wouldn't meet CFI

What are our criteria for including these? Should they be in lemma categories? DTLHS (talk) 23:49, 26 May 2016 (UTC)[reply]

Initialisms are lemmas regardless. They are full noun lemmas after all, and can have their own inflections. —CodeCat 00:16, 27 May 2016 (UTC)[reply]

They are not SoP; someone unable or too impatient to work it out from context might want to know what they mean. Whether they are truly useful is more questionable, but by that criterion many entries would be in trouble. DCDuring TALK 01:03, 27 May 2016 (UTC)[reply]

cuprum from Cyprium or from Κύπρος

Shouldn't we have entries for expressions like aes Cyprium? Cyprus does come from Κύπρος, but cuprum does not come from it directly; it actually comes from aes Cyprium, or at least from Cyprium. I listed cuprum as a derivative in the entry for Cyprium. Sobreira (talk) 08:50, 27 May 2016 (UTC)[reply]

I see nothing wrong with having an entry for aes Cyprium. It's not SOP, as "Cyprian brass" does not obviously mean "copper". —Aɴɢʀ (talk) 09:14, 27 May 2016 (UTC)[reply]

I think, given the difference in the vowel, that cūprum is an older borrowing. —CodeCat 12:36, 27 May 2016 (UTC)[reply]

User-friendly reference sheet

I've been here for some years now, and every now and then I encounter a problem and think: How will anyone solve that without having to ask? Now, asking or searching the help archives is not bad - for us - but we're on the internet, and people are not necessarily willing to invest time in figuring out things that seem obscure to them. Wiktionary should be intuitive and easy to start with, not confusing, because I know some people whose fledgling interest will be instantly destroyed if they think: oh, that looks too complicated/confusing for me.
What I have wished for for years is a clearly visible link on the front page to a how-to containing the most basic information for entry editing:

A link to the list of language codes
A (curtly!) commented table of every section of an entry; it will probably suffice to list just the non-optional ones and then link to Wiktionary:Entry_layout#Additional_headings
Quick explanations of how templates work and how to make them
An explanation of wiki formatting and how to create tables
A short overview of namespaces and what you can expect to find there
A short note that "Category: Language" and "Wiktionary: About Language" pages exist, with one or two sentences on what they contain
A mention of Wiktionary:Discussion_rooms
An overview of the most important templates - noted by Jberkel
And whatever other absolute basics you can think of

Of course we have most of this information (for template workings, you have to look up MediaWiki, on the other hand), and my proposal here is not meant to imply that all of this information is presented to the user badly. But it's scattered across the respective sections, and it's sometimes stuffed with detailed explanations that are good to have if you want a thorough understanding but that get in the way of an overview of the actual how-to part. And nobody can claim they are absolutely necessary for a new user, because, let's face it, a good share of new users probably just hit edit and copy-paste. These detailed explanations should of course be reachable from the reference sheet I'm proposing, but what I want for Wiktionary is that a user can make a single click, take 5 seconds to look and think: Oh, I could do that. ~~I'm willing to make a draft of what I have in mind if you don't heavily oppose my idea.~~ (Draft below.) I'm all ears for opinions and arguments. Korn [kʰũːɘ̃n] (talk) 09:26, 31 May 2016 (UTC)[reply]

Great idea, this comes up over and over again, and because of this we're definitely losing contributors. A first step would be to identify the key templates and concepts new users need to be familiar with. I also have some ideas how we could make the documentation more accessible and user-friendly, I hope I get to work on some of them at Wikimania next month (anyone else going to be around by chance?) – Jberkel (talk) 11:10, 31 May 2016 (UTC)[reply]

Should Help:How to edit a page not cover this? In any case it would be a good starting point for any new guide such as you propose (and which I do support). — Kleio (t · c) 18:41, 31 May 2016 (UTC)[reply]

I think the goal is something shorter, not requiring much reading. If it is feasible, it would be useful. We should make sure that any content added in accordance with the reference sheet does not get excessively rough handling from patrollers. Perhaps users should get a free pass if they insert a template saying they were trying to operate in accordance with the reference sheet. Their entries could be reviewed with an eye to tutoring them.

Most good contributions now are additions to or corrections of existing entries: translation, additional definitions, related terms etc. IMO these should be a focus of the reference sheet. DCDuring TALK 19:39, 31 May 2016 (UTC)[reply]

Yes, I agree. The current help page does not truly give a clear and concise tutorial to adding information properly to an article, it reads like rather dry documentation. That said, while additions to existing lemmata for some languages (especially English) constitute a majority of constructive edits, not all languages are quite as well represented on Wiktionary. For example, dead languages like Old English, Old Norse and Gothic are three languages I know of that would usually benefit more from additional lemmata than additions to existing ones. This is also true for most minority languages, which similarly often lack basic vocabulary. That may be something to keep in mind. — Kleio (t · c) 22:30, 31 May 2016 (UTC)[reply]

I expect that the dead-language folks would tend to be more willing to put up with and more capable of handling some complexity. In any event there is some language-specific knowledge required as well.

But because this is English wiki we tend to get (and need to get) as contributors native English speakers, who are often not very sophisticated in their understanding of language matters and are often monolingual. I think they are important targets for at least one reference sheet. Folks for whom English is a second language seem to typically have at least a bit more language sophistication, but they also have different needs, eg, the need to write definitions (or select glosses) in English, not exclusively relying on old bilingual dictionaries that use words not really suitable for a defining vocabulary for a current dictionary. DCDuring TALK 00:21, 1 June 2016 (UTC)[reply]

Unfortunately, languages like Old Norse, Old English and Gothic tend to attract people with agendas, or for whom the languages are just an intellectual game. For instance, we have a French IP user who's very well-versed in a number of difficult languages, but they don't like to be limited by things like attestation and historical authenticity. We have to watch constantly to keep them from adding Gothic and Old Church Slavic translations for things like television and Esperanto.

Then there are the people who find obsolete references like this and this and this, so they feel qualified to create entries and add etymologies. Making things easier for such people to start editing may not be a good idea... Chuck Entz (talk) 03:28, 1 June 2016 (UTC)[reply]

People with an agenda and the will to dig for obscure references are people with enough time and passion to get into this anyway. We need to make things look simple and non-threatening for what I always call passer-bys. Korn [kʰũːɘ̃n] (talk) 11:18, 1 June 2016 (UTC)[reply]

I completely agree with Korn (talk • contribs). Even for the experienced user it's oftentimes tedious to find the documentation and keep up with the ever-changing template zoo. Matthias Buchmeier (talk) 20:39, 31 May 2016 (UTC)[reply]

User:Korn/draft This is the rough sketch of what I was thinking of. Actually, "how to make a core entry" already has too much text for my taste, but I don't think one can subtract any information without creating a lack. Korn [kʰũːɘ̃n] (talk) 10:00, 1 June 2016 (UTC)[reply]

A note about a future change

Discussion moved to Wiktionary:Beer parlour/2017/May.
