Wiktionary:Beer parlour/2013/May


Homophones

homophone provides the following definition: A word which is pronounced the same as another word but differs in spelling or meaning or origin, for example: carat, caret, carrot, and karat. It's important to note the use of as another word in this definition. This means that many pages provide wrong information, e.g. familiarisât mentions familiarisa and familiarisas as homophones, while they are different forms of the same word. It's possible to state that the forms are homophonous, but not to mention them in a section named Homophones. In French, when you list the homophones of sot, you mention saut, seau, sceau, Sceaux, but never sots, because it's the same word. Lmaltier (talk) 06:09, 1 May 2013 (UTC)

If familiarisât, familiarisa and familiarisas are the same word, are their definitions incorrect? — Ungoliant (Falai) 06:12, 1 May 2013 (UTC)
No, definitions are correct. It's the same as provide and provides, or cat and cats: inflected forms, different forms of the same word. Lmaltier (talk) 07:20, 1 May 2013 (UTC)
cat and cats are different words; one is a conjugated form of the other, but they're still different words. Certainly that depends on the arbitrary definition of word, but I don't think my definition is idiosyncratic or unusual.--Prosfilaes (talk) 08:09, 1 May 2013 (UTC)
So I oppose removing the homophones. They’re different words which mean different things, so if they have the same pronunciation they should list each other as homophones, whether they have the same lemma or not. — Ungoliant (Falai) 19:49, 1 May 2013 (UTC)
See references I provide below. We should not misinterpret the sense of homophone, nor the sense of word in its definition, we should use the word the same way as other linguists. Lmaltier (talk) 20:02, 1 May 2013 (UTC)
What's included as a ==French== homophone can be decided by the French editors without community input, much as language editors decide on transliteration, hyphenation, and so on. As for English, if (hypothetically, since I can't think of a real example) the present-tense tread and the past trod were pronounced the same in some common accent, I'd consider them homophones in that accent and IMO we should list them as such in the entry.​—msh210 (talk) 07:29, 1 May 2013 (UTC)
See references I provide below. We should not misinterpret the sense of homophone, nor the sense of word in its definition, we should use the word the same way as other linguists. Lmaltier (talk) 20:01, 1 May 2013 (UTC)
I don't see why we wouldn't class these as homophones; what's the reason not to? Mglovesfun (talk) 10:37, 1 May 2013 (UTC)
I agree. No reason to say French sots isn't a homophone of sot. —Angr 11:26, 1 May 2013 (UTC)
Yeah. Lmaltier, you just arbitrarily pick out one of the many definitions of "word" (= lexeme) and say that the current practice of adding homophones isn't in line with it. But there's no reason to pick out that particular definition in the first place. Longtrend (talk) 11:35, 1 May 2013 (UTC)
From the PoV of someone not too familiar with French, it is at least minimally useful to know that there are many forms of a given lemma that are pronounced the same. DCDuring TALK 12:01, 1 May 2013 (UTC)
For a learner of French it would actually be more useful to have a list of plural nouns that are not homophones of the singular. There aren't many; œufs and yeux spring to mind. —Angr 13:15, 1 May 2013 (UTC)
You are right. No, I don't arbitrarily pick out one definition of word. I just want to use the noun homophone in the sense actually used by almost everybody referring to homophones. I understand that this sense is probably less clear in English, because inflected forms are not pronounced the same, but in languages such as French, different inflected forms are very often pronounced the same. Look at books or websites addressing homophones (such as http://tempsreel.nouvelobs.com/abc-lettres/saut-sceau-seau-sot/S/homophone.html). You'll see that they exclude inflected forms (because they consider they are the same word). I feel that Wiktionary homophone lists often misinterpret the sense of the word homophone, this was why I mentioned the issue. If you disagree, just try to find a book mentioning sots as a homophone of sot (or the same kind of case). Just a few references: http://books.google.fr/books?id=lTb3xSgNPecC&pg=PA3&dq=homophones&hl=fr&sa=X&ei=xGqBUcvtAYGK7Aa67YEw&ved=0CDUQ6AEwAA#v=onepage&q=homophones&f=false states words different in origin and signification. http://books.google.fr/books?id=Z7ZyXZAJgx8C&pg=PP7&dq=homophones&hl=fr&sa=X&ei=xGqBUcvtAYGK7Aa67YEw&ved=0CDoQ6AEwAQ states words which sound the same but have totally different meanings. http://books.google.fr/books?id=0crig9rvzpMC&pg=PA8&dq=homophones&hl=fr&sa=X&ei=xGqBUcvtAYGK7Aa67YEw&ved=0CFwQ6AEwCA#v=onepage&q=homophones&f=false states that homophones have a different meaning and a different spelling. I disagree with this last definition: bear as a noun and bear as a verb are true homophones. Anyway, I think that all these books seem to agree on the fact that homophones have unrelated meanings, which excludes inflected forms. Lmaltier (talk) 19:43, 1 May 2013 (UTC)
Do those books ever list nonlemma forms? Do they list œufs as a homophone of eux? Do they list aime/aimes/aiment as homophones of M? (In English, bear and bear are usually called homonyms rather than homophones.) —Angr 20:06, 1 May 2013 (UTC)
An example: http://people.mpim-bonn.mpg.de/zagier/files/exp-math-2/fulltext.pdf mentions œufs as a homophone of eux, and ôte as a homophone of haute, but never mentions inflected forms of the same word as homophones. Lmaltier (talk) 20:15, 1 May 2013 (UTC) No, this is not a good example: this mathematical paper mentions parle and parlent (but its objective is the demonstration of a theorem...). A better example is the linguistic site I mentioned above, which mentions œufs as a homophone of eux: http://tempsreel.nouvelobs.com/abc-lettres/eux-oeufs/E/homophone.html. They provide many homophones, classified by their first letter. Try to find cases such as sot/sots... Lmaltier (talk) 20:30, 1 May 2013 (UTC)
Neither of those sources makes an attempt to be exhaustive, and they're not going to list forms that native speakers (who are their target audience) will find obvious and trivial: every French speaker knows (if only subconsciously) that virtually every plural noun is homophonous with its singular. But since we're an English-language dictionary, our target audience is English speakers, not French speakers, and our readers can't be expected to just know which inflected forms of words are going to be homophonous and which aren't. (Are aimait and aimer homophones? Without looking it up, I as a French learner honestly do not know.) —Angr 21:30, 1 May 2013 (UTC)
“words different in origin and signification”: familiarisât, familiarisa and familiarisas have different signification and, judging from the different endings, were derived with (or descend from words with) different suffixes.
“words which sound the same but have totally different meanings”: familiarisât, familiarisa and familiarisas sound the same and have different meanings.
Ungoliant (Falai) 20:33, 1 May 2013 (UTC)
Are you serious? If you don't want to understand references as I do (and I find my interpretation really obvious), I cannot add anything except that: just try to find a linguistic book or a dictionary including inflected forms of the same word in their examples of homophones. Lmaltier (talk) 21:35, 1 May 2013 (UTC)
Yes. Evanildo Bechara, Moderna Gramática Portuguesa:
“Pode haver homofonia em um mesmo paradigma (“sincretismo”), como em cantava, 1.ª e 3.ª pess. do imperfeito, []
There can be homophony in the same paradigm (“syncretism”), as in cantava, 1st and 3rd person of the imperfect, []
He is claiming that cantava (1st person singular imperfect indicative of cantar) is homophonous with cantava (3rd person singular imperfect indicative of cantar). — Ungoliant (Falai) 22:00, 1 May 2013 (UTC)
http://legacy.earlham.edu/~peters/writing/homofone.htm, Suber and Thorpe, "An English Homophone Dictionary", offers us axis and its plural axes. It's entirely natural to exclude them in French; it would be odd to exclude them in English but it's an extremely rare case, and I'm not familiar with any other language whose spelling is so confused as for homophones to be a major issue. (Well, Chinese, but that's a whole nother ballgame.)--Prosfilaes (talk) 07:39, 2 May 2013 (UTC)
Your 1st example refers to the noun homofonia, not to the noun homophone (and even in English, I find the translation quite normal; my issue is not about homophony). Your 2nd example is more interesting: this case is so rare in English that it's understandable that it's interesting to mention that axis and axes are pronounced the same. But it seems clear to me that it's outside the scope of many definitions of the word homophone. Nonetheless, Webster's definition also seems to include this case, as a difference in spelling is one of possible conditions, according to this definition. This discussion seems to show that the general idea is rather clear, but that each author interprets the precise sense differently when trying to define it precisely. Lmaltier (talk) 19:58, 2 May 2013 (UTC)
Homofonia means the quality of words being homophones. — Ungoliant (Falai) 20:09, 2 May 2013 (UTC)
It's not an English word. My only concern was about the noun homophone. Personally, I have no concern with homophony, nor with adjectives (homophonous, or the adjective homophone in French...), as their use seems to be wider. Lmaltier (talk) 18:33, 3 May 2013 (UTC)
Actually, axis and axes are pronounced completely differently in English, at least in the General American accent. In axis the 's' has a soft 'ss' sound, while axes has a harder 'zz' and the 'e' is slightly longer. In short: "axis" = ack-sis while "axes" (plural of axe) = ack-siz and "axes" (plural of axis) = ack-seez. On-topic, I agree that plurals should be considered different words to clarify homophones for non-French speakers. --Soardra (talk) 20:13, 5 May 2013 (UTC)
I agree with Prosfilaes and Ungoliant, familiarisât, familiarisa and familiarisas are homophones. - -sche (discuss) 21:06, 1 May 2013 (UTC)
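
The two positions in this thread can be made concrete with a small sketch. It groups forms by pronunciation and, optionally, drops forms that belong to the same lemma as the headword. The data table, the simplified IPA strings, and the function name `homophones` are illustrative assumptions; nothing here reflects how Wiktionary's templates actually work.

```python
from collections import defaultdict

# Illustrative data only: (spelling, simplified IPA, lemma) for a few French forms.
FORMS = [
    ("sot", "so", "sot"),
    ("sots", "so", "sot"),
    ("saut", "so", "saut"),
    ("seau", "so", "seau"),
    ("sceau", "so", "sceau"),
    ("œufs", "ø", "œuf"),
    ("eux", "ø", "eux"),
]


def homophones(word, exclude_same_lemma=False):
    """Return the other forms pronounced like `word`.

    With exclude_same_lemma=True, inflected forms of the same lemma are
    dropped (the narrower reading argued for by Lmaltier); with False,
    they are kept (the reading most other editors preferred).
    """
    by_sound = defaultdict(list)
    for spelling, ipa, lemma in FORMS:
        by_sound[ipa].append((spelling, lemma))

    entry = next((f for f in FORMS if f[0] == word), None)
    if entry is None:
        return []
    _, ipa, own_lemma = entry

    result = []
    for spelling, lemma in by_sound[ipa]:
        if spelling == word:
            continue
        if exclude_same_lemma and lemma == own_lemma:
            continue
        result.append(spelling)
    return result
```

Calling `homophones("sot")` keeps sots in the list, while `homophones("sot", exclude_same_lemma=True)` leaves only saut, seau and sceau, which is exactly the disagreement above.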

Context labels

Hi! I am adding context labels to my Wiktionary parser. There are 1001 context labels in the English Wiktionary, which my colleagues and I have to add to the parser manually :) I have several questions:

-- Andrew Krizhanovsky (talk) 08:15, 1 May 2013 (UTC)

  1. They look regional to me.
  2. {{item}} is not used in principal namespace. Are you parsing outside principal namespace?
  3. I personally favor having only the more specific category for any large category of "context labels".
HTH. DCDuring TALK 12:09, 1 May 2013 (UTC)

Helping parsers and scrapers might be a good reason to explicitly use {{context|something}} or {{label|something}} instead of having an open set of labels {{something}}. Would this be helpful for the parser project? Michael Z. 2013-05-01 15:47 z

Thank you!
Yes, the parser will use {{context|something}}, but an open set of labels {{something}} will also be parsed.
Templates Karabakh, Kromanti and Tigranakert moved to "Category:Regional context labels".
Dear DCDuring, I didn't catch what "outside principal namespace" means. -- Andrew Krizhanovsky (talk) 18:17, 1 May 2013 (UTC)

Don’t thank me yet. This is a proposal on the table, but we haven’t moved forward yet.
DCD wonders if you will parse Appendix:, Wiktionary:, Talk:, or pages in other namespaces. Michael Z. 2013-05-01 18:45 z
OK, now I am parsing only main namespace. -- Andrew Krizhanovsky (talk) 21:12, 1 May 2013 (UTC)
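
As a rough illustration of the difference between the explicit form {{context|something}} and the open set {{something}}, here is a minimal Python sketch. The regular expression, the function name `extract_context_labels`, and the tiny `KNOWN_LABELS` set are assumptions made for the example; this is not the code of the parser discussed above, and it ignores nested templates.

```python
import re

# Hypothetical closed list of label template names. The explicit
# {{context|...}} form needs no such list, which is the point made above.
KNOWN_LABELS = {"chemistry", "mathematics", "skateboarding", "dialectal"}

# Matches a simple, non-nested template call such as {{name|arg1|arg2}}.
TEMPLATE_RE = re.compile(r"\{\{([^{}]*)\}\}")


def extract_context_labels(definition_line):
    """Collect context labels from one definition line of wikitext."""
    labels = []
    for body in TEMPLATE_RE.findall(definition_line):
        parts = [p.strip() for p in body.split("|")]
        name, args = parts[0], parts[1:]
        if name in ("context", "label"):
            # Explicit form: every positional argument is a label.
            # Named parameters (anything containing "=") are skipped.
            labels.extend(a for a in args if a and "=" not in a)
        elif name in KNOWN_LABELS:
            # Open-set form: the template name itself is the label, so the
            # parser must already know the name to recognise it.
            labels.append(name)
    return labels
```

With the explicit form the parser needs no list of names at all; with the open set, any template missing from the list is silently ignored, which is why the 1001 labels have to be entered by hand.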

Someone broke something here. Why is a category "Regional context labels Armenian" appearing in e.g. ճղոպուր (čłopur)? --Vahag (talk) 12:18, 3 May 2013 (UTC)

It's wiki magic :( I don't understand how it happens. -- Andrew Krizhanovsky (talk) 08:00, 4 May 2013 (UTC)
Fixed, I think. - -sche (discuss) 15:19, 4 May 2013 (UTC)
Thank you. --Vahag (talk) 15:33, 4 May 2013 (UTC)

Dialects (Context labels)

There are four templates: {{dialect}}, {{dialectal}}, {{dialectal-n}} (not used now!), {{dialects}}. Is there any difference between these templates? -- Andrew Krizhanovsky (talk) 04:30, 2 May 2013 (UTC)

{{dialect}}, {{dialectal}}, and {{dialects}} all categorise an entry into Category:Language name dialectal terms, but they display different text in contextual descriptions, according to the name of the template. I don't know about {{dialectal-n}}. I'm so meta even this acronym (talk) 12:20, 2 May 2013 (UTC)
OK. Thank you. -- Andrew Krizhanovsky (talk) 13:19, 2 May 2013 (UTC)
You're welcome. :-) I'm so meta even this acronym (talk) 21:47, 2 May 2013 (UTC)
{{context-n}} looks like 'context new' to me. It seems to do the same job but without using brackets. Am gonna rfd it. Mglovesfun (talk) 10:45, 3 May 2013 (UTC)

Yorubic (Religion context labels)

There are no entries with religion context label {{Yorubic}}. Should it be kept or deleted? -- Andrew Krizhanovsky (talk) 16:15, 8 May 2013 (UTC)

Nominated for deletion. Mglovesfun (talk) 15:05, 16 May 2013 (UTC)

board sports vs. skateboarding

Must we merge two Sport context labels templates: {{board sports}} and {{skateboarding}}? There are only 3 entries with template "board sports" (see list). -- Andrew Krizhanovsky (talk) 07:48, 14 May 2013 (UTC)

I think so, unless board sports is meant to include surfing and snowboarding. Michael Z. 2013-05-16 14:46 z
OK, I see. -- Andrew Krizhanovsky (talk) 10:44, 17 May 2013 (UTC)

video games vs. video game genre

Must we merge: {{video games}} and {{video game genre}}? There are only 4 entries with template "video game genre". -- Andrew Krizhanovsky (talk) 08:20, 14 May 2013 (UTC)

Yes. Michael Z. 2013-05-16 14:47 z

Template:mathematics

Must we move {{mathematics}} from Category:"Topical context labels" to Category:"Mathematics context labels"? -- Andrew Krizhanovsky (talk) 07:56, 14 May 2013 (UTC)

We have never achieved consensus on how to consistently treat such labels. Originally, we only had "context labels" that indicated limited usage contexts. Then we introduced large-scale use of topical context labels without differentiating cleanly the four cases:
  1. a term-definition is widely used and understood, but clearly has a topic associated, eg, sum has a definition that belongs to the "topic" 'arithmetic'.
  2. a term-definition has a topic associated, but is only understood and used in a narrow, usually technical context, eg, affine transformation.
  3. a term is used by a technical community, but the subject matter is not limited to that community. Examples might be military slang for a civilian.
  4. widespread use, no specific topic. We act as if most definitions are of type 4, without having defined what "no specific topic" might mean. Most function words would be of type 4. Presumably also most basic verbs.
Topical context labels should apply to types 1 and 2. Usage context labels should apply to types 2 and 3. Ruakh has defended the use of topical context labels, even for words that are widely understood, presumably including sum (type 1). MZajac has advocated more or less banning the use of topical context labels where the topic did not also provide a usage context (ie, type 1). I don't think there is a consensus. One can find dictionaries that seem to follow either. But I have yet to find a print or online dictionary that seems to impose topical labels on all the term-definitions that could "logically" be assigned to a topic.
If you look at the topical category Category:en:Arithmetic, you will find several terms that are widely understood and used, but not sum which clearly has a definition that belongs in that topic.
IOW, this is a can of worms. But it can be swept under the carpet again. DCDuring TALK 11:57, 16 May 2013 (UTC)
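
The four cases above reduce to two independent questions: does the definition belong to a topic, and is its use restricted to a particular community? A minimal sketch of that reading follows; the function name `label_kinds` and the two boolean parameters are assumptions introduced for illustration, not an agreed policy.

```python
def label_kinds(has_topic, restricted_usage):
    """Map a term-definition to the kinds of label it would receive.

    Type 1: has_topic=True,  restricted_usage=False -> topical only
    Type 2: has_topic=True,  restricted_usage=True  -> topical and usage
    Type 3: has_topic=False, restricted_usage=True  -> usage only
    Type 4: has_topic=False, restricted_usage=False -> no label
    """
    kinds = []
    if has_topic:
        kinds.append("topical")
    if restricted_usage:
        kinds.append("usage")
    return kinds
```

Under this reading, the disagreement described above is only about type 1: whether `label_kinds(True, False)` should return a topical label or nothing.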
I’m not sure I am clear on the four use cases. Is no. 4 the case where no label is normally applied?
I think I would add one other, perhaps overlapping with any of 1 to 4: 5. a term whose technical meaning is prescribed by some authority and accepted in its field of usage, even though it may not be easily attested by citations. Examples may include technical or legal definitions. Michael Z. 2013-05-16 17:12 z
"Is no. 4 the case where no label is normally applied?" Yes, exactly. Does that make the rest of it clearer?
"meaning is prescribed by some authority and accepted in its field of usage" That's an interesting situation - not uncommon - that adds an additional dimension, ie, another four types. DCDuring TALK 17:20, 16 May 2013 (UTC)
DCDuring, we're talking about the categorization of the template, not how it is used. Context labels can be divided into subcategories using |tcat=foo. There's really no consensus over whether to do this or not, and pretty much nobody cares. I started doing it a few years ago and realized there are many more important things I could do. Mglovesfun (talk) 10:48, 17 May 2013 (UTC)
Silly me. I thought there might be a connection between the templates' categorization and how they were used. And that such might be of interest or concern to someone who was rigorously working through our contexts labels. DCDuring TALK 11:04, 17 May 2013 (UTC)
I care. Michael Z. 2013-05-17 15:22 z

Template:game theory

Must we move {{game theory}} from Category:"Games context labels" to Category:"Mathematics context labels"?
See w:Game theory. -- Andrew Krizhanovsky (talk) 08:27, 14 May 2013 (UTC)

Yes.​—msh210 (talk) 20:57, 24 June 2013 (UTC)
OK, done :) - -sche (discuss) 22:13, 24 June 2013 (UTC)

Template:element symbol

This template prints the text "(chemistry)". It would be reasonable to move it from Category:Context labels to Category:Science context labels. -- Andrew Krizhanovsky (talk) 10:48, 16 May 2013 (UTC)

Agreed. Done. Μετάknowledgediscuss/deeds 02:41, 17 May 2013 (UTC)
But is the labelling correct? I suspect that many or all of these symbols are also used in physics, geology, manufacturing, medicine, perhaps metalsmithing, electronics, etc.
There will always be a closed set of about 118 symbols for chemical elements. If their use is restricted to the field of chemistry, then just tag them with {{chemistry}}. There’s no need for a special-use template that only confuses what the label represents. Michael Z. 2013-06-24 20:09 z

Latin

The template {{la-proper noun-indecl}} belongs to Category:Grammatical context labels. I think it is an error, because:

  1. context labels are not bound to a specific language (Latin here),
  2. In two entries (Adam#Latin and Abraham#Latin) this template is located in an unusual place (for context labels).

The template {{la-conj-form-gloss/iacio/context6}} is also a strange context label template; it is not used in any entries. -- Andrew Krizhanovsky (talk) 11:03, 16 May 2013 (UTC)

Technically, it is a grammatical label, although usually we build these into our declension templates. A weakness of our whole labelling system is that sometimes a label belongs on the headword, but it is technically and visually awkward to put it there. (Also, “grammatical context label” is a nonsense phrase demonstrating that we should stop calling usage and grammatical labels “context labels.”) Michael Z. 2013-05-16 15:02 z
It's because {{indecl}} was called by this template, now it isn't, it uses {{qualifier|indeclinable}}. Mglovesfun (talk) 15:03, 16 May 2013 (UTC)
Re la-conj-form, see WT:RFC#Latin_inflected_forms_which_contain_definitions. - -sche (discuss) 23:19, 10 August 2013 (UTC)

Sursilvan and Surselvisch

Templates {{Sursilvan}} and {{Surselvisch}} print the same text, "Sursilvan", and there are no links to the latter template. -- Andrew Krizhanovsky (talk) 06:51, 23 May 2013 (UTC)

They also both default to categorising words as Romansch... so I've gone ahead and hard redirected {{Surselvisch}} to {{Sursilvan}}. - -sche (discuss) 07:37, 23 May 2013 (UTC)

Karachay and Science context labels

I think that the template {{Karachay}} should belong to "Regional labels" instead of "Topical context labels".

The following templates could be moved from "Topical context labels" to "Science context labels":

The following templates could be moved from "Topical context labels" to "Mathematics context labels‎":

+ two questions:

  1. Why do context label templates have two categories, where one category is a parent of the other, i.e. one is more general and the other more specific? I suppose that only the most specific category should be listed...
  2. Now Category:Science context labels and Category:Mathematics context labels are at the same category level. But math is a kind of science, so it should be a subcategory of Science :)

-- Andrew Krizhanovsky (talk) 14:16, 24 June 2013 (UTC)
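
The first question can be illustrated with a short sketch that drops any category which is an ancestor of another category on the same template. The `PARENT` table below is hypothetical: it encodes the hierarchy proposed in this post (Mathematics under Science), not the category tree as it actually stood.

```python
# Hypothetical parent links, following the proposal in this post.
PARENT = {
    "Mathematics context labels": "Science context labels",
    "Science context labels": "Topical context labels",
    "Topical context labels": "Context labels",
}


def ancestors(category):
    """Return every category above `category`, nearest first."""
    chain = []
    while category in PARENT:
        category = PARENT[category]
        chain.append(category)
    return chain


def most_specific(categories):
    """Keep only categories that are not an ancestor of another one listed."""
    redundant = {a for c in categories for a in ancestors(c)}
    return [c for c in categories if c not in redundant]
```

For a template listed in both "Topical context labels" and "Mathematics context labels", `most_specific` keeps only the latter.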

Japanese Romanization

Although I know this is an old topic and something that had already been discussed and decided on back in 2006, I think there was a MAJOR point that wasn't discussed. Most keyboards (at least in America) do not have keys for the extra symbols used in Hepburn romanization and would make searching difficult. Thus, it makes more sense for words like どうじょう to be romanized as dojo (which is what most people are accustomed to seeing as it is the form used in most main-stream publications) and include doujou in the article (which is how you would input it using romaji in Microsoft IME) as well as dōjō (which I've only seen in Google translate). In addition, it's a hassle for me, and probably others who work on these entries, to manually input the macrons when creating links to romanizations.

In short, what I propose is that the vowel romanizations be main-stream and keyboard-friendly with the alternate romanizations mentioned (and even hard redirected from), but not linked to. I know that this would lead to some words linking to the same romanization, but the distinction between different words can be distinguished on the romaji page and would add actual functionality to said pages. As it stands right now, they are just simply soft redirection pages that serve no real purpose.

Example:

Romanization

dojo (hiragana: どうじょう)

Revised/Modern Hepburn: dōjō
Microsoft IME: doujou
Kunrei-shiki/Nihon-shiki: douzyou

I know that this would require a LOT of work, but I think it will make the Japanese romanization entries more search-friendly and help to bring purpose to those pages beyond a simple redirect. --Soardra (talk) 20:13, 5 May 2013 (UTC)

I don’t disagree, but I’d like to point out that searching doesn’t appear to be an issue to me. OMM searching for plain dojo finds macronized dōjō on this page, in the search field’s suggestions, in the search results, and in google. Michael Z. 2013-05-06 00:14 z
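
One common way a search can match plain dojo against macronized dōjō is diacritic folding: decompose each character and drop the combining marks before comparing. The sketch below uses Python's standard `unicodedata` module; whether Wiktionary's or Google's search works exactly this way is an assumption, and the function name `fold_diacritics` is made up for the example.

```python
import unicodedata


def fold_diacritics(text):
    """Strip combining marks so that 'dōjō' and 'dojo' compare equal."""
    # NFD splits a precomposed letter such as 'ō' into 'o' + U+0304 (macron).
    decomposed = unicodedata.normalize("NFD", text)
    # Combining marks have a non-zero canonical combining class.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))
```

Folding both the query and the page title this way lets a user on a keyboard without macrons still reach the macronized entry.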
All or most Roman diacritic symbols are not an issue in the Wiktionary or Google search. Adding additional transliterations would add an overhead on editors but eventually there could be a module or a template that does it automatically. I personally don't see the need to add new outdated transliterations.
As a side-note, Wiktionary transliteration is slightly tweaked and follows the trend of popular dictionaries and practical needs. (Hepburn standard has various versions as well). Combinations like kana (おう) "o + u" are transliterated as "ō" when it's a long sound (お父さん (おとうさん - otōsan)) or "ou" when it's a verb form (思う (おもう - omou)). いい (ii) can be either "ī" or "ii". Adjective endings are always "ii". Other long vowels are consistently romanised with a macron - ā, ē, ū. Particles "は", "へ" and "を" are "wa", "e" and "o", not "ha", "he" and "wo" as when you type them in Microsoft IME. Microsoft IME is only needed when you need to enter a Japanese text on a computer. For this you need to know simple rules for which input corresponds to which kana letter (before converting to kanji) but there are variants, like tu = tsu, hu = fu, etc. --Anatoli (обсудить/вклад) 02:39, 6 May 2013 (UTC)
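
The おう rule described here can be sketched in a few lines of Python. The kana table covers only the examples quoted in this thread, and the names `KANA`, `wapuro` and `hepburn`, as well as the `verb_form` flag, are assumptions made for the illustration. A real converter would need a full kana table and morphological information, since a plain string replacement cannot tell a long vowel from a verb ending or a stem boundary.

```python
# Minimal kana table: just enough for the examples in this thread.
KANA = {
    "じょ": "jo",
    "お": "o", "と": "to", "う": "u", "さ": "sa",
    "ん": "n", "も": "mo", "ど": "do",
}


def wapuro(kana):
    """Keyboard-style romanization: one Latin chunk per kana, no macrons."""
    out = []
    i = 0
    while i < len(kana):
        pair = kana[i:i + 2]
        if pair in KANA:  # digraphs such as じょ take priority
            out.append(KANA[pair])
            i += 2
        else:
            out.append(KANA[kana[i]])
            i += 1
    return "".join(out)


def hepburn(kana, verb_form=False):
    """Macron-style romanization following the rule quoted above.

    verb_form=True keeps 'ou' as written (e.g. omou); otherwise 'ou' and
    'uu' are treated as long vowels and written with a macron.
    """
    romaji = wapuro(kana)
    if verb_form:
        return romaji
    return romaji.replace("ou", "ō").replace("uu", "ū")
```

For example, `wapuro("どうじょう")` gives doujou, `hepburn("どうじょう")` gives dōjō, and `hepburn("おもう", verb_form=True)` leaves omou unchanged.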
Hmm, I guess I assumed that you wouldn't be able to search for it using a standard keyboard. I see that this assumption was wrong, but I do, however, still think that there should be centralized romanization pages with variations listed or the current romanizations should be hard redirects to hiragana entries since the most recent ruling turned them into redundant soft redirects. --Soardra (talk) 03:23, 6 May 2013 (UTC)
I see no problem in creating hard redirects from nonstandard romanisations to standard ones (if terms don't exist in other languages). Soft redirects are not redundant, since there are variant Japanese spellings and the romanisations also happen to be words in other languages. Nobody wished to convert them to hard redirects and the new ruling is the result of long discussions and a consensus. You can put your proposal in Wiktionary_talk:About_Japanese about the romanisation pages. The number of links needed to get to full Japanese entries just seems to be growing: "non standard Roman spellings" -> standard romaji -> katakana/hiragana -> kanji (if it exists). Do we really need that many steps? JA editors are better off focusing on the Japanese language, not on all possible misspellings (with various numbers of spaces) in a script that is not used by the Japanese. --Anatoli (обсудить/вклад) 04:03, 6 May 2013 (UTC)
By non-standard do you mean “not according to standards” or “not according to Wiktionary’s selected standard(s)?” I suggest not creating redirects for romanizations unless they are either attested, or at least following a recognized standard. Michael Z. 2013-05-12 15:27 z
  • I think there are a couple different things at work here. Some of the alternate spelling conventions that Soardra mentions are non-standard in terms of both “not according to [official] standards” and “not according to Wiktionary’s selected standards”. For instance, spelling Japanese long vowels without indicating the length in some way (such as dojo for 道場 dōjō) is common in writings by folks who 1) aren't that specific and/or knowledgeable about Japanese and 2) can't be bothered to deal with diacritics / spellings. Neither of these considerations are appropriate for a dictionary, and given that vowel length is phonemic in Japanese, we really shouldn't be including such spellings at all, provided that these common spellings can still be used to find the proper entries -- and this does appear to be the case, thankfully.
Spelling Japanese in Latin letters in a manner similar to input for the Japanese IMEs provided by Microsoft, Apple, and the various Linux communities works fine for input, but is inconsistent with usage by any English-based learning materials I've seen, and with any official governmental romanization scheme, and also with any academic romanization scheme. One might see Japanese romanized in this fashion, and it's common enough that it has its own moniker in Japanese (ワープロ式 wāpuro-shiki, "word-processor style"), but again, I don't think this romanization scheme has any place in a dictionary (other than the term itself).
In terms of what we use here, that's explained at Wiktionary:About_Japanese/Transliteration, pointed to from WT:AJA#Transliteration. Perhaps the nub of the real issue here is that we don't make WT:AJA prominent enough for new users?
(Then again, Wiktionary:About_Japanese/Transliteration is rather horribly out of date and does not describe either what we do here or what I've perceived as general practice for Japanese romanization schemes in general -- it is quite badly in need of a rewrite. Examples: we try to avoid hyphenation in most cases, using spaces instead, and we split suru verbs with a space before the suru, among other issues. I'll set to reworking that as time allows.)
More general background information is available at w:Romanization of Japanese.
-- Eiríkr Útlendi │ Tala við mig 00:47, 13 May 2013 (UTC)
@Mzajac. Eirikr has answered in the first paragraph. I have no strong opinions on redirects from non-standard transliterations, but I lean toward not having them.
@Eirikr. Yes, the page needs rewriting. I actually don't think it's either our practice or other dictionaries' practice to romanise long vowels as "aa", "ee", or "ii" (unless they are a different symbol or part of the inflected adjective form, as in い-adjectives). Niigata (not Nīgata) is another notable exception, perhaps for historical reasons; I don't know why 新 in 新潟 is read "nī". Somehow, the exception is made for long "ō" and "ū", very strange, even if "ō" is more controversial: おお (oo) vs おう (ou) (with well-known exceptions: verb endings, おもう (omou), or separate stems, こうま (kouma)). I don't remember the outcome of the last discussion but I consistently use macrons, if they are not exceptions as above. I'm sure one of the versions of Hepburn standards supports this, anyway. We can discuss details on Wiktionary talk:About Japanese/Transliteration --Anatoli (обсудить/вклад) 01:09, 13 May 2013 (UTC)
  • @Anatoli, 新 is an OJP-derived prefix attaching to nouns, meaning "new" or "fresh" or "first", or some variation thereof. The 新#Japanese entry is much in need of expansion. FWIW, I think the romanization Niigata uses the doubled-"i" for historical reasons, making this an(other) exception to the rule.  :) -- Eiríkr Útlendi │ Tala við mig 03:40, 13 May 2013 (UTC)
Thanks. I would like to treat "Niigata" as an exception, because that's the spelling adopted in English, otherwise: "onīsan" but "ōkii". I think chiisai should be moved to chīsai. --Anatoli (обсудить/вклад) 04:11, 13 May 2013 (UTC)

Golin, a Papuan language

Just an observation that is the only Golin word that we have listed. It might be amusing if it wasn't sad. Pengo (talk) 02:22, 7 May 2013 (UTC)

Okay? I can't find a list of counts of entries by language, but looking at Category:Nouns by language and Category:All languages, there's no more than 1,500. Given that there's 5,000 living languages, Golin's doing better than many other languages. Moreover, I'm not sure of the value of adding vocabulary that no one is going to look up; the few people who know this language and have Internet access probably own a copy of the existing dictionary, of which we would probably be no more than a pale copy. If that's what you want to do, more power to you, but I don't think departments of anthropology are going to reference Wiktionary.--Prosfilaes (talk) 12:01, 7 May 2013 (UTC)
I added the translation of water in this language. -- Liliana 12:04, 7 May 2013 (UTC)
thanks :) Pengo (talk) 23:54, 7 May 2013 (UTC)
I added geresma. —Μετάknowledgediscuss/deeds 02:22, 8 May 2013 (UTC)
I notice that the pronunciation section of nil kabe is n'l kabé. Aren't pronunciations supposed to be in IPA (or SAMPA)? - -sche (discuss) 05:22, 8 May 2013 (UTC)
Have fixed it up. Not sure if/how the tones should be incorporated into the IPA though, so I've kept it as a separate "tone guide". (source material) Pengo (talk) 07:57, 8 May 2013 (UTC)

Translations lacking transliteration categories

After testing a change to {{t}} (thanks to CodeCat and Yair rand) I have created categories for 36 selected languages so far Category:Translations which need romanization (each category has a subcategory): Abkhaz, Adyghe, Arabic, Armenian, Bashkir, Belarusian, Bengali, Bulgarian, Burmese, Chechen, Georgian, Greek, Hebrew, Hindi, Japanese, Kannada, Kazakh, Khmer, Korean, Kyrgyz, Lao, Macedonian, Malayalam, Mandarin, Mongolian, Ossetian, Persian, Russian, Sinhalese, Tajik, Tamil, Tatar, Telugu, Thai, Ukrainian, Yiddish. I've taken out Mandarin from the template, since translations into traditional version are usually not supplied.

New translations into above languages using {{t}} without transliterations are added immediately, also any edits on English articles with translations will cause the translations to be picked up. It takes some time for categories to be filled for older translations. Please don't add languages you're not going to work with. To add a new language some change to {{t}} is required and two new categories. Any help in adding transliterations is appreciated. --Anatoli (обсудить/вклад) 00:09, 9 May 2013 (UTC)
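
For anyone who wants to find such translations offline, for instance in a database dump, the check can be approximated as follows. This is a sketch under stated assumptions: it treats {{t}}, {{t+}} and {{t-}} alike, looks only for a missing tr= parameter, and uses a small hypothetical set of language codes. It is not the logic inside {{t}} itself.

```python
import re

# Hypothetical subset of the selected languages, by code:
# Russian, Greek, Japanese, Georgian.
NEEDS_ROMANIZATION = {"ru", "el", "ja", "ka"}

# Matches simple, non-nested {{t|...}}, {{t+|...}} and {{t-|...}} calls.
T_RE = re.compile(r"\{\{t[+-]?\|([^{}]*)\}\}")


def missing_transliteration(wikitext):
    """Return (language code, term) pairs for translations lacking tr=."""
    missing = []
    for body in T_RE.findall(wikitext):
        parts = body.split("|")
        named = {p.split("=", 1)[0].strip() for p in parts if "=" in p}
        positional = [p.strip() for p in parts if "=" not in p]
        if (
            len(positional) >= 2
            and positional[0] in NEEDS_ROMANIZATION
            and "tr" not in named
        ):
            missing.append((positional[0], positional[1]))
    return missing
```

Run over a translation table, this returns only the entries in the selected languages that would land in the new categories.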

Special:Contributions/MewBot

CodeCat (talkcontribs) via her bot MewBot is going way too far in my opinion. Violating Wiktionary:Bots#Policy by making controversial edits en masse, such as orphaning {{inv}} before the end of the deletion debate (where it looks very likely to pass as well), removing genders from {{fr-adj-form}} such as this and converting {{m|p}} to {{m-p}} (has this been discussed anywhere or is this a CodeCat personal project?)

I'm well aware of all the good work done by CodeCat, and I understand she has very specific ideas about how she wants Wiktionary to progress, but some of these ideas are controversial and shouldn't be implemented in this way. Mglovesfun (talk) 18:49, 9 May 2013 (UTC)

About the orphaning of {{inv}}, I kind of did expect someone to speak up about that. But I reasoned, if there had been no deletion debate, I still would have made these changes wherever I found them to bring it in line with Wiktionary practice, and nobody would have minded. I thought, if nobody is going to complain about many incidental edits spread over time, it would be strange if it wasn't also ok to do it all at once. I found it a bit strange that people were saying keep based on the template's widespread usage, which came across as circular reasoning ("we agree with using it, because it's widely used"). I hoped that if it became less widely used, people would judge the template more on its merits and not on its current usage.
Removing genders from adjective forms was done because the gender information is already in the definition, so duplicating it seemed a bit strange.
As far as I know, the format of genders was discussed before, in particular in regards to Module:gender and number. I don't remember when exactly but it was shortly after Lua was introduced, and the module has been around for a while now, steadily increasing in usage as more modules make use of it. —CodeCat 19:25, 9 May 2013 (UTC)
I haven't really kept in touch with policy discussions as of late but I certainly can't remember anything about genders being discussed, and so it should never have been processed by bot.
I think I've stated it before but: we've known each other since 2006, and I know that CodeCat is usually very reasonable, but it seems that there are occasional issues with changes being performed with no consensus behind them. I am confident that CodeCat does not have any bad intent and just needs to be told to take a bit more time with changes. We're not in a hurry or anything. -- Liliana 21:13, 9 May 2013 (UTC)
It's good to discuss mass changes before the fact, but CodeCat/MewBot's edits don't seem particularly extreme or controversial to me. Pengo (talk) 22:47, 9 May 2013 (UTC)

First-person singular imperative (Portuguese)

All our Portuguese verb conjugation templates include a first-person singular (affirmative and negative) imperative. I have programmed the bot accordingly. A native Portuguese speaker (ValJor) has pointed out that no such thing exists. This sounds reasonable to me. Shall I modify all the templates (and my bot) accordingly? SemperBlotto (talk) 10:34, 10 May 2013 (UTC)

Hmm, does Portuguese have a different form for giving oneself orders? I will occasionally shout orders at myself out of frustration using the imperative in English. Does Portuguese have a different form, a non-second person form? Mglovesfun (talk) 10:39, 10 May 2013 (UTC)
I'm only a pt-1, but if it is similar to Italian, there should not be a first-person singular imperative. I believe that Italians use the third-person when encouraging themselves. SemperBlotto (talk) 10:45, 10 May 2013 (UTC)
It doesn’t exist. It appears that the person who created {{pt-conj}} invented it. See Talk:cantar, Wiktionary:Beer parlour archive/2012/April#First-person Singular Imperative of Portuguese Verbs and WT:T:APT#Banning first-person imperative.
MG: Portuguese uses the subjunctive present when giving the first person singular and the third person orders. — Ungoliant (Falai) 12:03, 10 May 2013 (UTC)
OK. I'll make sure my bot doesn't create any more. Then I'll update the five thousand Portuguese conjugation templates. SemperBlotto (talk) 07:11, 11 May 2013 (UTC)

Module:gender and number

I have worked on this a bit more, and it now supports everything that our templates do, and a bit more as well. This module is already fairly widely used, not just in modules I made, either... others have used it as well. But a few people were wondering about this module and how it works, so I wrote some documentation for it to explain it, and I am now "introducing" it. I think this module can replace the current templates like {{m}}, at least as far as other templates go. If someone writes {{m}} in an entry directly, it can't be replaced, but we could change {{m}} and such themselves so that they use this module rather than the current wiki code.

There is an important point to note, though. There is a slight incompatibility between this module and the templates in the way we have traditionally denoted combinations of gender and number. If you write {{m|p}}, then the templates will "know" not to display a separator between the two, because they "know" that both form a single gender specification. But in the module, m|p means "masculine or plural", and you need to write m-p instead to get the combination. This is done to keep things simpler but it also has another purpose. Gender specifications like m|f|p are ambiguous. Does it mean masculine (singular), feminine (singular) and plural (all genders)? Or does it mean masculine (singular) and feminine plural? Or masculine plural and feminine plural? The older scheme does not distinguish these, while the module does. The difference can be significant as well. Dutch, German and Swedish for example do not have a "masculine plural" or "neuter plural", only a generic plural for all genders, so for them, "masculine or plural" makes sense while "masculine plural" does not. On the other hand, French or Spanish do have "feminine plural", so then "plural" on its own isn't a valid gender. For that reason I created a set of new combined templates like {{m-p}}, and started to add them where appropriate. A few people have complained that this wasn't discussed properly, and I kind of agree, so I am mentioning it here now.—CodeCat 15:41, 10 May 2013 (UTC)
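To make the two readings concrete, here is a minimal Python sketch of the distinction described above. It is only an illustration and not the code of Module:gender and number; the function names and the table of abbreviations are assumptions made for the example.

```python
# Illustrative only: a hypothetical model of the scheme described above,
# not the actual Lua code of Module:gender and number.
NAMES = {
    "m": "masculine",
    "f": "feminine",
    "n": "neuter",
    "p": "plural",
}


def describe(spec):
    """Expand one specification; a hyphen combines gender and number."""
    return " ".join(NAMES[part] for part in spec.split("-"))


def describe_list(specs):
    """Separate list items are independent alternatives, joined with 'or'."""
    return " or ".join(describe(spec) for spec in specs)


print(describe_list(["m", "p"]))      # masculine or plural
print(describe_list(["m-p"]))         # masculine plural
print(describe_list(["m-p", "f-p"]))  # masculine plural or feminine plural
```

Under this model the three readings of the old m|f|p would be written as three different lists, which is the ambiguity the hyphenated form is meant to remove.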

Neat. While making perfect sense within our template system, it also manages to elegantly use the vertical bar symbol (|) here with the meaning familiar to programmers of "or" (e.g. m|p means "masculine or plural") Pengo (talk) 22:02, 11 May 2013 (UTC)
Oh, that was actually my own shorthand. The module itself is indifferent to the method of separating the individual genders from each other, because it receives them already split, in the form of a list. The module can also be invoked from a template, like this:
{{#invoke:gender and number|show_list|m|f}}
This will display m, f currently. This means that any code that uses this module will have to perform the split itself. This is intentional, because there is currently a wide variety of different ways to specify multiple genders. {{head}}, {{l}} and {{nl-noun}} use a g2= parameter, {{t}} uses additional unnamed parameters, while {{fr-noun}} uses mf and then "interprets this" accordingly. It would actually not be a very good idea to use "|" as the separator to separate multiple specifications in a single string, because that would interfere with how templates interpret that character and you'd end up having to use {{!}} all the time. If we do decide to use single strings for multiple genders in this module, a comma would probably be a better choice. —CodeCat 22:29, 11 May 2013 (UTC)
If no one objects, I would like to replace the remaining occurrences of {{f|p}} and such with {{f-p}}, so that we can then look at migrating our templates to this module. —CodeCat 10:32, 15 May 2013 (UTC)
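As a rough idea of what such a replacement could look like, the following Python sketch rewrites only the exact two-parameter form and leaves anything ambiguous, such as {{m|f|p}}, untouched for manual review. It is an assumption about how a bot might approach this, not MewBot's actual code; the function name is hypothetical, and the inclusion of n alongside m and f is a guess.

```python
import re

# Hypothetical sketch, not MewBot's code. Matches only {{m|p}}, {{f|p}}
# and {{n|p}} exactly; longer forms like {{m|f|p}} are deliberately skipped.
OLD_STYLE = re.compile(r"\{\{([mfn])\|p\}\}")


def convert_gender_templates(wikitext):
    """Rewrite {{X|p}} as {{X-p}} for a single gender X."""
    return OLD_STYLE.sub(r"{{\1-p}}", wikitext)


sample = "{{m|p}} and {{f|p}}, but not {{m|f|p}}"
print(convert_gender_templates(sample))
# {{m-p}} and {{f-p}}, but not {{m|f|p}}
```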

Using talk pages for RFV, RFD, Etymology Scriptorium and Tea Room

I realise that we've tried this before, but I'm not sure why it failed exactly. What I also wonder is why it seems to work better on Wikipedia. Keeping the discussions on the talk pages would have several advantages:

  • Things are kept in the place where they are the most relevant.
  • The discussions wouldn't be forgotten or missed once they are no longer at the bottom of the page.
  • Archiving becomes much easier (which is what I like about Wikipedia's method).

I am wondering what exactly would be needed to make this work. The major downside of talk pages is that any edits to them go unnoticed by the large majority of editors, so keeping things in a centralised place would be good. That's why we have the discussion rooms. Doesn't Wikipedia have bots that automatically add new discussions to the list? —CodeCat 13:13, 12 May 2013 (UTC)

WT:Families

Hey all, just thought I'd make it more publicly known by posting here that I made a few changes to this Families page. It's nothing controversial I hope; the change I am highlighting can be seen here. It's mainly to go with the fact that I made {{etyl:ngf-sbh}} to replace {{etyl:South Bird's Head}}. User: PalkiaX50 talk to meh 14:30, 12 May 2013 (UTC)

I'm glad we have a 'regularly-formed exceptional code' for South Bird's Head now, if that's not too much of an oxymoron. :) I've moved the explanation of how to create codes for subfamilies whose superfamilies have codes (and the example, ngf-sbh) into the previous paragraph. I meant to include such a line there when I overhauled the page last year, but as you can tell, I forgot (or did a bad job of it); that's why "For example, the Pama-Nyungan family is aus-pam: "aus" is the ISO 639-5 code for Australian languages" was sitting around after the bit about Germanic for no apparent reason, lol. - -sche (discuss) 17:40, 12 May 2013 (UTC)
Cool, thanks for that. User: PalkiaX50 talk to meh 18:49, 12 May 2013 (UTC)

Is this code used in HTML lang attributes? If we are just making up language codes, then let’s make up ones that won’t break our web pages.

HTML5 requires a lang attribute to contain a valid language code.[1] ngf-sbh is not valid. ngf-x-sbh would be a valid language tag, as a private-use extension.[2] Michael Z. 2013-05-16 02:35 z
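The private-use form can be derived mechanically from the existing exceptional codes. The Python sketch below shows one way to do that, assuming the codes always have the shape "family code, hyphen, made-up suffix"; the function name is hypothetical, and the check covers only the length and character rules for private-use subtags, not whether the first subtag is a registered language code.

```python
import re

# A private-use subtag is 1 to 8 ASCII letters or digits.
SUBTAG = re.compile(r"[A-Za-z0-9]{1,8}")


def to_private_use(code):
    """Turn e.g. 'ngf-sbh' into 'ngf-x-sbh' (hypothetical helper)."""
    primary, *rest = code.split("-")
    if not rest:
        return primary  # nothing made up, nothing to mark
    for subtag in rest:
        if not SUBTAG.fullmatch(subtag):
            raise ValueError("not usable as a private-use subtag: %r" % subtag)
    return "-".join([primary, "x", *rest])


print(to_private_use("ngf-sbh"))  # ngf-x-sbh
print(to_private_use("aus-pam"))  # aus-x-pam
```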

map of American English dialects

Those interested in American English dialects may find this large, detailed map of dialects and their features interesting: [3]. - -sche (discuss) 19:17, 12 May 2013 (UTC)

Thanks. It seems quite good. There's still a bit more to do. For example, I think that Chicago has a dialect distinct from its surrounding communities, just as Pittsburgh, New York, New Orleans, Cincinnati, and San Francisco do. DCDuring TALK 21:03, 12 May 2013 (UTC)
I have a challenge to anyone I meet online for them to guess where I grew up based on my accent (hint: it's in the United States). I will give narrow IPA transcriptions to the best of my abilities for any words requested, answer vocabulary questions, and if necessary, record audio. Anyone who wants to try can have a crack at it on my talkpage or by emailing me! —Μετάknowledgediscuss/deeds 21:54, 12 May 2013 (UTC)
I find it interesting that the map indicates no cot-caught merger in Texas, while w:Texan English says (attributed to what is apparently a reliable source) "The cot-caught merger is found almost everywhere in Texas." Who to believe? —Angr 22:10, 12 May 2013 (UTC)
UPenn has a map of just that merger, which seems to suggest the two words are distinct in southern Texas, merged in northern Texas (with a split similar to that which Aschmann marks between Inland and Lowland). The merger has also spread over time, so it's possible the different information comes from different times. - -sche (discuss) 02:42, 13 May 2013 (UTC)

small template idea

Would it be acceptable to make any declension templates for displaying definite and indefinite articles with nouns? It is probably better suited for more inflected nouns, though. I bring this up because the German Wiktionary has something like this, for any languages that have articles. --Æ&Œ (talk) 20:26, 15 May 2013 (UTC)

Can you make an example of what you want? Like a table or something. — Ungoliant (Falai) 03:22, 16 May 2013 (UTC)
[4] --Æ&Œ (talk) 01:41, 17 May 2013 (UTC)
I guess it would be useful for Spanish, due to the feminine la/el distinction. If done, they could be added to the headword line templates, like {{es-noun}}. — Ungoliant (Falai) 03:43, 20 May 2013 (UTC)

Standard spelling of

I noticed something on [[licence]] which I think could be used more widely when handling US/UK/India/etc spellings: the use of Standard spelling of rather than Alternative spelling of. Obviously, a dedicated template would be preferable to the {{form of|...}} that licence uses at the moment, but what do you think of the general idea?
{{alternative spelling of}} would still be used when spellings are equally standard within the same dialect(s), e.g. aarrghh vs argh. {{standard spelling of}} would only be used in entries like disfavor, which is not merely an "alternative" to [[disfavour]] that some people in the US use, but the standard US spelling. In those entries, "standard" would be more accurate and less likely to be misinterpreted — as someone commented earlier, we don't mean "alternative" as a value judgement, but some people perceive it as one, and either (as non-native speakers) go away thinking the lemma is preferred or (as native speakers) get upset that their variant has been "slighted".
We could even use parameters like {{standard spelling of|foo|in=US|in2=Australian}}, rather than context labels, to effect display of Standard US and Australian spelling of and sort entries into Category:American English standard forms (or just Category:American English) etc, with reciprocal qualifier-like templates on the lemmata—like {{British spelling}}, except displaying (British spelling) rather than just (British) so as not to imply the sense that followed was what was restricted to the UK—to sort the lemmata into Category:British English standard forms/Category:British English.
For sets of spellings in which one entry has already been lemmatised (e.g. disfavour, disfavor), we should keep the status quo; for sets where there isn't a lemma yet and content is currently duplicated (color, colour), we could make the oldest entry the lemma, rather like WP does.
Thoughts? - -sche (discuss) 21:45, 17 May 2013 (UTC)

We wouldn't need this so much if we marked all the spellings that are less common currently so that we could leave the standard one unmarked. But that is completely unrealistic, at least for the next six months. Thus, if we can improve a set of entries that are alternative spellings of a single underlying term by marking the standard one using {{standard spelling}}, we should do so.
I can't support categories at this time, because they would be completely misleading for at least a "six-month" transition period until all English spellings were properly marked. We would only have to mark about 2,500-3,000 a day to get this done in six months for English and about 16-19,000 a day to get this done for all languages. DCDuring TALK 23:48, 17 May 2013 (UTC)
DCDuring, it seems like you've completely misunderstood the entire post. Please read it through again. —Μετάknowledgediscuss/deeds 01:48, 18 May 2013 (UTC)
No, thanks. DCDuring TALK 02:12, 18 May 2013 (UTC)
OK then. Assuming that I am the one who has completely misunderstood, care to explain what I got wrong? —Μετάknowledgediscuss/deeds 04:20, 19 May 2013 (UTC)
@Metaknowledge. How could I know that?
Perhaps I should have asked more questions about the proposal. Most of my concerns were with the categorization, which would have to be either complete or very well explained to be useful. As I am very skeptical about users reading and understanding our categorization criteria, which are rarely (never?) documented, at best subjective and unsubstantiated, and often whimsical, I focused on completeness.
Though you didn't say why, you also expressed opposition to categorization. DCDuring TALK 12:32, 19 May 2013 (UTC)
@-sche: I strongly support all of it except creating new categories. —Μετάknowledgediscuss/deeds 01:48, 18 May 2013 (UTC)
AFAICT, you both understood the proposal.(?) I've started switching a small number of {{alternative form of}}s to Standard form (pending the creation of a dedicated {{standard form of}} template). I won't create new categories. We could still use the existing categories (Category:American English, etc), but I won't do that without further discussion. - -sche (discuss) 15:51, 19 May 2013 (UTC)

Migrating towards Module:languages

By way of experiment, I have changed {{languagex}}, {{derivcatboiler}} and a few other templates to use Module:language utilities (which is a "gateway" to Module:languages) instead of the traditional language code templates. From what I've seen, this move hasn't broken any more than a handful of pages (which I fixed), and it seems like it was rather easy. I noticed that some of our current templates have already become completely orphaned through this change, including most (if not all) of the proto-language code templates. I expect that the /family subtemplates will also end up orphaned once the software has worked its way through the queue.

So I would like to ask if it's ok to continue with the migration, by changing all remaining uses of the language code templates and their subpages to use the Lua module instead. —CodeCat 19:31, 18 May 2013 (UTC)

I don't think the module has been updated to reflect the changes in the meantime to various /script and /family subtemplates. Unless I'm wrong, can you deal with that first? —Μετάknowledgediscuss/deeds 04:22, 19 May 2013 (UTC)
I fixed those yesterday. I edited {{langt}} to check whether the two match, and add the code template to Category:Language codes with desynchronized data if they do not. I don't know how often that category updates, so it's possible that more changes were made in the meantime that are not shown in the category. I suppose that's one reason why we should do this sooner rather than later. —CodeCat 11:48, 19 May 2013 (UTC)
I've been working on deleting the /names subtemplates. They were never really used for anything to begin with. For the next step I would like to orphan and delete the /family subtemplates. This will be more work because there are a lot more of them, and there is a chance that some of them are still being transcluded. So I propose the following "plan":
  • Edit {{langt}} to categorise all language code templates that currently have a /family subtemplate into Category:language code templates with family.
  • Go over each of those templates with a bot, checking the /family subtemplate for transclusions. If there are no transclusions, replace the contents of the /family template with [[Category:family subtemplates to be deleted]]. That category will then need to be emptied out by some means.
  • If any codes remain in Category:language code templates with family after this is complete, those need to be orphaned manually.
Is this ok? —CodeCat 14:28, 21 May 2013 (UTC)
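The sorting step of the plan above can be sketched roughly as follows. This is only an outline in Python of the decision the bot would make, assuming the transclusion lists have already been fetched; the function name, the input mapping and the sample page names are hypothetical, and no wiki API calls are shown.

```python
# Hypothetical outline of the bot's decision, not actual bot code.
DELETION_MARKER = "[[Category:family subtemplates to be deleted]]"


def plan_family_cleanup(transclusions):
    """Split /family subtemplates into deletable ones and ones still in use.

    `transclusions` maps each subtemplate name to the list of pages
    that still transclude it (an assumed, pre-fetched input).
    """
    to_delete, to_orphan = [], []
    for name in sorted(transclusions):
        if transclusions[name]:
            to_orphan.append(name)  # still used: orphan manually first
        else:
            to_delete.append(name)  # unused: replace contents with the marker
    return to_delete, to_orphan


# Made-up sample data for illustration.
sample = {
    "Template:aaa/family": [],
    "Template:bbb/family": ["Template:some caller"],
}
to_delete, to_orphan = plan_family_cleanup(sample)
print(to_delete)  # ['Template:aaa/family']
print(to_orphan)  # ['Template:bbb/family']
```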
You are preserving the contents of the templates' /names pages in the module, yes? Some of them were used / contained content. - -sche (discuss) 17:22, 21 May 2013 (UTC)
Yes, their contents were moved over to the module so the names were not lost. I actually think that we can use that information in the future to make things like {{langrev}} automatically-generated. It would also be useful to add it to the category of each language. —CodeCat 18:05, 21 May 2013 (UTC)
All looks good. I just wanted to point out that fixing up the langrev apparatus and Lua-ising it is also very important to me, so I hope someone will attack that soon. —Μετάknowledgediscuss/deeds 02:52, 22 May 2013 (UTC)
I updated Wiktionary:Languages to use the names from the module now. —CodeCat 18:11, 21 May 2013 (UTC)
Just let me know whenever it becomes time to start deleting the language templates themselves ({{aaa}}, etc). For one thing, a lot of direct uses (i.e., "Template:aaa") will have to be modified. For another, I've put a lot of effort into making sure the information in the templates is up-to-date, whereas I've noticed places where the module is not up to date, so I volunteer to delete the templates by hand after cross-checking them and the module against each other, as described on your talk page. - -sche (discuss) 17:22, 21 May 2013 (UTC)
As I noted above they were cross-checked a few days ago and the module was updated to match the templates. But if someone makes edits to the templates now, the module won't be affected of course. So for now we need to check edits to the Template: namespace regularly to see if anyone made any changes. The names and family templates are probably not used anywhere anymore, but the main (name) and the script still are, so they need to be kept synchronised. I'm not sure what to do with the direct uses. When we eventually get around to it, we can delete the templates that are orphaned, and change the remainder so that they "forward" the call to the module. That would give us some more time to work on them without having to worry about synchronisation issues. —CodeCat 18:05, 21 May 2013 (UTC)
For the record, I oppose deleting the templates until there is something in place that mimics directly calling them. The ability to type a code on a page and have the language name displayed (in such a way that the text displayed updates automatically if the language is renamed) is too useful an ability to lose. It shouldn't be hard to make such a thing. I imagine one could use whatever bit of {{etyl}} finds out the name associated with an entered code (and strip all the other derivation-related bits). - -sche (discuss) 19:07, 24 May 2013 (UTC)
There already is a replacement, Module:language utilities. It is a bit more verbose, but language templates should always be substed in entries anyway. —CodeCat 19:35, 24 May 2013 (UTC)
All family subtemplates have been orphaned and marked for deletion. I've converted more of our templates to use the module, but the software needs some time to catch up because it has affected many pages. For now, we need to check Category:Pages with script errors regularly and fix any entries that appear as the changes are applied throughout the wiki. Most of the script errors so far have been the result of missing or incorrect language codes. I am hoping to get {{languagex}} and {{langprefix}} orphaned soonish. —CodeCat 16:01, 22 May 2013 (UTC)

Langrev

Note: Please do not delete or edit the langrev templates. Doing so would break quite a few important javascript tools, including the translations adder. --Yair rand (talk) 03:21, 22 May 2013 (UTC)

{{langrev}} itself probably won't be deleted, but it may be reimplemented using a module. If your scripts rely on the presence of the subtemplates, then they should probably be fixed. —CodeCat 09:57, 22 May 2013 (UTC)
The scripts rely on the subtemplates, not Template:langrev itself. The scripts cannot be "fixed" to work with whatever system the template is reworked to use. --Yair rand (talk) 20:26, 22 May 2013 (UTC)
Are you sure there is no other way? Having to keep those templates around is going to be a real problem in the long run. —CodeCat 20:29, 22 May 2013 (UTC)
The language autocomplete system of the translation adder works by pulling up a quick list of (max 3) pages beginning with "Template:langrev/" followed by however much the user has typed in the language field, along with the raw wikitext of those pages (typically two/three bytes per page), thereby getting the user's intended language name along with its language code. This is done every time the user types another character into the field. Minimal data sent and received, no page parsing, or anything remotely expensive. Suppose this was replaced by a Lua database. There are a few options: One, download the entire 0.7MB long module every time someone wants to type in a language to a javascript tool, and base autocomplete off of that. Two, set up a complex Lua module to run through the database and find language names and codes, and have the js tool send a parse request, with a Lua module in it no less, and run with the results. Option three, dump the entire database right into the script, increasing the script size gazillion-fold, and making everyone's page loads go a lot slower. Not great solutions, I think. --Yair rand (talk) 20:56, 22 May 2013 (UTC)
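For readers unfamiliar with the lookup being described, here is a small Python model of its shape: a prefix search over page names that returns at most three matches together with their stored codes. It is only an illustration of the behaviour, assuming an in-memory table; the real tool queries the wiki for pages under "Template:langrev/" rather than holding any such table, and the function name and sample data are made up for the example.

```python
from bisect import bisect_left

# Made-up sample of language name -> code pairs, standing in for the
# contents of the "Template:langrev/<name>" pages.
LANGREV = {"Galician": "gl", "Georgian": "ka", "German": "de", "Greek": "el"}
NAMES = sorted(LANGREV)


def suggest(prefix, limit=3):
    """Return up to `limit` (name, code) pairs whose name starts with prefix."""
    results = []
    start = bisect_left(NAMES, prefix)  # first name >= prefix
    for name in NAMES[start:start + limit]:
        if not name.startswith(prefix):
            break
        results.append((name, LANGREV[name]))
    return results


print(suggest("Ge"))  # [('Georgian', 'ka'), ('German', 'de')]
print(suggest("G"))   # [('Galician', 'gl'), ('Georgian', 'ka'), ('German', 'de')]
```

Each call touches only the handful of names around the prefix, which is the property the post is arguing would be lost if the whole database had to be loaded or parsed on every keystroke.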
Why would it not be possible for the script to call a template or module? —CodeCat 21:04, 22 May 2013 (UTC)
Certainly it would be possible, as I mentioned above, but I suspect it would perform terribly, having to run a full parse of a probably complex module on every keystroke. --Yair rand (talk) 21:21, 22 May 2013 (UTC)
(Also, I fail to see how keeping the templates could become a problem in the long run. --Yair rand (talk) 21:22, 22 May 2013 (UTC))

Category:Asturian verb forms

Are these valid? Do we assume them to be valid? Made by a banned user using an illegal bot, but if they're valid I guess we can't delete them no matter who created them. Mglovesfun (talk) 10:10, 19 May 2013 (UTC)

I have an Asturian grammar. I’ll take a look. — Ungoliant (Falai) 19:50, 19 May 2013 (UTC)
It appears they’re correct. — Ungoliant (Falai) 21:18, 19 May 2013 (UTC)

Category:en:Latvian demonyms

I've just created this category (actually, I meant to work on its Latvian counterpart Category:lv:Latvian demonyms, but I tend to create an English equivalent for a category when I see there isn't one yet), which made me have a doubt about demonyms. Is the term supposed to cover only words that refer to a person born in a specific place -- i.e., only nouns -- or also adjectives that can refer to the place, or to people who were born there? In other words, should only Courlander be placed in Category:en:Latvian demonyms, or do Curonian, Courlandish and Courish also belong there? (Of course, there is also the derived question of whether it is a good idea to subclassify demonyms by larger areas -- 'French demonyms', 'Russian demonyms', 'American demonyms', etc. -- and whether the names of these categories should be simply in the form 'Geographic Adjective + demonyms', or 'demonyms in + Geographic Noun'). --Pereru (talk) 12:08, 19 May 2013 (UTC)

I dunno. Maybe these could be added to Category:en:Latvia instead of its own category. — Ungoliant (Falai) 21:21, 19 May 2013 (UTC)

Do we really need horizontal rules between language sections?

Our standard practice has always been to add ---- right above a language header. But I don't really understand why. I can imagine that people did it because they liked the visual appearance of the extra line above the header. But it is not really necessary (or desirable) to do it that way; a better way would be to add a top border to h2 through CSS. So should we abandon this practice, or is there another reason? —CodeCat 15:17, 19 May 2013 (UTC)

Yes, it was added for the visual effect, to differentiate the language headers clearly from POS and other L3 headers. In November of 2005, somebody suggested that it should be handled automatically through CSS rather than manually, and some editors began removing all instances of it. After a few hundred pages had ---- removed, we put a stop to the effort until someone could get CSS to handle the task correctly. Unfortunately, after all these years I just do not remember the details anymore, but I recall that no one was able to figure out how to do it via CSS. I believe that our HTML experts of that time concluded that it could not be done in CSS (but I don’t remember the reasons or anything like that). So we gave up on this idea and reverted all of the pages that had been changed. —Stephen (Talk) 16:02, 19 May 2013 (UTC)
Yes. But I'm pretty sure we don't need a blank line above and below it. SemperBlotto (talk) 16:06, 19 May 2013 (UTC)
All I can think of is that it would be harder to leave the line off the first section on a page, while still showing it on the remainder. —CodeCat 16:09, 19 May 2013 (UTC)
We seem to use H2 for other things (e.g. "Latest revision" message), so apparently a class would be needed on the "----"-generated H2s. I bet some bots rely on finding the "----", too. Equinox 16:11, 19 May 2013 (UTC)
How would you remove the line from the first section of the page? What about pages with only 1 language section? DTLHS (talk) 16:13, 19 May 2013 (UTC)
There is a :first-of-type pseudo-selector in CSS which probably does exactly what we need. It's only supported in CSS 3 though, so some very old browsers will not support it. I don't think that is a huge problem for a mere presentation issue, though. —CodeCat 16:19, 19 May 2013 (UTC)

Since each h2 also has a gray rule below, this is not an ideal visual-design solution to separate it from the content above.

But, off the top of my head:

/* dispense with hr for rules above h2 */
body.ns-0 #mw-content-text > h2 { border-top: 3px double #aaa; margin-top: 2em; } /* direct-child h2’s of the content in the main namespace */
body.ns-0 #mw-content-text > h2:first-child { border-top: none; margin-top: 0; } /* but not the one at the top */
body.ns-0 #mw-content-text > #toc + h2 { border-top: none; margin-top: 0; } /* and if the TOC appears, not the first one after the TOC */
body.ns-0 #mw-content-text > hr { display: none; } /* hide now-redundant hr’s */

The immediate-child > selector ensures that none of this is rendered in MSIE 6 or earlier. Adjusting the margin-top and padding-top of the h2 might improve the visual separation.

Untested. Should be tested with both TOC shown and hidden, because the TOC contains another h2. Probably needs testing in MSIE 7, because I’m not sure that browser has proper support for these selectors. Michael Z. 2013-05-19 19:14 z

Updated. The :first-of-type selector won’t work in MSIE 7 or 8. Michael Z. 2013-05-19 19:20 z
Updated again, to hide hr’s. Michael Z. 2013-05-19 20:20 z
After a quick test, it seems to work as expected in Safari/Mac. Put the code above in your vector.css to try it out.
I have changed it to a double rule, which is nicely differentiated from the underscores. Michael Z. 2013-05-19 20:20 z
I think common.css would be better than vector.css, so that it applies to all skins. Do you think we should make this site-wide and abandon the practice? —CodeCat 20:28, 19 May 2013 (UTC)
Sure, if no one can think of any disadvantages.
The hr’s are being used as presentational elements, and with this CSS we can obviate the requirement for manual work and its inevitably inconsistent results, and reduce wikitext clutter. Hr is properly a “paragraph-level thematic break,”[5] so this is not good usage. The spec adds “There is no need for an hr element between the sections themselves, since the section elements and the h1 elements imply thematic changes themselves.” And it could fool the makers of bots and scrapers into thinking they can determine page structure from it, as you mention below. Michael Z. 2013-05-19 21:45 z
Even if the actual wikicode for the horizontal line isn't necessary for presentation, it's still a lot easier to parse ---- than to use a regex to match ==(langname)==. Just something to consider. DTLHS (talk) 20:53, 19 May 2013 (UTC)
I'm aware of that, but a bot really shouldn't be dividing sections based on the presence or absence of ----. After all, an entry may occasionally be missing it, and we don't want that to cause the bot to break things or make bad edits. The ---- is just a convenience but it can never be relied on, the headers are what counts. Furthermore, if a bot does want to divide a page into sections, then it will surely need to know the name of each section. It would be rather pointless otherwise. So even if a bot uses ---- to split the sections, it will need to parse the header anyway to find out what the name of the language is. Having said that, it really isn't all that hard to just parse the headers. I recently made MewBot do it and it was rather easy. —CodeCat 21:02, 19 May 2013 (UTC)
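A minimal Python sketch of header-based splitting, for comparison, might look like the following. It is not MewBot's code; the function name is hypothetical, and it assumes language headers are level-2 headings written as ==Name== on their own line. A trailing ---- is stripped from a section if present but is never needed to find the boundaries.

```python
import re

# Level-2 headers only: "==English==" matches, "===Noun===" does not.
L2_HEADER = re.compile(r"^==([^=\n]+)==[ \t]*$", re.MULTILINE)


def split_languages(wikitext):
    """Map each language name to the body of its section (hypothetical helper)."""
    sections = {}
    matches = list(L2_HEADER.finditer(wikitext))
    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(wikitext)
        body = wikitext[match.end():end].strip()
        if body.endswith("----"):
            # Drop the optional separator; the headers define the sections.
            body = body.rstrip("-").rstrip()
        sections[match.group(1).strip()] = body
    return sections


entry = "==English==\n===Noun===\n# a feline\n\n----\n\n==French==\n===Noun===\n# chat\n"
print(list(split_languages(entry)))  # ['English', 'French']
```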
I recommend using ~ instead of :first-child, as far more browsers support it. That would be:
body.ns-0 #mw-content-text > h2 ~ h2 { border-top: 3px double #aaa; margin-top: 2em; }
--Yair rand (talk) 22:32, 19 May 2013 (UTC)
Good one. I never remember the ~. Until the hr’s are removed, we need only the following. Michael Z. 2013-05-20 14:07 z
/* dispense with hr for rules above h2 */
body.ns-0 #mw-content-text > h2 ~ h2 { border-top: 3px double #aaa; margin-top: 2em; } /* siblings of the first h2 of the content in the main namespace */
body.ns-0 #mw-content-text > hr { display: none; } /* hide now-redundant hr’s */
I think we could specify a little further and hide only those HRs that are immediately followed by a h2. That way we can still use them elsewhere (and for all we know, they are). Which browsers do not support this arrangement? —CodeCat 14:12, 20 May 2013 (UTC)
In CSS you can select following elements, but not preceding ones. But you could make the hr’s double and hide a following h2’s border-top, which would look identical. Michael Z. 2013-05-20 14:29 z

We really need to add back glosses to pinyin entries

I never do anything with Chinese so I'm not normally affected by the decisions made by its editors, but this is an exception. I wanted to find out what "huo long" meant. No tone marks, just that. So how do I find out what that means? The entries huo and long show a few possibilities for tones, so I choose one and then I'm presented with even more possibilities. It's just too much work to look through them all. In the end, the most helpful thing was when I did a full search for "huo long" and found out that huǒ means fire, which is the most likely meaning given the context. But what about the second word? In the old situation, the search would have been limited to 4-5 entries - one for each possible tone. That's still doable. But now I have to look through dozens of entries, which is tedious and I just thought "I'm not going to bother, this does not work for me". This is a really bad usability issue. We've already had several people complain on the feedback page that the new entry format was useless and that they want the old format back. And I definitely agree. —CodeCat 22:13, 19 May 2013 (UTC)

火龙 means fire dragon, I believe. I agree, we've gotten so many complaints from IPs on various talkpages and on the Feedback page that it looks like we're really making a mistake. —Μετάknowledgediscuss/deeds 22:19, 19 May 2013 (UTC)
Oppose. This is an undue burden on Chinese editors. Anything that puts pressure on us to develop a better way of managing glosses across multiple pages is IMO a good thing. DTLHS (talk) 22:26, 19 May 2013 (UTC)
What about the burden on the users? Or have we decided to just make entries for our own little fun? —CodeCat 22:29, 19 May 2013 (UTC)
Somewhere it says that we favour readers over editors.
But could something like the following work?: For each Han character (e.g., 火), put the gloss text in 火/gloss. In the romanization entry (huǒ), have {{pinyin reading of|火}} link to the character, but additionally show the text of character/gloss – and in huǒ/gloss aggregate all of the romanized characters’ glosses. In the ambiguous diacritic-less form (huo), aggregate all of the romanizations’ glosses. It would still be more work, but it would eliminate error and duplication if each gloss were typed in only one place. Michael Z. 2013-05-20 01:42 z
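A rough Python sketch of the aggregation idea above, assuming each gloss is stored once per character and the toned and toneless pinyin views are derived from it. The dictionaries CHAR_GLOSS and CHAR_PINYIN are hypothetical stand-ins for the proposed character/gloss subpages; the three glosses are the ones mentioned in this thread.

```python
import unicodedata
from collections import defaultdict

# Hypothetical single-source store: one gloss per Han character,
# standing in for the proposed "character/gloss" subpages.
CHAR_GLOSS = {"火": "fire", "龙": "dragon", "笼": "cage"}
# Hypothetical mapping from character to its toned pinyin reading.
CHAR_PINYIN = {"火": "huǒ", "龙": "lóng", "笼": "lóng"}


def strip_tones(pinyin):
    """Remove combining marks: 'huǒ' -> 'huo'.

    Simplification: this also turns 'ü' into 'u'.
    """
    decomposed = unicodedata.normalize("NFD", pinyin)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def aggregate():
    """Build toned and toneless gloss lists from the single gloss store."""
    toned = defaultdict(list)
    toneless = defaultdict(list)
    for char, gloss in CHAR_GLOSS.items():
        reading = CHAR_PINYIN[char]
        toned[reading].append((char, gloss))
        toneless[strip_tones(reading)].append((char, gloss))
    return dict(toned), dict(toneless)


toned, toneless = aggregate()
```

Each gloss is typed in only one place, so the toned entry and the toneless entry cannot drift apart.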
Toneless pinyin should not be used for anything, unless they have become English loanwords. There are just too many tone combinations. Monosyllabic pinyin is used for disambiguation. I know the IP guy - a longtime user. (He/she uses multiple IP addresses but that may be related to his work, home, ipad, whatever IP). He works with Vietnamese and Mandarin. See also Talk:ya3 --Anatoli (обсудить/вклад) 01:45, 20 May 2013 (UTC)
^This this this...it's an extra click on from the pinyin entry to get to the real info, so what? Also, IMO nonstandard pinyin entries should not be made, with the exception of the basic "syllables" I guess.User: PalkiaX50 talk to meh 01:57, 20 May 2013 (UTC)
It's not one click. It's as many clicks as there are Han entries for a single pinyin entry, which in the case of huǒ is six, but for lóng it's 47. Do you really expect users to look through all 47 of them to find the right one? I certainly gave up when I saw that long list. It's even worse for the many users who don't even know the tone, because then they have to look through all the tones' pinyin entries as well and it multiplies. —CodeCat 02:00, 20 May 2013 (UTC)
It's a hard effort to find a Chinese word matching toneless pinyin. If a word exists, then it's easy to find by pinyin. "huolong" would also yield "huǒlóng" in the search window. Even with the same tones "huǒlóng" may mean not only "fire dragon" but also 火笼 (fire cage), 或隆 (or Long (name)). --Anatoli (обсудить/вклад) 02:13, 20 May 2013 (UTC)
The answered complaints are at Wiktionary:Feedback#jin4 and Wiktionary:Feedback#li4. I'm sure they are from the same person. I don't think you can get an accurate analysis from a CEDict dump, which was done once many years ago. Single character definitions exist in the translingual sections, Mandarin ones are badly behind and many have no definitions. --Anatoli (обсудить/вклад) 02:07, 20 May 2013 (UTC)
I've given it some thought. Given that editors may not catch up with single characters quickly enough, perhaps we could revert the edit of User:MglovesfunBot on monosyllabic toned pinyin entries if they are in demand, like yǎ#Mandarin? I would make entries like ya3#Mandarin redirects to yǎ#Mandarin. The translations need to be used with care, the character translation is not the same as word translation in Chinese. There are many specific Japanese characters, hanzi, which are only used in combinations, pure phonetic hanzi or their "definitions" are hardly used in real Chinese. Still, they are perhaps 95% right and may give an idea of the meaning. --Anatoli (обсудить/вклад) 02:56, 21 May 2013 (UTC)
Bot-made mass edits like diff, removing glosses from Pinyin entries, have made Wiktionary less usable for readers, for bad reasons, IMHO anyway. Wiktionary would be better off without these edits. --Dan Polansky (talk) 19:46, 21 May 2013 (UTC)
ya3#Mandarin is a duplication of yǎ#Mandarin but entries in Category:Mandarin pinyin with diacritics (monosyllabic, toned pinyin - 1,403 entries) could be restored before the last edit by User:MglovesfunBot. --Anatoli (обсудить/вклад) 02:38, 22 May 2013 (UTC)
  • I don't understand the issue. You can type toneless pinyin into the search box and, if we have an entry for it, it will automatically appear in the drop-down box... ---> Tooironic (talk) 22:43, 25 May 2013 (UTC)
You can’t even tell which of the suggested results are Chinese entries. Using that to find a sense is more work. Michael Z. 2013-05-26 19:56 z

Calques in "derived from" categories

As a result of a recent edit, I noticed that a term that I created, uncanny valley, is categorized (through use of the etyl template) into "English terms derived from Japanese." I'm a little bit uneasy with the idea of a calque like this one being listed as "derived from" the originating language (it seems a little misleading to me), so I thought I'd see if I could find a more official thought on it. —Dajagr (talk) 02:35, 20 May 2013 (UTC)

Sorry, I disagree. If the Japanese term is a calque from English uncanny valley then I think it is, in a way, derived from English. — Ungoliant (Falai) 21:08, 21 May 2013 (UTC)
You have the cases reversed; the English is a calque of the Japanese. There's certainly a justification for separating out this type of derivation, but it's not false on the face of it.--Prosfilaes (talk) 21:18, 23 May 2013 (UTC)
Yeah. In any case, a more serious issue is that uncertain derivation (“possibly from foo”) is categorised next to certain derivation. — Ungoliant (Falai) 21:27, 23 May 2013 (UTC)

Tech newsletter: Subscribe to receive the next editions

Tech news prepared by tech ambassadors and posted by Global message delivery • Contribute • Translate • Get help • Give feedback • Unsubscribe • 20:28, 20 May 2013 (UTC)
Important note: This is the first edition of the Tech News weekly summaries, which help you monitor recent software changes likely to impact you and your fellow Wikimedians.

If you want to continue to receive the next issues every week, please subscribe to the newsletter. You can subscribe your personal talk page and a community page like this one. The newsletter can be translated into your language.

You can also become a tech ambassador, help us write the next newsletter and tell us what to improve. Your feedback is greatly appreciated. guillom 20:28, 20 May 2013 (UTC)


Monkey business in biological class entries.

I have noticed that our entries for fish, reptile, amphibian, etc., all include a collection of pictures of animals from that biological class, but each also contains a picture of a chimpanzee. I understand that the intent is to convey that all of these things are animals, but it seems jarring and unnecessary. bd2412 T 01:55, 21 May 2013 (UTC)

I agree. — Ungoliant (Falai) 02:47, 21 May 2013 (UTC)
This is part of the Wiktionary "picture book" project that somebody started (and, I believe, abandoned) several years ago. I think it was intended to be hierarchical, with trees of related pictures. It is probably a dead thing. Equinox 02:53, 21 May 2013 (UTC)
Easy to find all the offenders. Looks like an AWB job to dispose of them, but some entries (like szympans) are using it correctly, so it'll have to be a process with human supervision. —Μετάknowledgediscuss/deeds 02:59, 21 May 2013 (UTC)
I'll take care of it this weekend, if no one gets around to it before that. Obviously this is not a high priority. bd2412 T 03:57, 21 May 2013 (UTC)
Done. Cheers! bd2412 T 03:48, 22 May 2013 (UTC)

Quotations and Examples

(I have tried to put something into the Beer Parlour on this topic a few days ago. What did I do wrong, or is it lurking somewhere out of my view?)

I'm a newbie who puts in quotations. In doing so I have been quite confused by the different ways examples are put in, and distressed by their untidiness.

I strongly suggest that provision be made for examples to be treated in the same way as quotations, that is, for them to be hidden by default, but able to be switched by the reader between hidden and shown, by sense or overall. Perhaps they could be coded by a #= prefix unless that's already used. Of course the examples should be independent of the quotations.

The difference between examples and quotations should be emphasized.

An example simply illustrates briefly, in a single phrase or sentence, how the word is used in the local sense.

A quotation is accompanied by details of its source, preferably with URLs, so that the reader can verify its authenticity and follow up on its context. Quotations should be chosen to cover a variety of contexts and dates of publication.

Examples and quotations having different purposes, different readers will choose to see each or both or neither. Indeed the default display should be neither so that the reader sees a relatively compact display of senses on first going to an entry. —This unsigned comment was added by ReidAA (talkcontribs).

You had posted this on the WT:Grease pit actually. — Ungoliant (Falai) 11:48, 22 May 2013 (UTC)

Requesting a block for IP user:24.135.76.120

On User talk:24.135.76.120 they wrote

Ivane ivane tvoja ustaška propaganda je izuzetno sramna[...]

meaning "Ivan, Ivan, your w:Ustaše propaganda is extremely shameful[...]" I'm guessing this is a previously blocked user, who came back making some more uproar and causing disturbance. --biblbroksдискашн 15:14, 22 May 2013 (UTC)

Block. This is offensive and unacceptable. --Anatoli (обсудить/вклад) 23:00, 22 May 2013 (UTC)
IP has been blocked, and an abuse complaint has been filed with the ISP. --Ivan Štambuk (talk) 01:33, 23 May 2013 (UTC)
Oh come on really? Sure, block the deluded fuck from here and stop him from spreading more crap and whining on our site. But a complaint to their ISP? Really? Yeah, they're an idiot or something for doing what they did but I think that's a bit much...I don't really care too much I guess but still, merely blocking them solves the issue IMO. User: PalkiaX50 talk to meh 01:41, 23 May 2013 (UTC)
This same guy has been doing it for a very long time (I think more than two years). The majority of his edits are correct and well formatted, but once there is a disagreement about some topic, even a trivial one like in this particular case (the other IP, which was from Croatia, has wrongly marked kupatilo as a Croatian-only term, which was in turn misinterpreted by him as an act of organized anti-Serbian campaign), he starts spouting abuse. And the particular swear words that he's using are extremely politically charged and insulting, and can be classified as hate speech. All I have asked in the complaint is to ask their customer to stop persistently violating netiquette and remind him that the Internet grants neither anonymity nor immunity. --Ivan Štambuk (talk) 04:18, 23 May 2013 (UTC)
For future reference, the place to post this is WT:VIP.​—msh210 (talk) 18:35, 31 May 2013 (UTC)
Noted, tnx. --biblbroksдискашн 19:00, 31 May 2013 (UTC)

…… symbol in Chinese entries instead of ....

Before I go ahead and rename Mandarin entries with an ellipsis, I'd like to ask.

Are there any serious reasons NOT to use the Chinese ellipsis …… in Mandarin, Min Nan, etc. entries, instead of the Western "..."? It looks better with hanzi, and the spelling like 不是……而是…… is how the Chinese resources describe terms when ellipsis is needed. Pinyin transliteration and entries would still use "...". --Anatoli (обсудить/вклад) 23:26, 23 May 2013 (UTC)

Vote on Wikidata

A vote is currently ongoing on Wikidata concerning the management of the Wiktionary interwiki links by Wikidata. There are also discussions of more advanced concepts using Wikidata. The vote is there. Pamputt (talk) 15:31, 25 May 2013 (UTC)

Can someone talk me through it here? My instinct is to oppose but I see almost all support votes there. Mglovesfun (talk) 21:03, 26 May 2013 (UTC)
That poll is irrelevant. The devs already decided a long time ago that Wikidata will be used to some extent for non-Wikipedia projects, but they're focusing on Wikipedia first. The RFC was started based on a misunderstanding. --Yair rand (talk) 23:26, 26 May 2013 (UTC)
Yeah, the "Comments" section was amusing. - -sche (discuss) 23:41, 26 May 2013 (UTC)

How to format plurals

I have a question regarding pages for plural forms of words. Right below the Noun heading, you would normally put {{en-noun}} (on English pages that aren't for plural words). I was wondering what one is supposed to do when the word is plural. Do you just put '''word'''? Or '''word''' {{plural}}? Thanks. TeragR (talk) 21:13, 26 May 2013 (UTC)

  • For the headword, in these cases, I just put the word itself within triple quotes. That seems to work. If it has a gender, then I put that following the headword. SemperBlotto (talk) 21:17, 26 May 2013 (UTC)
I would suggest {{head|en|plural}} or perhaps {{head|en}}. —CodeCat 21:18, 26 May 2013 (UTC)
... which achieves the same thing but causes the wiki to fetch a template and interpret it. SemperBlotto (talk) 21:20, 26 May 2013 (UTC)
Not anymore, the templates are deprecated now. —CodeCat 21:21, 26 May 2013 (UTC)
So it causes it to fetch a module. :b - -sche (discuss) 21:22, 26 May 2013 (UTC)
The default is '''word''' or {{head|languagecode}}, whichever you prefer. Sometimes people provide more detail, adding things like {{head|languagecode|plural}} or {{head|languagecode|noun form}} or a gender, as SemperBlotto and CodeCat note, but often the definition-line templates already add the plural or noun form categorisation which is (or would otherwise be) the main discernible benefit to {{head|languagecode|plural}}. - -sche (discuss) 21:22, 26 May 2013 (UTC)
I don't think that the definition templates should add categories, because different languages may have different needs. For example, {{plural of}} categorises into "plurals" but that would make no sense for a language that has more noun forms than just a plural, and never mind adjective plurals. That is why I prefer {{head|en|plural}}, so that in the event that {{plural of}}'s category is removed in the future, the entry is already prepared for it. Also, {{head}} or some other headword-line template is mandatory for any language but English, because it also sets the script and language of the headword. You might as well use it for English as well, for consistency if nothing else. Another difference is that {{head}} uses the script code's formatting rules, which are more detailed/precise than mere bold text. {{Latn}} for example will format a headword with <strong class="headword" lang="en">word</strong>. —CodeCat 21:30, 26 May 2013 (UTC)
How long does it take the server to interpret {{head|en}} as opposed to word? I bet it's microseconds. Mglovesfun (talk) 21:32, 26 May 2013 (UTC)
It's not as easy to tell because the usage cost of language codes is amortized. The language module is loaded on the first use on a page, but once it's loaded it stays available for that page. So the first use is relatively expensive because the module needs to be loaded, but subsequent uses are very cheap as they are just a matter of a single lookup. —CodeCat 21:36, 26 May 2013 (UTC)
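The amortized cost described above can be illustrated with a small Python analogy. This is not how the wiki's module system is implemented; load_language_data, language_name, and the three-entry table are hypothetical and only show the "load once per page, then cheap lookups" pattern.

```python
import functools

load_count = 0  # counts how often the expensive load actually runs


@functools.lru_cache(maxsize=None)
def load_language_data():
    """Stand-in for the one-time, relatively expensive load of the language table."""
    global load_count
    load_count += 1
    return {"en": "English", "fr": "French", "mi": "Maori"}


def language_name(code):
    """After the first call, this is just a dictionary lookup."""
    return load_language_data()[code]


# Three uses on the same "page": only the first one pays for the load.
names = [language_name(c) for c in ("en", "fr", "en")]
```

The first call carries the whole loading cost, so timing a single use overstates the per-use cost on a page with many headwords.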
Understood. Thanks for all the speedy replies! TeragR (talk) 22:22, 26 May 2013 (UTC)
A lot depends on the language and on the part of speech. In languages other than English, it's a good idea to look at the About page for the language (if there is one), and at other similar entries to see what the accepted practice is. Also look for headword-line templates. The best way to find all of that is to go to the main category for the language. For Spanish, for instance, that would be Category:Spanish language. For other languages, it would be the same, but the language name would be different. Chuck Entz (talk) 23:26, 26 May 2013 (UTC)

{{en-noun}} really needs a form for plurals. It is inadequate and confusing that the second-most common form of English noun is a radical exception in entry format without a clear guideline. Michael Z. 2013-05-29 14:46 z

Something like {{en-noun-form}} you mean? —CodeCat 17:13, 29 May 2013 (UTC)
I mean a parameter. We already have:
  • {{en-noun|-}} for uncountable
  • {{en-noun|!}} for plural not attested
  • {{en-noun|?}} for unknown or uncertain plural
Not sure of the difference between the last two. How about one of:
 Michael Z. 2013-05-31 21:46 z
As far as I know, the most common practice is to use the headword-line template only for lemmas, and use another (special purpose or {{head}}) for non-lemmas. —CodeCat 22:24, 31 May 2013 (UTC)

Māori? Maori?

See also WT:RFM#Template:mi

This edit prompted me to look into how the {{mi}} language is described here on WT. I'm used to seeing the language name as Māori, with the macron, and that's also how the language is listed at w:Māori. However, I see that the macron-less form is in common use here on WT.

Does anyone know why we don't use the macron consistently here? Is it a matter of legacy spellings? I recognize that the macron-less form is common enough to warrant inclusion; that's not my concern. The language name as listed, however, should be the more "official" spelling with the macron, no?

Curious, -- Eiríkr Útlendi │ Tala við mig 03:48, 28 May 2013 (UTC)

A look at the interwiki listings on the side of many entry pages will show how few of our L2 headers match the "official" spelling. Since the macron-less spelling is common in English, and also much easier to type, I think we should stick with it- except for the page name for the Maori word, itself, of course. Chuck Entz (talk) 04:20, 28 May 2013 (UTC)
I'll paste my comment from WT:RFM#Template:mi: I favour "Maori" and oppose a move. "Māori" is, indeed, the native name of the language, but English disuses diacritics, and even before I overhauled it, WT:LANGNAMES noted that Wiktionary avoids diacritics, too. Like CodeCat said [in the RFM discussion], "Maori" is common, in part because many non-specialists write about the language. In contrast, [en.Wikt does use diacritics in other languages names, e.g. ǃXóõ, because] the only people who write about ǃXóõ tend to be specialists who spell it "ǃXóõ". - -sche (discuss) 04:31, 28 May 2013 (UTC)
Yes and we have French not français (and so on). Mglovesfun (talk) 10:52, 28 May 2013 (UTC)
That’s not a useful comparison. It appears that Māori is correct formal written English among people who write about the language, while français is just French.
I don’t see any evidence that Wiktionary avoids diacritics. Michael Z. 2013-05-28 14:22 z
There are 388 “official” language names with diacritics, out of 7644 in WT:LANGLIST (1 in 20), plus a handful with click characters or apostrophes. I have no way to determine how many diacritical names we have rejected, but we have accepted hundreds, so the statement that “diacritics are avoided” is dubious. Any reason not to remove it? Michael Z. 2013-05-28 16:11 z
The official one? The ISO 639-2 and 639-3 spelling is Maori. News.google.com shows a lot of news articles with Maori, more than with "Māori"; I don't think that one spelling can be declared the correct formal form for written English. I'm personally a fan of writing English with English characters, and not with a plethora of diacritics not normally found in English.--Prosfilaes (talk) 00:33, 29 May 2013 (UTC)
Diacritics are the bits above the characters. The form Māori is relatively new to me. But having grown up with English, I find it looks wrong when the diacritics normally found in the language are omitted. Michael Z. 2013-05-29 03:28 z

Language code migration step 2

The script templates have almost all been marked for deletion, only a few of the most-used ones still have transclusions because the software hasn't updated all the pages yet, but I am working on that. That leaves two points remaining. The first is a relatively small one, the /documentation subpages that a few of the language templates have. They are listed in Category:language code templates with documentation but that list is not yet complete, the category is still updating. I'm not sure if they should be deleted outright without preserving the information somewhere, but I don't know where to preserve it either.

The second point of course is orphaning and deleting the code templates themselves. This will be more difficult because many scripts like the translation adder and entry creator rely on their presence, because they subst them into the entry. Substing is not a problem in itself, the module itself is also substable so it is a matter of changing {{subst:{{{lang}}}}} into {{subst:#invoke:language utilities|lookup_language|{{{lang}}}|names}}. But we will need to update everything that still uses the templates. Unfortunately I don't know where all or even most of the uses are, and I am not very good with JavaScript so someone else needs to look at the gadgets. Bots that use these templates will also need to be fixed. —CodeCat 16:14, 28 May 2013 (UTC)
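As a sketch of the rewrite described above, the following Python snippet replaces the old parameter-based substitution with the module invocation quoted in the comment. The function name migrate is hypothetical, and the pattern only covers the plain {{subst:{{{param}}}}} form; parameters with defaults and code living in gadgets or off-wiki bots would still need manual review.

```python
import re

# Matches {{subst:{{{param}}}}}, capturing the inner {{{param}}}.
# Parameters with a default value such as {{{lang|}}} are deliberately not matched.
PARAM_SUBST = re.compile(r"\{\{subst:(\{\{\{[^{}|]+\}\}\})\}\}")


def migrate(wikitext):
    """Rewrite template-based language-name substitution to the module call."""
    return PARAM_SUBST.sub(
        lambda m: "{{subst:#invoke:language utilities|lookup_language|%s|names}}"
        % m.group(1),
        wikitext,
    )


old = "* {{subst:{{{lang}}}}}: example"
new = migrate(old)
```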

Ok, I've thought about it and I propose the following. To give us some time to work out fixing all the things that need the templates, we can use a temporary solution that is compatible with the module. Every language template's content is replaced with the following: {{<includeonly>subst:</includeonly>#invoke:language utilities|lookup_language|{{subst:PAGENAME}}|names}}<noinclude>{{langt-deprecated}}</noinclude>. That way, the template still works, but can only be substed so it's not possible to use it in templates anymore. The template {{langt}} is also removed by this edit and replaced with {{langt-deprecated}}, which puts the template in a different category Category:Language code templates to be deleted. conl: and proto: templates can be deleted outright, because I don't think any scripts or bots subst them directly. Is that ok? —CodeCat 20:54, 28 May 2013 (UTC)
Are you talking about script templates or language templates? I strongly advise against deleting either any time soon. The language templates are used in things other than just substing, and can occasionally be very hard to replace with the module. --Yair rand (talk) 10:50, 29 May 2013 (UTC)
Can you give an example of where the language templates are currently still used? Template:en is already orphaned so wherever it is, it's not on this wiki. —CodeCat 10:55, 29 May 2013 (UTC)
They're probably still used by plenty of bots, and WT:ACCEL also still uses them, but those wouldn't be harmed by replacing them with a module invocation as you suggested above. The main issue we need to look out for is scripts/bots that pull out the wikitext directly, which definitely would break as a result. There are probably still a few of those, some of which might not have their code on-wiki. (There are probably also loads of data-miners that depend on them, but I'm not sure if we should take that into account.) --Yair rand (talk) 11:34, 29 May 2013 (UTC)
I use the language templates for two purposes: (1) if I don't know what language is associated with a particular code, or whether a particular string of three letters exists as an ISO 639-3 code, I go to the template and see; (2) when I'm adding requested translations or assigning translations to be checked to the appropriate sense, I change {{trreq|xyz}} and {{ttbc|xyz}} to {{subst:xyz}} rather than type out the language name, which saves typos as well as trying to remember exactly which name we use for certain languages. (Has the pendulum swung to Maori or Māori this month? How do you spell Guugu Yimithirr and Sḵwx̱wú7mesh? If I just type {{subst:mi}}, {{subst:kky}} and {{subst:squ}} I don't have to worry about it.) —Angr 12:43, 29 May 2013 (UTC)
To Yair: That is mainly a matter of time and gradual migration, which is why I proposed the interim solution above. On one hand, we don't want to break too many things, but on the other hand, some things will probably not be fixed until they break because we weren't aware of them. Bots probably fall into the latter category as well. So, with my proposal, bots that use the raw wikitext of the template will break, but anything that substs the templates will still work. —CodeCat 15:43, 29 May 2013 (UTC)
I have deleted the documentation pages and preserved their contents on Wiktionary talk:Languages. —CodeCat 21:23, 28 May 2013 (UTC)
Please note that I initiated a similar migration on fr.wikt (decision in progress). We plan to use a template {{language name|xxx}} to replace the names in the templates (instead of things like {{ {{{lang}}} }}). We will also keep all old language templates as they may be needed for historical reasons, but in that case we will replace their content with a call to the module to reflect the updated names. Dakdada (talk) 09:27, 29 May 2013 (UTC)

Archaic or Obsolete for spellings with 'æ'

Presently, there are a lot of words marked as obsolete that are spelled with an 'æ' (e.g. most of the English terms spelled with Æ). But, I'm not sure that is appropriate because:

According to here obsolete means:

  • No longer in use, and no longer likely to be understood. Obsolete is a stronger term than archaic, and a much stronger term than dated.

Whereas archaic means:

  • No longer in general use, but still found in some contemporary texts (such as Bible translations) and generally understood (but rarely used) by educated people. For example, thee and thou are archaic pronouns, having been completely superseded by you. Archaic is a stronger term than dated, but not as strong as obsolete.

And dated is:

  • Formerly in common use, and still in occasional use, but now unfashionable; for example, wireless in the sense of "broadcast radio tuner", groovy, and gay in the sense of "bright" or "happy" could all be considered dated. Dated is not so strong as archaic or obsolete; see Wiktionary:Obsolete and archaic terms.

I think obsolete is too strong for these spellings, as they will be understood by readers even if they are no longer in use; perhaps archaic is appropriate?

This discussion started on the talk page for diæresis. WilliamKF (talk) 20:09, 28 May 2013 (UTC)

Can a spelling ever really be obsolete and not even be understood? —CodeCat 13:56, 20 May 2013 (UTC)
I think we need to change our definition of "obsolete" with respect to spellings. Archaic and dated both say the form is still found/still used, whereas an obsolete spelling is one that isn't used at all anymore. A spelling like sphære isn't "still found in some contemporary texts", nor is it "still in occasional use, but now [merely] unfashionable". It's not used nowadays at all, unless someone is deliberately using outdated spellings for effect. It's obsolete. —Angr 20:17, 28 May 2013 (UTC)
But isn't that the definition of archaic? People can use archaisms in speech with much the same effect, but archaic terms are not really used anymore outside of that function, much like archaic spellings might be. —CodeCat 20:27, 28 May 2013 (UTC)
A spelling doesn't have to be unintelligible to be obsolete, it just has to be unused. (Most soldiers recognise chariots when they see illustrations of them, but chariots are obsolete military equipment because no army uses them.) I have been marking ligature spellings as obsolete whenever no proof that modern authors still use them is available, or as archaic when such proof is available. - -sche (discuss) 23:41, 28 May 2013 (UTC)
If we were to look at a population larger than that represented by contributors here, say, one that included those with no higher education, would we find that the majority were not confused by digraphs? I think not. It might be trivial for each individual to learn, but I do not think that most in the larger population ever have cause to decode such things and thus would not have had the opportunity to learn. If so, that would favor {{obsolete}} rather than {{archaic}}. OTOH, if we intend to only serve those very much like ourselves, {{archaic}} might indeed be more accurate. DCDuring TALK 12:02, 29 May 2013 (UTC)

What about encyclopædia? I'd say this one is even dated, certainly not obsolete as presently listed. I still feel obsolete is too strong for this category of words given the ease in understanding it. WilliamKF (talk) 13:18, 29 May 2013 (UTC)

Good example. We don’t label categories of words, but words, each on its own qualities and usage. Sphære and encyclopædia do not deserve the same label.

DCD, labels should be chosen based on (our estimates of) word usage in the corpus. Without evidence, the discussion about what education levels are confused by what digraphs seems purely speculative. Michael Z. 2013-05-29 14:38 z

I agree. In my example above I picked sphære rather than encyclopædia for a reason: the former is truly obsolete, the latter isn't. As I understand it, though, the OP just picked Category:English terms spelled with Æ as an example of a good place to find spellings marked obsolete; there was no implication that all spellings there are obsolete. Nor are they the only ones; I'd call gaol dated, shew archaic, and queene obsolete (though I see that our entries do not agree with my opinion on all counts). —Angr 14:51, 29 May 2013 (UTC)
I agree that queene is obsolete and have updated it accordingly. - -sche (discuss) 16:53, 29 May 2013 (UTC)

Permission for bot edits

Two proposed tasks are:

  1. convert {{fr-adj-mf}} to {{fr-adj|mf}}. Literally involves replacing one character
  2. convert {{fr-noun-inv|(gender)}} to {{fr-noun|(gender)|inv}}, fr-noun now accepts inv

Per WT:BOT#Policy I am asking for 'consensus' rather than doing these outright. Mglovesfun (talk) 12:26, 29 May 2013 (UTC)
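For illustration only, the two proposed replacements could be sketched as plain regular-expression substitutions over an entry's wikitext. This is a minimal sketch, not the bot's actual code: the function name `convert_fr_templates` is hypothetical, and it assumes that {{fr-noun-inv}} is called with exactly one parameter (the gender). Calls with any other shape are left untouched.

```python
import re


def convert_fr_templates(wikitext: str) -> str:
    """Apply the two proposed template conversions to a piece of wikitext.

    Hypothetical helper for illustration; not the bot's real implementation.
    """
    # Task 1: {{fr-adj-mf}} -> {{fr-adj|mf}} (the second hyphen becomes a pipe).
    # The lookahead ensures we only match the full template name.
    text = re.sub(r"\{\{fr-adj-mf(?=[|}])", "{{fr-adj|mf", wikitext)

    # Task 2: {{fr-noun-inv|(gender)}} -> {{fr-noun|(gender)|inv}}.
    # Assumption: exactly one parameter and no nested templates inside it.
    text = re.sub(
        r"\{\{fr-noun-inv\|([^|{}]*)\}\}",
        r"{{fr-noun|\1|inv}}",
        text,
    )
    return text


if __name__ == "__main__":
    sample = "{{fr-adj-mf}} and {{fr-noun-inv|m}}"
    print(convert_fr_templates(sample))
```

A real bot run would presumably also need to fetch and save pages and to log any calls that do not fit these simple patterns, which this sketch does not attempt.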

This seems very uncontroversial so I don't think you need to ask permission for it. —CodeCat 15:39, 29 May 2013 (UTC)
I disagree. --Yair rand (talk) 17:56, 29 May 2013 (UTC)
Go for it. Let me know when it's done (need to modify bot that generates adjective/noun forms). SemperBlotto (talk) 15:42, 29 May 2013 (UTC)
I would like to edit WT:BOT#Policy a little, but then probably every policy we have, I would like to reword a little (or a lot). Mglovesfun (talk) 17:52, 29 May 2013 (UTC)
Support. --Yair rand (talk) 17:56, 29 May 2013 (UTC)
I abstain; looks good to me, on the face of it. --Dan Polansky (talk) 20:31, 30 May 2013 (UTC)

Migrating the family templates

I have started migrating the family templates as well. I've followed the same procedure and it seems that everything is working correctly now, using the information that was in Module:families. I would now like to orphan all the templates in Category:Family code templates and then mark them for deletion as they are orphaned. Is that ok? —CodeCat 22:07, 30 May 2013 (UTC)

Etymologies of Chinese Characters

Hello all. Wiktionarians have inserted paraphrased interpretations from my etymological dictionary (Kanji Networks) in the Etymology sections of the entries for about 30 Chinese characters. Examples include 二, 三, 四, 六, 七, 八, 九, 十, 中, 右, 百, 千, 年, 服, 者, and 暑. Links to KN also appear on the pages of several characters (林, 森 and 少) the etymologies of which are quite different from the KN interpretations.

I'm fine with the use of the Kanji Networks interpretations in Wiktionary. I'm also fine with the use of differing interpretations. What I'd like to confirm is that users be clear on the source(s).

If it does not represent a conflict of interest for me to build on work that has been started by others, I propose to take a hand in adding a Source note immediately following each KN interpretation, linking to the KN URL for that character. (The assistance of others in the task of expanding the number of characters etymologized will of course be greatly appreciated.) Entries with etymologies differing from those proposed by KN I will not touch in any way; other Wiktionarians may handle their presentation as they see fit, including the question of whether and how to source them.

Please advise whether this kind of participation is in line with Wiktionary policies. Thank you very much. Lawrence J. Howell (talk) 05:03, 1 June 2013 (UTC)

Leaving aside matters of policies, ethics, and the like, I see a big practicality problem: this is a wiki, so anyone can edit any entry at any time. You might put your source note on an entry, and 5 minutes later, someone might completely change and rearrange the entry, making it look like the new version was derived from KN. Unfortunately, not all of those contributions are the sort you would want to be associated with (currently we're doing battle with an anonymous contributor who thinks Manga fan-sites and Bing Translate are authoritative sources!). Or someone else may take that original interpretation, and replace it with thinly-disguised plagiarism from your site. Or it may flip back and forth as different contributors weigh in. It's a lot of work keeping on top of it all. Chuck Entz (talk) 07:48, 1 June 2013 (UTC)
Thank you very much for the swift and helpful reply, Chuck. It appears I may have underestimated the maintenance aspect. Still, I'd be inclined to give things a shot on an experimental basis if at least a couple of Wiktionarians were to weigh in here on the ethical issue with a "don't sweat it" opinion. I'll hold off and see what develops on that score. Thanks again. Lawrence J. Howell (talk) 08:42, 1 June 2013 (UTC)
Seems perfectly fine to me. If there are differing interpretations and you want to note the KN one, footnotes might be good; otherwise a reference is probably fine. As for the practicality, one more pair of eyes looking out for vandalism/plagiarism is only going to make things better. Hyarmendacil (talk) 09:49, 1 June 2013 (UTC)
Thank you, Hyarmendacil, for your input. As no dissenting voices have been raised, I'll set to work. To indicate what I'm doing, and why: At present, the entries of many Chinese characters lack an Etymology section. In the entries that do have such a section, we find "etymology" taken to apply to data as diverse as a) Unannotated graphic evolution (公 王), b) Bare-bones presentations of the phonetic and semantic elements of compounds (花 成 頭 就 機 國 etc.), c) Unsourced speculation (不 有 好 etc.), d) Interpretations contraindicated by historical evidence (see 可, where the explanation doesn't mesh with the attestation of 可 in oracle bone script and the absence of attestation for 奇 until seal script), e) Ancient phonology: Baxter/Sagart's reconstructions are mentioned in a handful of characters (代 的 家 and 海) and Schüssler (Schuessler) is noted in 大. In short, the Etymology sections exhibit little consistency in format, content and purpose. For that reason, I'm creating a distinct section for the data I can supply, and labeling it Phonosemantic Interpretation. After posting this message, I'll begin with the following fifty characters, which those interested may care to inspect: 一 二 三 四 五 六 七 八 九 十 百 千 万 年 林 森 大 小 少 中 右 左 下 上 木 服 者 暑 月 人 的 不 子 女 是 心 我 有 之 天 口 水 日 他 又 山 丸 金 力 无 Comments and suggestions are most definitely welcome. Lawrence J. Howell (talk) 05:15, 3 June 2013 (UTC)
If the header is to be used, its second word should be lowercase ("Phonosemantic interpretation"; compare "Alternative forms" and "Usage notes"). However, the info added to [[]] seems like primarily etymological information that could have stayed under an ===Etymology=== header, and if there is info that would not be appropriate under an etymology header, it might be OK to consider it ====Usage notes====.
Also... I notice this claim in the entry : "[eight] may be regarded as having been considered the single-digit number divisible by two the greatest number of times". In what way is eight a single-digit number? In Arabic numerals, which were unknown to ancient Asia? Or in the sense that "八" is a 'single digit', in which case the argument seems circular (if someone says "this single digit was used for eight, because eight is the single digit number divisible by two the greatest number of times", one could ask "why wasn't a double digit chosen?"). - -sche (discuss) 01:47, 4 June 2013 (UTC)
Thank you very much for the input, -sche. The presentation matters come as welcome advice, and I acknowledge the applicability of your point about 八. Before amending or redoing anything I'll wait a few days in hopes of obtaining further useful comments and suggestions. Thank you again. Lawrence J. Howell (talk) 04:01, 4 June 2013 (UTC)