Wiktionary:Beer parlour/2013/April

Idea for proper noun entries that belong in an encyclopedia

There seem to be a lot of proper nouns that show up on WT:RFD. Many of these have articles in the EN WP. Since people are clearly looking for these entries, and some editors mistakenly think such entries belong here, while some readers mistakenly think they can find those entries here, it's clear there's some demand for having proper noun entries here at EN WT.

What would folks say to allowing the creation of proper noun entries, such as Mona Lisa or Mini Cooper or Hound of the Baskervilles, but just as redirects (soft or hard, as deemed appropriate) to the corresponding EN WP article? This would meet the apparent demand for such entries, while not wasting EN WT editor time writing and maintaining them, and while avoiding the inclusion of encyclopedic material in this dictionary project. -- Eiríkr Útlendi │ Tala við mig 17:06, 2 April 2013 (UTC)[reply]

I don't think hard redirects to Wikipedia are even possible; they'd have to be soft. Wikipedia itself already has w:Template:Wiktionary redirect for pages that will only ever be dictionary entries; all we need to do is make a corresponding template here. —An gr 17:32, 2 April 2013 (UTC)[reply]

Sounds good to me. I see, however, that SemperBlotto deleted Template:Wikipedia redirect way back in 2006... -- Eiríkr Útlendi │ Tala við mig 17:35, 2 April 2013 (UTC)

I don't see why this is a good idea. How is it better than just not having the entries at all? How do we decide which entries need {{only in|{{in wikipedia}}}} and which are red links? Or do we create such redirects for all entry titles which have Wikipedia articles? Mglovesfun (talk) 17:38, 2 April 2013 (UTC)[reply]

It seems fine to create {{only in}} redirects to WP for all proper nouns (Why not all entries of any kind?) for which we do not have an entry. Editors can replace the redirect with an entry, which is subject to the usual reviews. At the very least we should use the redirects for proper noun entries that have failed RfD for whatever reason. DCDuring TALK 17:58, 2 April 2013 (UTC)[reply]

Sorry, I thought my initial comment explains the "why" -- users, both as editors and as readers, are clearly coming to Wiktionary in search of such entries.

As to which entries to convert, any proper noun entry that editors think should not be in Wiktionary would be a candidate for such redirection. If deemed necessary for clarity, the redirection template could include text explaining that Wikipedia might not yet have such an article, but that if anyone were to create such an article, it belongs in Wikipedia and not here.

I'm simply floating an idea about how to respond to apparent user demand for encyclopedic proper noun entries in a way that 1) meets that demand, 2) points users to the appropriate place for such entries, and 3) doesn't require much work from editors. -- Eiríkr Útlendi │ Tala við mig 17:59, 2 April 2013 (UTC)

A good idea, but there already is a page that comes up when someone goes to an undefined proper noun. See, for example Starry Night, Mini Cooper S, or A Study in Scarlet. It just doesn’t serve the required needs. The page that comes up for starry night, mini cooper s, or a study in scarlet is a bit better, but still could be improved.

I wonder if there is a way to improve “perhaps there is a page xxx in our sister encyclopedia project, Wikipedia.”

Anyway, let’s improve the 404 page instead of reinventing the wheel. —Michael Z. 2013-04-02 18:18 z

The fact that people are searching for things doesn't mean we should include them, even as redirects to Wikipedia. The number one search on a user-generated replacement for Special:WantedPages was in fact the Mandarin for 'naked porno movies'. Mglovesfun (talk) 20:59, 2 April 2013 (UTC)[reply]

Well, dang it, someone should create that Wikipedia article already.

<ahem.> On a more serious note, the issue is not just that folks are searching for such pages, but that they are actually creating them. This generates maintenance overhead for WT editors. Redirecting users to Wikipedia might help reduce this overhead. -- Eiríkr Útlendi │ Tala við mig 21:03, 2 April 2013 (UTC)[reply]

But redirects are bluelinks. If we make tens of thousands of redirects, how will anyone notice the few bluelinks which have, wrongly, been created as full entries that we (by our current policies and culture) tend to subject to WT:RFD? I agree with Michael: improve the "404" that comes up when someone clicks on [[Some Proper Noun]], goes to [1] or uses the search bar to search for "Some Proper Noun". - -sche (discuss) 21:28, 2 April 2013 (UTC)[reply]

We already make color distinctions in our links: a lighter blue for links to other projects, orange for links with the wrong section. A bot could replace links to {{only in}} entries with {{w}} links or "w:" piped plainlinks. Improving the 404 only partially addresses the problem, though it has the enormous advantage of, in principle, being easier to implement. DCDuring TALK 22:04, 2 April 2013 (UTC)[reply]
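The bot pass DCDuring mentions could be sketched roughly as follows. This is only an illustration, not an existing bot: `relink_to_wikipedia` and the set `only_in_titles` are hypothetical names, and the sketch assumes the list of {{only in}} pages has already been collected by some other means.

```python
import re

# Matches plain wikilinks such as [[Mona Lisa]]. Links that already carry a
# pipe, a section, or a namespace/interwiki prefix are deliberately left alone.
PLAIN_LINK = re.compile(r"\[\[([^\[\]|#:]+)\]\]")


def relink_to_wikipedia(wikitext, only_in_titles):
    """Rewrite [[Title]] as [[w:Title|Title]] for titles in only_in_titles."""

    def swap(match):
        title = match.group(1)
        if title in only_in_titles:
            return "[[w:%s|%s]]" % (title, title)
        return match.group(0)  # not an {{only in}} page: keep the link as is

    return PLAIN_LINK.sub(swap, wikitext)


# Hypothetical input: only "Mona Lisa" is assumed to be an {{only in}} page.
sample = "See [[Mona Lisa]] and [[painting]]."
result = relink_to_wikipedia(sample, {"Mona Lisa"})
```

Ordinary dictionary links are untouched, so a pass like this would only affect pages that link to the redirected titles.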

I have never noticed light blue or orange, and as far as I know I have a good computer display and good color vision. —Michael Z. 2013-04-03 14:28 z

Orange links have to be turned on in your Per-browser preferences; as for light blue links, don't you see a difference between blue and blue? For me the difference is subtle but real. —An gr 15:24, 3 April 2013 (UTC)[reply]

Looks precisely like visited and unvisited links. Since external links are marked with a little arrow, nothing has ever prompted me to associate that colour variation with another class of external link. —Michael Z. 2013-04-03 19:52 z

Ah, I see. Red links and blue external links turn paler when visited, but blue internal links turn darker, or lighter when unvisited but pointing to other MW sites. Shoulda been obvious. —Michael Z. 2013-04-03 20:08 z

Huh. For me, both internal and external Wikimedia links turn purple when visited, again with a subtle but present difference in shade. —An gr 20:22, 3 April 2013 (UTC)[reply]

Exactly. We also have greenlinks for no page corresponding to inflected forms, if you have the gadget for accelerated creation of these selected on user preferences. (See conquest#Verb.) DCDuring TALK 15:33, 3 April 2013 (UTC)[reply]

I realize that I had possibly misinterpreted Mglovesfun's previous comment as suggesting we shouldn't even rework our 404. To clarify, I am not advocating that we start creating scores of pages solely for the purpose of redirecting to WP. My intent instead was originally just to ask if perhaps proper noun pages, particularly those that fail RFD (which I should have stated more specifically earlier), would benefit by having redirects to WP. Michael's suggestion of reworking our 404 sounds like a wonderful idea, either alongside specific redirects for pages that failed RFD, or as a replacement for that idea. -- Eiríkr Útlendi │ Tala við mig 22:19, 2 April 2013 (UTC)[reply]

Then I'll back Michael's idea as well. Mglovesfun (talk) 09:16, 3 April 2013 (UTC)[reply]

Does anyone know where to edit these pages, and how to create the special links on them? Are there docs? —Michael Z. 2013-04-03 14:28 z

MediaWiki:Noarticletext contains the "Wiktionary does not yet have a mediawiki page for Noarticletext" message; you can change the message by editing that page. (There's also MediaWiki:Noexactmatch, but I don't know that it's used anywhere.) MediaWiki:Searchmenu-new, and possibly other pages, control(s) what's displayed when someone searches for a term we don't have. - -sche (discuss) 20:22, 3 April 2013 (UTC)[reply]

Thanks. And do you know where to find the 404-from-a-link page, e.g. mini cooper s, and the additional wrong-case message added to Mini Cooper S? —Michael Z. 2013-04-03 21:09 z

I presume it's one of these pages, but I don't know which one. - -sche (discuss) 21:40, 4 April 2013 (UTC)[reply]

The easiest way to find out is to visit http://en.wiktionary.org/w/index.php?title=mini_cooper_s&action=edit&redlink=1&uselang=qqx and examine the indicated messages. For example, (creating: mini cooper s) holds the place of a message generated by MediaWiki:Creating with $1 set to mini cooper s. (qqx is in the "private use" range of language-codes, so some enterprising MediaWiki developer decided to appropriate it for this purpose. I'm guessing the feature's primary target audience was interface translators, so they could find the message that they need to translate, but I've found it very useful myself.) —Ruakh_TALK 04:39, 8 April 2013 (UTC)[reply]
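For anyone who wants to repeat Ruakh's trick on other titles, the URL can be assembled like this. A minimal sketch: the function name `qqx_edit_url` is made up for illustration, and the parameters are simply the ones visible in the URL above.

```python
from urllib.parse import urlencode


def qqx_edit_url(title, host="en.wiktionary.org"):
    """Build the redlink edit URL with uselang=qqx for a given page title."""
    params = [
        ("title", title.replace(" ", "_")),  # spaces become underscores in titles
        ("action", "edit"),
        ("redlink", "1"),
        ("uselang", "qqx"),  # pseudo-language that shows message names
    ]
    return "http://%s/w/index.php?%s" % (host, urlencode(params))


url = qqx_edit_url("mini cooper s")
```

Opening the resulting URL in a browser shows message names in place of the interface text, as described above.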

Gothic romanisation template

I have created Template:got-romanization (different from Template:got-romanization of!) and a sample entry "afdrausjan" (modified to use the new template). As with Japanese Template:ja-romaji, the definition line with # is generated by the template, so it has both the headword and a definition. It has the same look and feel as a new romaji entry. Like Japanese, the Gothic entries only link to the main entry, no other information. --Anatoli (обсудить/вклад) 03:38, 4 April 2013 (UTC)

How is it different from {{got-romanization of}}? The output seems to be the same: they both say, "See XYZ" where XYZ is the spelling in the Gothic alphabet. I preferred it when it said "Romanization of", though. —An gr 10:52, 4 April 2013 (UTC)[reply]

It's an attempt to make romanisation entries of different languages more similar to each other. Template:ja-romaji is increasingly used for Japanese romaji entries, and there are two votes Dan Polansky has created in protest at the change that was agreed on by JA editors after a very long discussion in BP. The votes: 1. Wiktionary:Votes/pl-2013-03/Japanese Romaji romanization - format and content and 2. Wiktionary:Votes/pl-2013-03/Romanization and definition line. The second vote is specifically about how the definition line is added. Usually it's # on a new line in the wikitext. The new Japanese and the proposed Gothic templates generate the definition line, which is thus not editable directly.

User:Mzajac raised a concern that Japanese and Gothic are different from each other. Both Japanese and Gothic by default don't produce any definition as such, only a link to the main entry. Using a template will enforce this rule. The definition line will still be there (thus complying with Wiktionary:ELE#Definitions) but a new definition line is only added when a new parameter is added. The suggested template is much shorter and as proved by the current work on Japanese romaji entries can be generated very quickly both by people and bots.

Re: "See" and "Romanization of". Again, just to make both templates (Gothic and Japanese) look similar. There's already the word "romanization" at the header level.

New:

==Gothic==

===Romanization===
{{got-romanization|𐌰𐍆𐌳𐍂𐌰𐌿𐍃𐌾𐌰𐌽}}

Old:

==Gothic==
===Romanization===
{{got-rom}}

# {{got-romanization of|𐌰𐍆𐌳𐍂𐌰𐌿𐍃𐌾𐌰𐌽}}

--Anatoli (обсудить/вклад) 11:23, 4 April 2013 (UTC)
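Since the point is that such entries can be produced quickly by people and bots, here is a rough sketch of how the two layouts shown above could be generated. The function name `gothic_romanization_section` is hypothetical; the wikitext it emits is copied from the "New" and "Old" examples, nothing more.

```python
def gothic_romanization_section(gothic_spelling, new_style=True):
    """Return the wikitext of a Gothic romanization section in either layout."""
    if new_style:
        # New layout: the template itself generates the definition line.
        lines = [
            "==Gothic==",
            "",
            "===Romanization===",
            "{{got-romanization|%s}}" % gothic_spelling,
        ]
    else:
        # Old layout: headword template plus an explicit "#" definition line.
        lines = [
            "==Gothic==",
            "===Romanization===",
            "{{got-rom}}",
            "",
            "# {{got-romanization of|%s}}" % gothic_spelling,
        ]
    return "\n".join(lines)


new_entry = gothic_romanization_section("𐌰𐍆𐌳𐍂𐌰𐌿𐍃𐌾𐌰𐌽")
old_entry = gothic_romanization_section("𐌰𐍆𐌳𐍂𐌰𐌿𐍃𐌾𐌰𐌽", new_style=False)
```

The new layout has no editable "#" line at all, which is exactly the difference the second vote is about.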

Transpondine Portuguese

There is nothing in Wiktionary:About Portuguese concerning spellings on opposite sides of the Atlantic. I have been adding Brazilian forms as "alternative forms" of the spelling used in Portugal. But often, I see that the Portuguese Wiktionary does the exact opposite. Does anyone have an opinion on what we should do - or should it be up to the personal preference of our editors? SemperBlotto (talk) 15:55, 4 April 2013 (UTC)[reply]

Personal preference. — Ungoliant (Falai) 00:12, 5 April 2013 (UTC)

Possible inadequacies in Template:Han char

In a discussion with User:Gdbf137, we discovered that Mac and MS seem to use different Cangjie input sequences. The Unihan database entry for 农 gives a Cangjie input sequence of LBV. Apparently, that works correctly on Mac OS X Lion. On Windows 7, however, MS's Cangjie IME accepts HBV to input this character, while LBV just generates an error beep and no character is output.

Does anyone else have a handle on what's going on? Do we need someone to change {{Han char}} to allow for multiple Cangjie input strings, one per OS? Or, more frighteningly, have Microsoft and/or Apple been changing things willy-nilly, and we need to allow for multiple Cangjie input strings, one per OS version? -- Eiríkr Útlendi │ Tala við mig 17:35, 4 April 2013 (UTC)

Is this related to Cangjie_input_method#Versions_of_Cangjie? "Currently, version 3 (第三代倉頡) is the most common; it is the version of Cangjie supported natively by Microsoft Windows ... The Cangjie input method supported on the Mac OS is somewhat like Version 3 and somewhat like Version 5." I don't know what the solution to this would be other than to specify what version the template is referring to. DTLHS (talk) 04:49, 5 April 2013 (UTC)[reply]
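If the template did end up storing more than one sequence per character, the data might look something like the sketch below. This is only an illustration: the names `CANGJIE_CODES` and `format_cangjie` are invented, the labels record where each sequence was observed in this thread, and which Cangjie version each one belongs to is not established here.

```python
# Sequences reported in this discussion, keyed by where they were observed.
# Which Cangjie version each source implements is NOT known from the thread.
CANGJIE_CODES = {
    "农": {
        "Unihan database": "LBV",
        "Windows 7 IME": "HBV",
    },
}


def format_cangjie(char):
    """Return a display string listing every known sequence with its source."""
    codes = CANGJIE_CODES.get(char, {})
    return ", ".join(
        "%s (%s)" % (code, source) for source, code in sorted(codes.items())
    )


display = format_cangjie("农")
```

Keying by source rather than by OS leaves room to relabel the entries once someone works out which version each input method actually follows.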

Cross-script/mutated semi-borrowings

This seems to be a repeated question, and it's come up again at Wiktionary:Requests for deletion#da. What do we do with half-borrowed words? Stuff like "da", which is clearly a Russian word being used in English, or "si", which is clearly a Spanish word being used in English, even if both would never be spelled that way in their original language. google books:si senor gets a lot of English hits, even once we've excluded "Sí, Señor". Writings across the world are dropping a little bit of foreign language that their audience will understand into their text, and whenever there are orthographic differences, we'll probably see this type of change. "Da" can probably be attested in every major European language in this sense. Instead of creating senses under da for all the languages, maybe we could create an {{orthographically mangled (for foreigners) version of|да}} template (name to be changed, of course) and stick it under Russian. Same thing with si and danke schon and probably some mangled Latin we've deleted, etc. (This isn't intended to change real borrowings, just one language stuck into another.) (The template could maybe use a foreign lang tag, so {{orthographically mangled (for foreigners) version of|old_lang=de|new_lang=en|[[danke schön#]]}}; I do suspect that da and friends are used in multiple Latin-script languages, but it's too common a particle to make that easy to check.)--Prosfilaes (talk) 05:17, 5 April 2013 (UTC)

There are so many edge cases that it's hard to draw the line. Da might be meant as transliterated Russian, in which case WT:ARU disallows its existence. But I have a Hispanophone friend who sometimes says /siː ˈsɛn.nɚ/ as a joke, and si senor might be a valid English entry. I don't think you've made it crystal clear when to use this hypothetical template and when to create a normal entry, so I can't really support it yet. —Μετάknowledge (discuss/deeds) 19:50, 7 April 2013 (UTC)

I'm confused too. What if we use the normal process of finding citations? Words from one language used in another could be labeled as such, using {{context}} or {{qualifier}}. It's dangerous if we go too far, e.g. if we start quoting all English words in Latin letters in another language, especially in non-Roman based languages. For the moment, I wouldn't go with romanised Russian either. --Anatoli (обсудить/вклад) 02:36, 8 April 2013 (UTC)

I'm not suggesting we don't find citations; I'm worried about the stuff where we can find plentiful citations that establish it's between two languages. What I'm most concerned about is stuff like "Do svidanya", which English speakers can find in English texts and want to look up, but which is likely to get treated as Russian and then get deleted because it's romanized. There seems to be a hole here where things that can be cited, and might actually get looked up, are deleted because they aren't real Russian or Latin, etc. I think si senor is a good example; it's not English, it's clearly Spanish or at least pseudo-Spanish. But we deleted danke schon for the same reasons, as not German. I'm not confident that, if it's created as English, it will survive. I am sure that eminently citable words and phrases like that need to be stored on Wiktionary in the spelling that people will find them used under, and what language tags them is less important than that.--Prosfilaes (talk) 06:27, 9 April 2013 (UTC)

I'm not sure I understand your suggestion. We could have redirects for commonly known foreign words if they are incorrectly spelled or written in the wrong script: do svidanya -> до свидания, danke schon -> danke schön (danke schon previously failed RFD, but I see in the history that it was a full entry, not a redirect). Note: schon in German is a different word from schön. I don't think si senor or si señor merit an entry, English or Spanish. konnichi wa already exists as a romaji entry and can be looked up. --Anatoli (обсудить/вклад) 06:58, 9 April 2013 (UTC)

Template:sense

This template is used to label specific synonyms or antonyms. With antonyms that leads to problems though, like this edit shows: diff. People get confused because they expect that the sense being shown is the sense of the words listed after it. And that isn't really a strange assumption either, except that it's not how we use the template. So, would it be ok if some extra text were added to the template, so that it displays this instead: of the sense "(sense)" ? —CodeCat 14:19, 7 April 2013 (UTC)

What is the before and after of your proposal in the general case? DCDuring TALK 14:54, 7 April 2013 (UTC)[reply]

What do you mean? —CodeCat 16:05, 7 April 2013 (UTC)

This is a fairly well-known problem; what are you actually proposing? Mglovesfun (talk) 16:34, 7 April 2013 (UTC)[reply]

Um... I'm proposing to change the text that the template displays, like I said? —CodeCat 17:24, 7 April 2013 (UTC)

To exactly what. DCDuring TALK 18:51, 7 April 2013 (UTC)[reply]

Quote: "So, would it be ok if some extra text were added to the template, so that it displays this instead: of the sense "(sense)" ?" —CodeCat 18:58, 7 April 2013 (UTC)

If anything, it should display "definition" or "def". "Sense" communicates mostly to us, perhaps to linguists. DCDuring TALK 19:12, 7 April 2013 (UTC)[reply]

Status quo ante:

(a definition): word, another, more

CodeCat proposal (as I [incorrectly] understood it):

(sense) a definition: word, another, more

CodeCat proposal (from below):

(of the sense (a definition)): word, another, more

Alternative proposal 1:

(definition: "a definition"): word, another, more

Alternative proposal 2:

(def.: "a definition"): word, another, more

There are numerous other arrangements of brackets, font types, and wording possible. I don't know that any of these will solve the problem of communicating the intent of the antonym section (and the less familiar semantic relations) while simply providing a breadcrumb back to the definition. We could also try putting "NOT" in front of the gloss for the antonyms heading only or we could skip trying to communicate to ordinary users. DCDuring TALK 19:42, 7 April 2013 (UTC)[reply]

I didn't realise you wanted me to be that specific, because I felt that the spirit was more important than the letter. What I intended was for it to show as: (of the sense (sense)): word —CodeCat 19:54, 7 April 2013 (UTC)

@CodeCat you said you wanted to change the text of the template, just not what you wanted to change it to. Mglovesfun (talk) 19:48, 7 April 2013 (UTC)[reply]

No, she said: of the sense "(sense)". (Use of <tt> or italics might have made that harder to miss, but she did say.) - -sche (discuss) 20:09, 7 April 2013 (UTC)[reply]

Corrected CodeCat proposal now above. DCDuring TALK 21:23, 7 April 2013 (UTC)[reply]

I’ve proposed:

(of “gloss here”): foo, bar, spam

in a previous discussion. — Ungoliant (Falai) 11:40, 9 April 2013 (UTC)

Missed that. It has the advantage of brevity over the other proposals. And it makes sense if one reads linearly from the headings to the individual items: 'Antonyms of "definition"', 'Coordinate terms of "definition"' etc. How could a user misread it? Perhaps by ignoring the quotation marks and reading the "of" as part of the following text. Should "of" also be italicized? DCDuring TALK 11:57, 9 April 2013 (UTC)

If the gloss is italicised and the ‘of’ isn’t, it will help prevent misreading. — Ungoliant (Falai) 12:05, 9 April 2013 (UTC)

Actually I meant to include quotes around the sense, but that kind of got lost in translation. —CodeCat 13:01, 9 April 2013 (UTC)

So some possibilities with "of" are:

(of definition):
(of "definition"):
(of definition):
(of "definition"):

Of these my favorite is the last, because: 1., we often put glosses in quotes, eg in {{term}}, 2., 'Of' needs to be distinguished, 3., the whole thing needs to be visually distinct from the terms following, including any that are not links, eg SoP circumlocutions. DCDuring TALK 14:00, 9 April 2013 (UTC)

Of course, {{term}} italicises, so it may not be as distinctive as you think. Chuck Entz (talk) 14:21, 9 April 2013 (UTC)[reply]

The standard practice is to italicise mentions, but this isn't a mention, more like a quotation, so {{term}} isn't appropriate here. —CodeCat 14:29, 9 April 2013 (UTC)

The wording "of [sense]" or "of the sense [sense]" works for 'nyms and pronunciations, but not for usage notes. I propose "in the sense '[sense]'" (or "in the sense of [sense]" or whatever), which is I think how normal people speak about a particular sense of a word. It works for 'nyms and pronunciations also: in fact, for me at least, it seems much more natural even for 'nyms and pronunciations.—msh210℠ (talk) 18:53, 9 April 2013 (UTC) ← Portions struck through at 04:45, 10 April 2013 (UTC).—msh210℠ (talk)

I was thinking of "of". Mglovesfun (talk) 22:06, 9 April 2013 (UTC)[reply]

How about allowing an alternative wording, specified, say, by an "alt=" parameter, for whatever cases are not well served by "of"? There are, in English at least, relatively few uses of {{sense}} in Usage notes AFAICT. Is it commonly used there in other languages? DCDuring TALK 22:52, 9 April 2013 (UTC)

I guess my issue is partially that {{sense}} is often used with not a gloss but a usage restriction or a field of endeavor as its parameter. For example, work (which currently has no 'nyms listed at all) might list 'nyms of the "Said of one's workplace (building), or one's department, or one's trade (sphere of business): He mostly works in logging, but sometimes works in carpentry" sense using {{sense|of a workplace or trade}} and 'nyms of the "(zymurgy) To cause to ferment" sense using {{sense|zymurgy}}. I've definitely seen examples of each of these types of uses of {{sense}}. Adding "of" would make no sense in those cases either. (The 'nyms aren't 'nyms of zymurgy.)

And even in the more common case, viz even when the parameter of {{sense}} is a gloss of the headword, what we're really listing aren't 'nyms of "to cause to ferment" — as the wording "of cause to ferment" (or the awkward "of to cause to ferment") would imply. Rather, what we're listing are 'nyms of work in the sense of "to cause to ferment". So adding "of" doesn't cut it, in my opinion — not even for 'nyms and pronunciations.

Perhaps best would be "for [pagename] in the sense of:" with a colon at the end and no quotation marks around what follows. Quotation marks (and even italicization if the prefatory text isn't italicized) wouldn't work in the zymurgy (or field of endeavor) case, as it'd seem like "zymurgy" is a gloss. The colon is then necessary, as "in the sense of [gloss]" doesn't flow. Using only "in the sense of:" is still slightly ambiguous, not solving the problem we started with here: it could be referring to the listed antonyms rather than the headword. I think "for [pagename] in the sense of:" takes care of all these issues — though of course there may be others I haven't thought of.—msh210℠ (talk) 04:45, 10 April 2013 (UTC)[reply]
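To compare the wordings side by side, the candidates discussed so far can be rendered with a small helper. This is just a sketch in Python rather than template code: the function `sense_label` and its style names are made up, and since the thread does not settle the brackets or italics for the last wording, that one is returned without any.

```python
def sense_label(gloss, style="status quo", pagename=None):
    """Render the text shown before a list of 'nyms under one proposal."""
    if style == "status quo":
        return "(%s):" % gloss
    if style == "of":
        return '(of "%s"):' % gloss
    if style == "of the sense":
        return '(of the sense "%s"):' % gloss
    if style == "for pagename":
        # msh210's wording: colon after "of", no quotation marks around
        # what follows. Surrounding brackets are left out (unspecified).
        if pagename is None:
            raise ValueError("the 'for pagename' style needs a pagename")
        return "for %s in the sense of: %s" % (pagename, gloss)
    raise ValueError("unknown style: %r" % style)


gloss_case = sense_label("to cause to ferment", "of")
field_case = sense_label("zymurgy", "for pagename", pagename="work")
```

Running the field-of-endeavor case through the "of" style gives (of "zymurgy"):, which shows the problem described above: it reads as though the 'nyms were 'nyms of zymurgy.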

Some small changes to Mandarin (also Cantonese, Min Nan) entry structure and about topic categories - suggestion

{{look}}

I will run this by all our active Chinese contributors but I'd like to suggest to dump the rs (radical sort) value in Chinese entries, e.g. {{cmn-noun}}.

The rationale is the following:

Finding the sorting order for Chinese character entries is not straightforward, although Wiktionary itself has this info. Lacking that knowledge keeps casual editors, and anyone who is sure about words but not about entry structure, from adding new entries.
The mistakes are numerous; I have fixed some when I noticed them, but I'm sure I missed many.
Simplified and traditional topic categories are sorted differently, but there is no real reason for it. E.g. 標準／标准 (biāozhǔn, “standard”): the traditional form is sorted by "木11標準" (so it appears under the "木" (tree) radical), but its simplified equivalent 标准 (biāozhǔn) is sorted by "biao1zhun3" and appears under the letter "B".
A Chinese person who relies on radical sorting and is very familiar with radicals and their order would probably be better off just entering the word they are searching for in Chinese, rather than searching through the category listings.

Take a look at this Category:cmn:Intermediate_Mandarin_in_traditional_script:

You see, a small number are sorted by Latin letters, others by radicals. Those under Roman letters are incorrectly formatted. Errors are often introduced when a traditional entry is created by copying a simplified entry and the initial character is different.

I suggest to remove the "rs" from entries and from category sorting and just sort by numbered pinyin (e.g. "biao1zhun3"), perhaps stop splitting topical Mandarin categories into simplified/traditional. Serbo-Croatian entries don't separate Cyrillic/Latin entries into separate categories. Or we need to check/fix all incorrectly formatted entries, for which we just don't have enough resources.

I'm not insisting on this change, but User:A-cai, who did a great job, is no longer very active here, and we could get more people on board if Mandarin entries were simpler.

Just want to check the mood and get opinions. We have tens of thousands of entries in traditional script, so there needs to be an agreement before anything happens. --Anatoli (обсудить/вклад) 04:24, 8 April 2013 (UTC)

I have no strong opinion on this. The rs value is autogenerated when using {{cmn new}}, which relies on {{zh-sortkeys}} to produce the rs of the first character in the page title. So it doesn't really bother me. (I wish the language sections were just a single template, with various parameters included, e.g.

{{language_name|標|準|p1=biāo|p2=zhǔn|jy1=biu1|jy2=zeon2|poj=piau-chún|n|[[standard]]|eg=|syn=基準|syn2=|ant=}} (effectively everything needed to generate 標準),

and all the rest (trad-simp detection/conversion, pinyin analysis, sort key, even generating pinyin for character) are automated.) Wyang (talk) 05:16, 8 April 2013 (UTC)[reply]

Thanks. You're well equipped, others are not so lucky. :)

What about maintenance of topic categories? Many have been moved or deleted, just because they don't follow the structure of other languages.

Category:Mandarin terms derived from English exists on its own (35 entries), although initially was meant to be split.

Category:Mandarin terms in simplified script derived from English (356)

Category:Mandarin terms in traditional script derived from English (301)

Category:Mandarin terms derived from Japanese is now a separate category (21), but Category:Mandarin terms in simplified script derived from Japanese and Category:Mandarin terms in traditional script derived from Japanese were deleted or moved (like many others, they are not empty!). It's a mess. Some long-time editors like Tooironic seem to be confused about categories in Mandarin, so people have just stopped categorising Mandarin entries or categorise them at random (with or without the words traditional/simplified). Well, the reason is simple - trad. and simpl. entries are sorted differently and therefore categorised differently. --Anatoli (обсудить/вклад) 05:54, 8 April 2013 (UTC)

I do like the idea of getting rid of the duplication in categories- it always struck me as rather kludge-y. The main drawbacks/issues I can see would be characters that have multiple pronunciations, and the fact that we would instantly increase the membership of most categories and decrease the number of distinct entries per page. Also, the difference between traditional and simplified characters isn't as easy to see for those who don't know one or the other as for the difference between Latin and Cyrillic. I can see how there might be confusion about which terms in a category are traditional, simplified, or the same in both, and even which ones are paired with which. I'm sure those aren't terribly difficult to deal with, so I'm in favor of changing the category sorting.

As for the rs parameter: we wouldn't have to get rid of it. It would be easier to just make it non-mandatory and ignore it in category sorting. Maybe someday we can give users the option of choosing which sort order to use, though we'd have to populate the rs parameters by bot, first. Chuck Entz (talk) 07:18, 8 April 2013 (UTC)[reply]

Of course we should keep separate entries for simplified and traditional characters and words. Wiktionary after all aims to catalogue all words in all languages, in whatever forms. However I too support the abandoning of the old system under A-cai. It's simply not worth the extra effort. At present I add about 50 or so Mandarin entries a week. I imagine I, along with other editors, could create double the number of entries if we didn't have to deal with the rs field. But now Wyang says the rs field is generated automatically. Is that really the case? I just created a new Mandarin entry at 扇贝 - where is this automatic rs field you speak of? Did I do it wrong? If so advise me how. Cheers. ---> Tooironic (talk) 09:49, 8 April 2013 (UTC)[reply]
When you create the entry, you can use the code {{subst:cmn new/a|p1=shàn|p2=bèi|n|[[scallop]]}} in both forms, and this will generate the entire content. Wyang (talk) 12:03, 8 April 2013 (UTC)[reply]
Wow, that script is powerful. I just created 拆開 and 拆开 in seconds. Wish someone had told me about that earlier. But is the IPA on those entries correct? It doesn't look right to me... ---> Tooironic (talk) 23:09, 8 April 2013 (UTC)[reply]

@Tooironic. Re: simplified/traditional separation. With Serbo-Croatian it's easier. The words in Cyrillic and Roman sort themselves differently automatically. As you know, the parameter "t" in {{cmn-noun}} is an indicator that the noun is traditional, "s" is simplified. They are automatically added to Category:Mandarin nouns in traditional script or Category:Mandarin nouns in simplified script or both if the value is "ts". A word, which is both simplified and traditional will appear in both categories but if you just want Category:Mandarin nouns they will appear in the alphabetical order - both forms. We could apply the same sorting for both traditional and simplified noun categories but abandon trad/simp approach for topical categories? What do you think?

In a nutshell - I don't suggest removing "t", "s" and "ts" params, so SoP will always be separated into trad/simp categories as parts of speech. I suggest sorting by numbered pinyin instead of radical + number of strokes, i.e. "biao1zhun3" ("pint" parameter) instead of "rs" - "木11" for both simplified and traditional entries and remove words traditional/simplified from topical categories.

--Anatoli (обсудить/вклад) 13:15, 8 April 2013 (UTC)
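For reference, a numbered-pinyin sort key of the kind proposed ("biao1zhun3") can be derived mechanically when the pinyin is supplied one syllable at a time, as in the p1=/p2= parameters above. The following is a rough sketch only: `numbered_syllable` and `pinyin_sort_key` are hypothetical names, and leaving neutral-tone syllables without a number is an assumption, not something decided in this thread.

```python
import unicodedata

# Combining diacritics used for the four Mandarin tones in pinyin.
TONE_MARKS = {
    "\u0304": "1",  # macron, first tone
    "\u0301": "2",  # acute, second tone
    "\u030c": "3",  # caron, third tone
    "\u0300": "4",  # grave, fourth tone
}


def numbered_syllable(syllable):
    """Turn one tone-marked syllable (e.g. 'biāo') into numbered form ('biao1')."""
    tone = ""
    letters = []
    for ch in unicodedata.normalize("NFD", syllable.lower()):
        if ch in TONE_MARKS:
            tone = TONE_MARKS[ch]
        else:
            letters.append(ch)  # keeps the diaeresis of ü intact
    base = unicodedata.normalize("NFC", "".join(letters))
    return base + tone  # assumption: neutral tone gets no number


def pinyin_sort_key(syllables):
    """Join the numbered syllables into a single sort key."""
    return "".join(numbered_syllable(s) for s in syllables)


key = pinyin_sort_key(["biāo", "zhǔn"])
```

Because the key depends only on the reading, 標準 and 标准 would get the same key and sort together under "B".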

I personally don't have any issues at finding the "rs" value, it only takes a few seconds longer to create a Mandarin entry and I have to open another tab. Don't get me wrong, guys. I am just worried that most templates we use for other languages don't work for Mandarin, like for example {{etyl}}. Japanese entries also use sorting parameters (hiragana) but it's more consistent. Consider entries like 傍晚. It's adding to Category:cmn:Elementary Mandarin using "人10" as "skey" and Category:cmn:Elementary Mandarin in simplified script using "bang4wan3" as the sorting key. Why is it not categorised as a traditional version? If we treat simplified and traditional categories equally (using one sorting key) and move all topic categories to match other languages, then it would be easier for everyone. Musical instruments categories - trad/simp and without suffix all seem independent from each other - these entries ended up belonging to three topic categories, obviously using whatever sort order.

Category:cmn:Capital cities in simplified script and Category:cmn:Capital cities in traditional script don't have a common supercategory, they go directly under generic Category:Capital cities. Whatever category you take, there are problems. I stopped categorising a while ago, except for HSK, which is still OK, sort of.

Allowing a bot to load rs value may not be such a bad thing but it's probably better to normalise categories (make them similar to other languages - no trad/simp suffixes) and use numbered pinyin or radical sort (whatever we decide) but equally for both trad and simp entries. --Anatoli ^{(обсудить}/^вклад) 13:03, 8 April 2013 (UTC)[reply]

@Tooironic. I have modified your 屌絲 and created 屌丝. With my suggested way of categorising - # {{slang|vulgar|lang=cmn|skey=diao3si1}}. Now both entries appear in Category:Mandarin slang and Category:Mandarin vulgarities sorted by diao3si1 (under letter "D") (note categories are without words "traditional"/"simplified").

They are still in Category:Mandarin nouns in traditional script and Category:Mandarin nouns in simplified script - not suggesting to change that but we could change the sorting of the traditional term to be the same as simplified (pinyin, not rs), if we are in agreement.

Please check whoever is interested, if this is worth attention. --Anatoli ^{(обсудить}/^вклад) 00:24, 9 April 2013 (UTC)[reply]

I don't have any problem with this. I've never liked the idea of separating categories based on script types, especially two that share some characters. I wasn't even aware that some traditional terms were sorted differently. If this goes ahead, you will get my support. Jamesjiao → ^{T ◊ C} 01:41, 9 April 2013 (UTC)[reply]

Great stuff. Will invite the creator - User:A-cai. I hope he will not be upset. We could still have some bots to do tricks with automatically adding rs values to Mandarin entries, right?

Wyang, you expressed suggestions how to add rs automatically but have not expressed your opinion on categories and sorting. What do you say?

The hardest bit would be converting or automating this change but as I said, Mandarin topical categories are in a mess, anyway. --Anatoli ^{(обсудить}/^вклад) 01:48, 9 April 2013 (UTC)[reply]

I think simp/trad should be merged into one single category and sorted by pinyin. Adding the pinyins everywhere would be troublesome, but like I said I would prefer if all the templates in one language section are merged into one template {{language_name|..., with various things defined by various parameters, including definitions and context labels. But I can't see this being actualisable on Wiktionary any time soon, so... Wyang (talk) 04:24, 9 April 2013 (UTC)[reply]

Both entries 動能 and 动能 (dòngnéng) belong to Category:cmn:Physics (not in ~~Category:cmn:Physics in simplified script~~ or ~~Category:cmn:Physics in traditional script~~!) and are sorted by "dong4neng2", so appearing under letter "D", not under radical "力". If everyone is OK with this, I will update Wiktionary:About Sinitic languages. All entries in Mandarin categories with "...in simplified script" and "...in traditional script" should gradually be moved to categories without these suffixes, with the numbered pinyin sort order e.g. skey=dong4neng2 or just by adding |dong4neng2 in the category name, e.g. [[Category:cmn:Physics|dong4neng2]]

It's a lot of work and I am currently busy with other things but will get to this eventually.

Parts of speech categories remain as they are for now, with the traditional/simplified distinction. We could change the sorting key for traditional entries to use pint rather than rs but I don't know how. Simplified entries are sorted by pinyin. --Anatoli ^{(обсудить}/^вклад) 00:17, 11 April 2013 (UTC)[reply]
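For anyone wanting to generate the numbered pinyin keys rather than type them, a minimal Python sketch follows. It assumes the pinyin is already split into syllables and that neutral-tone syllables get no digit; both are assumptions, and the function names `numbered_pinyin` and `category_link` are hypothetical, not existing bot or template code.

```python
import unicodedata

# Combining tone marks mapped to tone numbers (macron, acute, caron, grave).
TONE_MARKS = {"\u0304": "1", "\u0301": "2", "\u030c": "3", "\u0300": "4"}


def numbered_pinyin(syllables):
    """Turn tone-marked syllables, e.g. ["dòng", "néng"], into a sort key."""
    parts = []
    for syllable in syllables:
        tone = ""
        letters = []
        # NFD splits "ò" into "o" plus a combining grave accent.
        for ch in unicodedata.normalize("NFD", syllable.lower()):
            if ch in TONE_MARKS:
                tone = TONE_MARKS[ch]
            else:
                letters.append(ch)
        # Recompose so that "ü" stays a single character.
        parts.append(unicodedata.normalize("NFC", "".join(letters)) + tone)
    return "".join(parts)


def category_link(category, sort_key):
    """Build a category link with an explicit sort key."""
    return "[[Category:" + category + "|" + sort_key + "]]"


key = numbered_pinyin(["dòng", "néng"])
print(category_link("cmn:Physics", key))
```

The same key works for both the traditional and the simplified entry, which is the point of sorting by pinyin instead of by radical and stroke count.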

Sorry for not responding sooner. I haven't had as much time to devote to the project in recent years. I'm all for automation and making things easier in order to make the site attractive to more contributors. No objections to your modification proposals. -- A-cai (talk) 17:49, 27 April 2013 (UTC)[reply]

En dash in {{was wotd}}?

Per user request at Template talk:was wotd#request to exchange hyphen for en dash, is it ok to change the hyphen “-” for an en dash “–” in {{was wotd}}?

This is a v. minor change, but it’s highly visible, so I thought it best to ask.

—Nils von Barth (nbarth) (talk) 12:24, 8 April 2013 (UTC)[reply]

I support. Good thing you asked, as some editors seem to really hate the use of typographic characters instead of plain ASCII ones. — Ungoliant ^(Falai) 12:31, 8 April 2013 (UTC)[reply]

The stated justification is typographical correctness. Really? DCDuring TALK 21:45, 8 April 2013 (UTC)[reply]

I would like to know what kind of person (other than a trained Wikipedia pedant) actually writes Bose–Einstein condensate rather than Bose-Einstein condensate. Equinox ◑ 21:47, 8 April 2013 (UTC)[reply]

Writes? I don't think anyone uses a hyphen-minus in writing. People type it, but typesetters (who have nothing to do with Wikipedia) have always had to choose the correct dash-type character from the type tray or now character set. Pick up any properly typeset book, and you will find that Bose–Einstein condensate is typeset with an en dash.--Prosfilaes (talk) 01:35, 9 April 2013 (UTC)[reply]

I support.—msh210℠ (talk) 18:55, 9 April 2013 (UTC)[reply]

It’s a weird world where we set type and publish it to the world on a typewriter keyboard. —Michael Z. 2013-04-09 02:58 z

DC, regarding typographical correctness: yes, an en dash is correct here, while a hyphen is incorrect; see hyphen and dash. The hyphen is reserved for intraword usage, such as line-wrapping and compounds (such as line-wrapping ;), while en dash is used in varied contexts, including interword use such as this. See Wikipedia:Manual of Style: Hyphens for usage at ’pedia.

Beyond correctness, there’s also aesthetics – a hyphen jumps out at me here as conspicuously too short (it’s sized for intraword use, and thus feels stubby surrounded by spaces), which is the standard typographical judgment.

The main objections to use of non-typewriter typographical characters I’ve heard are:

Rendering problems – non-ASCII characters render poorly on some computers, particularly older ones.
Input or editing difficulties – some editors have difficulty entering non-ASCII characters (due to needing to use a character picker) or editing entries with non-ASCII characters (esp. due to rendering issues).
Personal preference – some users prefer typewriter characters over book-style typographical characters.

Use of typewriter characters is naturally common online, due to ease of input, though we needn’t be limited by it. In the case of templates (as opposed to use in entries), there aren’t any editing difficulties, and we have lots of Unicode throughout Wiktionary, so I don’t think there are significant problems, but want to check.

Sounds like people are generally supportive (or “meh”); will wait another few days for more comments.

—Nils von Barth (nbarth) (talk) 15:38, 9 April 2013 (UTC)[reply]

Just go ahead and change it. Why on earth start a discussion about using the correct character in a template, where it will never have to be re-entered?

Rendering problems – this keeps getting mentioned, but really? Give me a break! Netscape Navigator 4 had Unicode support. If you’re reading a dictionary site with “over 500 languages” on a pre-1997 browser, maybe a dash out of place won’t ruin your day. —Michael Z. 2013-04-09 21:51 z

There being no opposition, I have gone and dunnit. —Michael Z. 2013-04-09 22:02 z

Thanks Michael!

—Nils von Barth (nbarth) (talk) 14:35, 15 April 2013 (UTC)[reply]

Facebook

I set up this page on Facebook for promoting Wiktionary of all languages. You are welcome to become co-administrators of the page, so you can update the page with inspiring messages. --LA2 (talk) 20:55, 8 April 2013 (UTC)[reply]

Where do I apply? — Ungoliant ^(Falai) 21:29, 8 April 2013 (UTC)[reply]

I noisily hate social meeja and would prefer us to "promote" ourselves through just making a good dictionary that people want to use. But I suppose it can't hurt :) Equinox ◑ 21:34, 8 April 2013 (UTC)[reply]

I am boycotting Facebook, but why not promote Wiktionary there? DCDuring TALK 21:44, 8 April 2013 (UTC)[reply]

I'm using that page to pull people out of Facebook and into Wiktionary. Whether you boycott Facebook doesn't matter, since you are already here. However, if someone would like to help to pick a "word of the day" for the Facebook page, I think that could make the page quite popular. --LA2 (talk) 22:37, 8 April 2013 (UTC)[reply]

From what I know about Facebook, that's not going to happen. Facebook is all about making more Facebook... —CodeCat 22:48, 8 April 2013 (UTC)[reply]

That's not the first page on Wiktionary in Facebook. Earlier this one was advertised. I liked both. Don't see why not. Would also be useful if we could recruit some native speakers and talented editors but promoting among users is also important, Wiktionary is for users, not for editors :) --Anatoli ^{(обсудить}/^вклад) 00:57, 9 April 2013 (UTC)[reply]

I really don't understand the amount of hate here for Facebook. Use it wisely and use it to your advantage. Don't post info that you don't want others to see.... Simple... I will take a look at the page on my home btw. I liken this attitude to the one on StackExchange towards Wiktionary. Take a look at this: How much should I trust Wiktionary?. I tried to defend Wiktionary and provide my own arguments (thanks Hippietrail for chiming in), but I can't change everyone's mind I guess. Jamesjiao → ^{T ◊ C} 01:56, 9 April 2013 (UTC)[reply]

It is correct that in a Beer parlour discussion in March 2012, the existing Facebook page was mentioned, but that is a placeholder page that Facebook created based on a Wikipedia entry. That page doesn't get updated and there is no way to claim it, it's a dead end. The page I created now has a dozen co-administrators that are able to update the page and appoint more co-administrators. It's an anarchy of the same kind as the Wikisource page on Facebook, that I set up last year. It gets updated sometimes, but not very often. Right now, the Wikisource page has 418 fans and Wiktionary has 69. --LA2 (talk) 13:58, 9 April 2013 (UTC)[reply]

69? That's a good position to be in. Mglovesfun (talk) 22:04, 9 April 2013 (UTC)[reply]

*groans loudly* OK, seriously, Facebook pages have some sort of automated system in which you can write a bunch of posts and they'll come out on a schedule. Assuming somebody's willing to put some time in, we could easily have posts and to spare. —Μετάknowledge^{discuss/deeds} 15:11, 13 April 2013 (UTC)[reply]

We could have a Facebook widget on our Front Page, that users could click on. I think the code is something like <a title="Tell Facebook" href="http://www.facebook.com/sharer.php?u=http://en.wiktionary.org/&amp;t=Wiktionary">Facebook</a> SemperBlotto (talk) 15:20, 13 April 2013 (UTC)[reply]
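A minimal Python sketch of building such a link with the query string properly encoded. The `sharer.php` address and its `u` and `t` parameters are taken from the post above and not checked against Facebook's current documentation; the function name `facebook_share_link` is hypothetical.

```python
from html import escape
from urllib.parse import urlencode


def facebook_share_link(page_url, title, label="Facebook"):
    """Return an <a> element pointing at the sharer URL quoted in the thread."""
    # urlencode percent-encodes the shared URL so it survives as one parameter.
    query = urlencode({"u": page_url, "t": title})
    href = "http://www.facebook.com/sharer.php?" + query
    # escape() turns "&" into "&amp;" so the attribute value is valid HTML.
    return '<a title="Tell Facebook" href="%s">%s</a>' % (
        escape(href, quote=True),
        escape(label),
    )


print(facebook_share_link("http://en.wiktionary.org/", "Wiktionary"))
```

Encoding the shared URL keeps any "&" or "?" inside it from being read as part of the sharer's own query string.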

Proposal of a pronunciation recording tool

Hello, Rahul21, a developer, offers to develop a pronunciation recording tool for Wiktionary, helped by Michael Dale as part of GSoC. The tool would allow users to record and add audio pronunciations to Wiktionary entries while browsing them (see background discussion on Wiktionary-l). Please read and comment on the proposal! Regards, Nemo 22:37, 9 April 2013 (UTC)[reply]

A slightly different way to show etymologies derived from Latin verbs

Romance languages use the infinitive as the lemma, but for Latin we use the 1st person singular present. This means we can't write "from Latin cantō" in any of the etymologies at cantar, because the infinitive derives from cantāre. Most entries solve this by just saying "from Latin cantāre, present active infinitive of cantō". But that is rather wordy, more so than what's really needed to get the point across: the word cantar derives from cantāre, but its Latin lemma/paradigm entry is at cantō. For that reason I've started to use another approach, by writing {{term|canto|cantāre|lang=la}}. So it will show "cantāre", but link to canto. Since not many entries have this, I wondered if nobody had considered doing it that way yet, so I'm sharing the idea here. :) —CodeCat 02:22, 11 April 2013 (UTC)[reply]
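A minimal Python sketch of the link-to-lemma, display-the-form idea, for anyone scripting such edits. It assumes the page title is the lemma with macrons stripped, as in the canto example above; the helper names `strip_macrons` and `term_link` are hypothetical.

```python
import unicodedata

MACRON = "\u0304"  # combining macron


def strip_macrons(text):
    """Remove macrons, e.g. "cantō" becomes "canto" (the assumed page title)."""
    decomposed = unicodedata.normalize("NFD", text)
    kept = "".join(ch for ch in decomposed if ch != MACRON)
    return unicodedata.normalize("NFC", kept)


def term_link(lemma, shown_form, lang):
    """Build a {{term}} call that links to the lemma but displays another form."""
    return "{{term|" + strip_macrons(lemma) + "|" + shown_form + "|lang=" + lang + "}}"


print(term_link("cantō", "cantāre", "la"))
```

The displayed form keeps its macrons; only the link target is reduced to the entry title.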

That’s a good idea. — Ungoliant ^(Falai) 02:37, 11 April 2013 (UTC)[reply]

I'd done that and waited for someone to complain about it. The case that CodeCat mentions seems ideal for that approach. What about derivations from participle forms? DCDuring TALK 03:30, 11 April 2013 (UTC)[reply]

Participles are considered separate lemmas as far as I know. They have their own declension tables too. —CodeCat 12:32, 11 April 2013 (UTC)[reply]

I've done this for months :) —Μετάknowledge^{discuss/deeds} 15:00, 13 April 2013 (UTC)[reply]

I just do "from Latin cantō", exactly as you say we "can't". (I guess I've found a way! :-P) The French verb chanter really does come from the Latin verb cantō, so it's straightforward and correct. It's only a problem when people try to gloss cantō as "I sing" (as though they were glossing the specific form) instead of the correct "to sing" (which is how we gloss verbs). —Ruakh_TALK 16:39, 14 April 2013 (UTC)[reply]

I like CodeCat's suggestion. Also, had I in the past noticed any entry glossing "canto" as "to sing" rather than "I sing", I would have changed it and (though this discussion informs me not to do so) I would have marked the edit as minor, assuming I was uncontroversially correcting a simple error by a random IP unfamiliar with Latin grammar. - -sche (discuss) 00:10, 15 April 2013 (UTC)[reply]

Another issue is the descendant section of Latin verbs. Should, say, video’s descendants be linked to as {{l/pt|ver}} or {{l/pt|ver|vejo}}? — Ungoliant ^(Falai) 01:02, 15 April 2013 (UTC)[reply]

I have no problem with the concept, but you need make another template for this purpose instead of overloading the meaning of {{term}}'s parameters. To anyone not familiar with this usage, it's confusing and looks like an error. I've probably accidentally "corrected" one of these before so that the macron and no-macron versions matched. Pengo (talk) 02:12, 7 May 2013 (UTC)[reply]

Why would we need another template to do effectively the same thing? The template doesn't mandate that the linked entry and the displayed term are in any way "the same", and as far as I know other people have been doing this for a long time. For example, you sometimes see definitions like this: # [[break|broken]]. I don't see anything wrong with that in principle. —CodeCat 02:20, 7 May 2013 (UTC)[reply]

Appendix:1000 Japanese basic words

This may not be appropriate for the BP but since this is the most visible spot, I want to ask everyone their opinion about Appendix:1000 Japanese basic words and what to do with it. (I wrote something on the talk page too.) It's a good appendix now, but it's "1000 Japanese basic words" and the description is "This appendix is a specific list of one thousand basic words," and yet there are about 700 words in it.

Some background: I don't know the full story but as far as I can tell, in a nutshell, the Japanese Wiktionary was building the list ja:Wiktionary:日本語の基本語彙1000 some time ago, and the editors here decided to copy it. At the time the original list was incomplete. Since then, the original list has grown but en.WT's list has not been maintained. Now, ja.WT's list has surpassed 1000 words and their list says "作業中現在:989項目 2008年11月16日一旦、1,000以上挙げ、その後取捨選択するなり基本語彙2,000にタイトルを変更するなりする方針としたいと思います。" which means that their list broke 1000 entries and that they are considering changing the name to 2000 basic words.

We can go two routes: depart from ja.WT and keep it a list of 1000 basic words, or mirror their version, and exceed 1000 words in the process.

I don't have exact numbers, but if you search for "Japanese word list" on Google, our appendix is the first result. That suggests to me that the wider world is making use of it as a resource. While ja.WT's version is good, it lacks essential words such as 可愛い (kawaii), いっぱい (ippai, "very"), たくさん (takusan, "many"), or すごい　(sugoi, "very/wow!".) You can't have a 30-second conversation with high school students without using those words. Conversely, ja.WT's appendix has quite specific words such as ミミズ (mimizu, "earthworm") and 十二指腸 (jūnishichō, duodenum). Duodenum is a basic word?

How about both routes? I would like to combine the most basic of the "basic" words and the Japanese Language Proficiency Test Level 5 appendix (the lowest level) for a "1000 basic Japanese words" appendix, and maybe mirror ja.WT's appendix on a different page. --Haplology (talk) 05:02, 12 April 2013 (UTC)[reply]

Your last paragraph sounds eminently reasonable, and I fully support that method (although I think perhaps mirroring ja.wikt's appendix is less important, because it would appear that we are a better arbiter of basicness than they are). —Μετάknowledge^{discuss/deeds} 14:59, 13 April 2013 (UTC)[reply]

This appendix is not a very scientific one and was made by amateurs. It's worth adding words to make a thousand, choosing carefully from JLPT or frequency list and/or removing those that are identified as not being basic.

The valuable time could be spent on making Appendix:JLPT better - fixing the word format and choosing the spelling we actually have here, e.g. we have 上がる but not 上る, or create the alternative spellings.

JLPT appendices could be made similar to Appendix:HSK list of Mandarin words with new categories like ~~Category:JLPT/N5~~ Category:ja:JLPT-5 or similar. --Anatoli ^{(обсудить}/^вклад) 01:53, 15 April 2013 (UTC)[reply]

I'm glad we all agree. I've been adding common words from the N5 list to the category, and once the category reaches 1000 items, I plan to add them to the appendix and add the sort keys to the categories. I've been through the whole N5 list once and added common words at my discretion (but not all of them,) and there are now almost 900 words in the category. I plan to go through N5 again, and also look at the N4 list and try to find any other essential words that may have been missed. The original list is biased toward nouns, so other parts of speech would be good places to look for new candidates. It also ignores casual words like ちゃう, which is also essential to high school students, or pretty much anybody. To anyone who is so inclined, if you see anything that strikes you as essential in the real world, then please add it. --Haplology (talk) 05:42, 17 April 2013 (UTC)[reply]

I have just created new categories. What I meant is something like this: Category:Japanese by difficulty level with five categories. I only added two words as examples: 会う to level 5 (Category:ja:JLPT-5) and 安心 to level 4 (Category:ja:JLPT-4). The actual names of categories and templates, format and links can be discussed. The HSK categories provide a bit more info and look better. Please take a look. --Anatoli ^{(обсудить}/^вклад) 06:09, 17 April 2013 (UTC)[reply]

Sure, that sounds good. I just have a few questions. So basically this means completing the JLPT appendices project, as well as the 1000 basic words project, and having both exist in parallel? That's what I would hope for, as both projects have already been made, and they serve slightly different purposes. I assume that no new words would be added to the JLPT categories, only the ones already on the appendices? In the process of reviewing the appendices, it sounds like you want some revision to be done to them, such as adding more common forms like 上がる rather than 上る. I agree with that. I just changed "掃除　そうじする to clean" to "掃除そうじ cleaning", but perhaps "掃除するそうじする to clean" would be better, and have that link to 掃除? I think there is also 近く, so what should be done with that? In the past there was some opposition to creating pages like　近く, but I think there's precedent for pages like that in other languages and there's no policy against them. It's mainly just that the Japanese editors have enough work with lemmas, and if there are going to be forms like　近く with their own entries, I'd rather a bot add them. The L5 appendix was a bit slow to edit, but did not time out or have any problems like that, so I guess there's no need to break it up like L1 (which was too much for the server to display.) What do you think about breaking up appendices? --Haplology (talk) 04:28, 18 April 2013 (UTC)[reply]

Yes, I think both templates and category groups could easily coexist.

する-verbs, I'd link to lemma but display lemma + する because they are verbs. Having "掃除 to clean" would look weird because 掃除 is a noun. I have adopted this for translations. Same thing for な-adjectives.

Cleaning sounds good but I don't know if JLPT would prescribe 上る for the tests, not 上がる. JLPT is a bit more strict in nature than 1000 basic words but I have no idea who made original lists, how accurate and up-to-date they are. Should students for level 5 know both forms? We can always have simple entries with links to main entries, even skipping conjugations, etc. to save time. What do you think?

No strong opinion on 近く but since く-adverbs are simple in structure, I don't see why we should discourage them, also for the sake of back translations from English. No need to create them, if a bot could do it but I wouldn't delete if they exist.

Breaking up appendices - OK. You already did one. --Anatoli ^{(обсудить}/^вклад) 04:53, 18 April 2013 (UTC)[reply]

tt

A lot of editors are used to typing <tt> to make things look typewritery. In HTML5, tt is “entirely obsolete, and must not be used by authors.”[2] The W3C suggests:

Where the tt element would have been used for marking up keyboard input, consider the kbd element; for variables, consider the var element; for computer code, consider the code element; and for computer output, consider the samp element.

It looks to me like code is a good general replacement. More specific semantics can be conveyed with samp, kbd, and var. Continuing to use tt in discussions won’t break anything, but we should replace it in templates and entries, so we don’t have to endure the shame of unnecessary validation errors after the MediaWiki software is brought up to par. —Michael Z. 2013-04-12 17:51 z
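For the template and entry cleanup, a minimal Python sketch of the mechanical part. It assumes that code is the right replacement everywhere, which, as noted above, is not always true (kbd, samp or var may fit better); the function name `replace_tt` is hypothetical and this is not an existing bot.

```python
import re

# Matches <tt>, </tt> and <tt ...attributes>, in any letter case.
TT_TAG = re.compile(r"<(/?)tt(\s[^>]*)?>", re.IGNORECASE)


def replace_tt(wikitext, new_tag="code"):
    """Swap tt tags for another element, keeping any attributes."""
    def swap(match):
        slash = match.group(1)
        attributes = match.group(2) or ""
        return "<" + slash + new_tag + attributes + ">"
    return TT_TAG.sub(swap, wikitext)


print(replace_tt("Type <tt>{{term}}</tt> here."))
```

Passing `new_tag="kbd"` or `new_tag="samp"` covers the cases where the more specific element is wanted, though deciding which one applies still needs a human.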

By the way, also gone the way of the rotary dial are acronym, big, center, font, strike, and u, and all of those styling attributes on table elements. —Michael Z. 2013-04-12 17:59 z

What does "obsolete" mean in HTML-world? I went to an HTML class today, and we were using some of these (well, definitely font) without any indication that they could ever be a problem. —Μετάknowledge^{discuss/deeds} 03:39, 14 April 2013 (UTC)[reply]

Font? Ouch – I should have a word with your teacher.

During the 1990s’ browser wars, every browser was making up new features and displaying them differently, and web development was a fragmented nightmare. Since then, the W3C approves the official open standards that make up the web based on feedback from browser developers, and we can mostly write HTML for one standard instead of for five current and twenty-seven past browsers (but don’t get me started on MSIE 6). The wide adoption of CSS, which allows for the separation of presentation from document structure, has led to newer versions of HTML deprecating and obsoleting purely presentational elements.[3] Unfortunately, the nature of wikitext encourages editors to include lots of presentation guff repeated many times in every page, but this is bad practice because it bloats pages and makes maintenance difficult. Like templates, style sheets let us centralize presentation and reduce page bloat.</pedantry>

Browsers are built for backwards-compatibility, so most of the old elements will still work. But as an organization for openness, we should follow the recommendations of current open standards, and certainly abandon practices deprecated in the last century.

Specifically, HTML 4.01 (1999) deprecated center, font, s, strike, and u, and others.[4] HTML5, which MediaWiki is now specifying in the doctype at the top of every HTML page, has obsoleted these and other elements and attributes,[5] and redefined some others.[6] —Michael Z. 2013-04-14 15:28 z
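To see how much of this is in a given page, a minimal Python sketch that counts such elements with the standard library's HTML parser. The set below is only a subset of the elements named in this thread, not an authoritative list from the specification, and the names `ObsoleteTagCounter` and `count_obsolete` are hypothetical.

```python
from collections import Counter
from html.parser import HTMLParser

# Subset of the elements named in this thread; an assumption, not a spec list.
OBSOLETE = {"tt", "acronym", "big", "center", "font", "strike"}


class ObsoleteTagCounter(HTMLParser):
    """Counts opening tags whose element name is in OBSOLETE."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lower case.
        if tag in OBSOLETE:
            self.counts[tag] += 1


def count_obsolete(html_text):
    parser = ObsoleteTagCounter()
    parser.feed(html_text)
    parser.close()
    return dict(parser.counts)


print(count_obsolete('<p><font size="2">a</font> <tt>b</tt> <tt>c</tt></p>'))
```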

Thanks for that explanation. Specifically, my teacher recommended using CSS (which I'm learning now), but said that for basic formatting, just using the HTML tags is fine (although it may not be much faster than inline CSS). I agree with replacing them in templates but not giving a damn on discussion pages. —Μετάknowledge^{discuss/deeds} 04:44, 15 April 2013 (UTC)[reply]

Agreed, in principle. But I suggest you keep in the mindset that you are structuring HTML, not formatting as one does in MS Word, and the presentation is created by the browser’s or website’s default style sheet. —Michael Z. 2013-04-15 14:53 z

100 million edits

According to our sources, the 100 millionth edit was made to Wiktionary (all languages taken together, humans and bots included) during Friday April 12. Congratulations to us all! About 20% of the edits have gone into the English Wiktionary. --LA2 (talk) 02:05, 13 April 2013 (UTC)[reply]

I wonder which was the 100 millionth edit. — Ungoliant ^(Falai) 03:20, 13 April 2013 (UTC)[reply]

Probably me changing a shitty em dash to a beautifully appropriate en dash. —Michael Z. 2013-04-13 06:15 z

Foreign word of the day: reconstructed terms, constructed terms and name.

In the vote for creating the FWOTD feature, the points “eligibility of reconstructed languages” and “eligibility of constructed languages” didn’t achieve consensus (except conlangs which don’t meet CFI, which failed) by the end of the vote.

Also, we’ve had a few people complain about the name “Foreign word of the day,” so if anyone wants to suggest a change feel free to do so.

Summarising, I’m consulting the community on:

whether terms in reconstructed languages (Proto-Indo-European, Vulgar Latin, Proto-Germanic, etc.) should be allowed to be foreign words of the day;
whether terms in constructed languages that meet CFI (Esperanto, Ido, Lojban, etc.) should be allowed to be foreign words of the day;
whether the feature’s name should be changed.

— Ungoliant ^(Falai) 14:18, 13 April 2013 (UTC)[reply]

I support the eligibility of reconstructed languages, because they are some of our most interesting content. Naturally, for reconstructed terms we shouldn’t require pronunciation and should require a reference from a trustworthy source instead of citations.

I support the eligibility of constructed languages that meet CFI. Don’t see why not.

I oppose changing the name. I don’t find it offensive in any way whatsoever.

— Ungoliant ^(Falai) 14:18, 13 April 2013 (UTC)[reply]

I support the first two, and I kind of oppose the third because I don't see anything wrong with the current name. In Dutch, there is a nice word anderstalig, but I don't know if English has an equivalent word. Maybe that would be a good word to feature? :) —CodeCat 14:43, 13 April 2013 (UTC)[reply]

I oppose the eligibility of reconstructed languages since they are by definition uncitable. That's why they're not in mainspace, too. I support the eligibility of constructed languages that meet CFI. I abstain on the issue of the name; I don't understand what could be offensive about it, though I can see it might be misleading, but I can't think of a better name besides "non-English word of the day" which sounds dumb. Incidentally, although you didn't ask, I also oppose allowing mentions rather than uses to count as cites in FWOTD nominations. I know that mentions are good enough for RFV when it comes to LDLs, but I think FWOTD ought to have higher standards than RFV/CFI. Note that FWOTD already requires pronunciations, even though nothing at CFI requires them. —Angr 14:51, 13 April 2013 (UTC)[reply]
- While I sympathise with your point, this would make it much harder to feature words from languages without contributors who speak them, like Kaingang and Quechua, and it’s already Indo-European dominated enough as it is. — Ungoliant ^(Falai) 15:16, 13 April 2013 (UTC)[reply]
  - The trouble with allowing a single mention is that there's no protection against errors. If the single source we use for Kaingang or Quechua has a fictitious entry (whether deliberate or accidental) or even just a typo, then we are at risk of propagating that error if we don't confirm it elsewhere. Bad enough when that happens in any entry, but worse when it happens in an entry being featured on the main page. —Angr 17:08, 13 April 2013 (UTC)[reply]

I vote per Ungoliant, although I also support the eligibility of terms in conlangs, ~~which Ungoliant took no stance on~~. —Μετάknowledge^{discuss/deeds} 14:54, 13 April 2013 (UTC)[reply]

I did. — Ungoliant ^(Falai) 15:16, 13 April 2013 (UTC)[reply]

Sorry. Rectified above. —Μετάknowledge^{discuss/deeds} 03:37, 14 April 2013 (UTC)[reply]

I vote per Angr. I'm undecided on whether the name needs to change; we don't have a great alternative, but I do understand why people might want to change it.--Prosfilaes (talk) 19:56, 13 April 2013 (UTC)[reply]

Not much point in voting against a title if there is no clear proposal for a replacement.

What exactly were the complaints against “foreign?” It’s not exactly offensive, but kind of ignorant when it’s a minority of English speakers who live in countries where other languages are truly foreign. Calling French a foreign language in Canada, for example, is incorrect and at least off-putting to a francophone Quebecker who accepts his or her first or only language for granted as native.

What alternatives are there?

foreign-language word of the day
non-English word of the day
other-language word of the day
alterlingual word of the day (is there a real Latinate word?)
alloglossal word of the day (ditto Greek?)
interlingual word of the day
international word of the day
global word of the day
world word of the day
exotic word of the day
other word of the day

—Michael Z. 2013-04-13 19:21 z [updated list —Michael Z. 2013-04-14 14:40 z]

But suppose you are an anglophone Canadian who learned French. If someone asks you “do you speak any foreign language?”, isn’t “French” a correct answer? — Ungoliant ^(Falai) 19:45, 13 April 2013 (UTC)[reply]

No? I would regard it as sloppy usage of the word "foreign" = from a different country. In any case, suppose you are a francophone Frenchman; why would French be foreign?--Prosfilaes (talk) 19:50, 13 April 2013 (UTC)[reply]

Well, foreign also means “from a different language,” and many Canadians live with only one of the official languages, which is why such misunderstandings can happen.

If you are from France though, wouldn’t you understand what a “foreign word” is in the English-language Wiktionary? So, can anyone link to some complaints about the title, so we can replace speculation with evidence? —Michael Z. 2013-04-14 01:35 z

[7], [8], [9], possibly [10]. — Ungoliant ^(Falai) 01:46, 14 April 2013 (UTC)[reply]

"From a different language" is not listed as a definition at foreign, and it doesn't sit right with me when ASL or Native American languages get lumped in as foreign languages, though the lack of a better term often means they do. At Distributed Proofreaders, we got in a habit of using "languages other than English (LOTE)", precisely because they weren't foreign to our site or users.--Prosfilaes (talk) 10:40, 14 April 2013 (UTC)[reply]

The readers’ feedback is convincing. I support changing the name FWOTD to anything else. —Michael Z. 2013-04-14 14:40 z

Of the four feedback comments linked to above, three explicitly recommend "non-English", so if we're going to discuss a new name, I guess that's the primary contender. —Angr 16:21, 14 April 2013 (UTC)[reply]

Like you, I think it sounds dumb. The best of Mzajac’s suggestions is “foreign-language word of the day,” though it might still offend people and I still oppose change. — Ungoliant ^(Falai) 17:34, 14 April 2013 (UTC)[reply]

Can we agree to enhance the name by moving WT:Foreign Word of the Day to WT:Foreign-Language Word of the Day? We can get used to that in a month or two and see if it still raises readers’ ire. And reconsider renaming if it appears warranted later? —Michael Z. 2013-04-14 17:47 z

I seriously doubt anyone who objects to "Foreign Word of the Day" will be content with "Foreign-Language Word of the Day". —An gr 19:46, 14 April 2013 (UTC)[reply]

I support "non-English"; "foreign-language" strikes me as having pretty much all the problems "foreign" does.--Prosfilaes (talk) 07:10, 16 April 2013 (UTC)[reply]

Regarding the entry foreign, definition 2, example "eating with chopsticks was a foreign concept to him": Surely this use of "foreign" is not restricted to other cultures? Things can be "a foreign concept" to a person who has never met that idea before. I think good synonyms are "unfamiliar, unknown, strange", and that these should be added to the explanation. But English is not my native tongue. --LA2 (talk) 23:43, 14 April 2013 (UTC)[reply]

Increasing default font-size

I proposed this a couple of weeks ago, and had little feedback. Not sure if everyone doesn’t care or just didn’t notice. So I’m posting this reminder, and will change the site’s default font-size, shortly. —Michael Z. 2013-04-14 02:02 z

It looks perfectly readable to me so I see no reason to change it. Why do you think it's too small? —CodeCa t 03:04, 14 April 2013 (UTC)[reply]

As I wrote in the original post, editors have used Common.css to enlarge the font for 54 languages and scripts, affecting thousands of entries. The discrepancies bug me. —Michael Z. 2013-04-14 05:14 z

It is odd to me that the existing "default" font size for the site would not be the default for the user's browser, i.e. not medium. But Web designers seem to work upon contrarian principles of their own. Bigger is fine by me, but I hope it can be set to browser default rather than a hard-coded "what looks good on this year's monitors". Equinox ◑ 03:32, 14 April 2013 (UTC)[reply]

I did the math. Browser default. For a preview, copy the first bits from my vector.css. —Michael Z. 2013-04-14 05:19 z

I will update MediaWiki:Vector.css within the hour. Complaints welcome. —Michael Z. 2013-04-14 15:34 z

Done.[11] Force-reload to update the style sheet immediately. —Michael Z. 2013-04-14 15:46 z

I rolled back your edit; it looks terrible to me and there was not sufficient consensus IMO. —Μετάknowledge^{discuss/deeds} 15:54, 14 April 2013 (UTC)[reply]

I thought two BP discussions with no opposition would constitute consensus to try out a harmless improvement. Your single subjective opinion after a one-minute look at a major visual change doesn’t constitute any consensus or evidence either. Thanks for speaking for everybody. —Michael Z. 2013-04-14 16:18 z

I'll be opting out anyway. I didn't like it at all. Mglovesfun (talk) 16:00, 14 April 2013 (UTC)[reply]

Could someone actually respond to the evidence I have cited, instead of blowing away a major change based on “I don’t like it,” without even using it? —Michael Z. 2013-04-14 16:19 z

Sorry, I don't see anything that I would call "evidence". In the previous discussion you gave a list of putative advantages, but seemingly no "evidence" for them. (Perhaps you and I define the term differently?) At any rate, if you want people to reply to something specific, please indicate what. In particular, if you could highlight some part of your argument that would justify increasing the font size even if no one liked the result, that would certainly be interesting! —Ruakh_TALK 16:45, 14 April 2013 (UTC)[reply]

The biggest objective evidence that our font-size is small is that other editors have been increasing it, to the tune of over 50 CSS declarations in our style sheet, the majority setting the font-size to the browser default. No one has mentioned any disadvantage of setting the font-size to the browser default.

I’ve put in significant time doing research and testing, tried to outline my reasoning, and did my best to get feedback. Not one objection was made. Now, could someone here at least do me the courtesy of actually trying to use this for an hour or a day, instead of taking one glance, blurting out “I don’t like it” because it is different, and blowing off my effort completely? —Michael Z. 2013-04-14 17:00 z

Re: "could someone here at least do me the courtesy of actually trying to use this for an hour or a day": http://en.wiktionary.org/wiki/User:Ruakh/common.css?diff=20160735. —Ruakh_TALK 17:23, 14 April 2013 (UTC)[reply]

Thank you for that. Sorry to get cranky. I included a list of what I see as concrete advantages in my original proposal. I think things can be improved, and I would appreciate critical feedback. —Michael Z. 2013-04-14 17:50 z

I've been trying out the larger size for the past several days. While it's more legible, there are other drawbacks. While this larger size may correspond to the "de jure" default browser size, it doesn't correspond to the "de facto" default size for web pages. Almost every other text-based website I look at has smaller text, much closer to the "traditional" Vector size. People get used to one font size on webpages and when they encounter something noticeably smaller or (as in the proposed new Vector size) much larger, it looks absurd. And more urgently, if we change the default Vector size here at English Wiktionary we're out of sync with every other Wikimedia project's Vector skin. I know that perfect unity isn't possible across languages, but at least every English-language project's Vector should look like every other English-language project's Vector. If I'm looking at Wikipedia, then at Wikisource, then at Commons, and then at Wiktionary, it's startling when Wiktionary's text is so much larger than everyone else's. And if I didn't know that it's that way because I deliberately set it that way on my own CSS page, I would be baffled and put off by it. —Angr 21:31, 17 April 2013 (UTC)[reply]

Some good points I hadn’t considered in detail.

Wikimedia branding. Indeed, most Wikimedia projects use 13px font size. I see that zh and ja Wiktionaries use 15px, Arabic, Pashto and Farsi 14px. However, explicit branding elements in the other projects vary a lot. Among Wiktionaries, even the site logos (!), home-page layout, use of tone and colour, icons, etc., vary wildly. The only thing all these sites have in common is the basic MediaWiki interface with grey and white background and blue rules. Also, the favicon is identical on all but cs. and en.Wiktionary. Choosing font-size for branding over readability would be poor prioritizing, when it would make an insignificant difference in the visual identity, but potentially a large one in readability. If we value our uniform branding at all, why don’t we coordinate site design, or unify even the most basic branding elements before compromising readability?
The appearance of credibility. It’s true that 13px may be the most popular font-size,[12] but that isn’t a “de facto default” in any sense I can think of, nor does being widely used make it the best choice for anything specific.[13] A website doesn’t look smart or credible by picking the most popular font size for no other reason. It does it by considering the factors that font size affects, and choosing an appropriate size for the particular site. Increasing font size for over 50 languages while sticking to a 13px default looks “absurd” to me.
Readability. As you say, a larger font than 13px is more legible. This is particularly true on both the extra-small and extra-large screens that more readers are using these days. Still more so for many of the language scripts we use, as we have concretely demonstrated in our style sheet.
I will also add that we have to overcome serious readability problems inherent in Vector, like the fact that text columns can be ridiculously long for readers who do not resize their window.[14] And 13px is not our smallest font size. —Michael Z. 2013-04-21 01:14 z
Accessibility. Overlaps with the above, but it should be mentioned that many of the designers of the average 13px websites have good eyes, good displays, and are poorly schooled in accessibility and internationalization. Many of these “average” websites are aimed at youthful or moneyed markets. Ours is the broadest possible audience, including non-native readers, aging, vision-impaired, impoverished, having only mobile internet access, etc. Failing to optimize readability harms segments of our audience that many other websites ignore.

I still think any disadvantages of increasing font size are minor at worst, and far outweighed by the concrete benefits. —Michael Z. 2013-04-21 00:52 z
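
For readers following the numbers in this thread, here is a minimal sketch of the arithmetic that relates the pixel sizes mentioned above to the browser default. It assumes the common desktop browser default of 16px, which is not stated in the discussion, and the function names are hypothetical.

```python
# Assumption: the browser's default ("medium") font size is 16px.
# That is the usual desktop default, but users can change it.
BROWSER_DEFAULT_PX = 16.0


def em_for_target(target_px, base_px=BROWSER_DEFAULT_PX):
    """Return the em multiplier that renders target_px on a base of base_px."""
    return target_px / base_px


def rendered_px(em, base_px=BROWSER_DEFAULT_PX):
    """Return the pixel size produced by an em multiplier on a given base."""
    return em * base_px


# Sizes mentioned in the thread: 13px (most projects), 14px, 15px,
# and the browser default itself.
for px in (13, 14, 15, 16):
    print(f"{px}px = {em_for_target(px):.4f}em of the browser default")
```

Under that assumption, a 13px body is about 81% of the reader's chosen default, so setting the size to 1em would simply hand the choice back to the browser.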

Wikimedia projects' appearance varies widely from language to language, but not so much from project to project within a single language. When I had my Wiktionary font size set larger, I found it genuinely distracting to go from Wikipedia to Wiktionary because of the font size difference. It makes you notice the print rather than the content, which is a sign of poor typography. As for column width, as the page you linked to says, the solution is to define columns as being a certain number of ems wide, not to force the text to appear larger. But that too is something that ought to be done to all (English-language) projects' Vector skins, not just Wiktionary. —An gr 22:02, 23 April 2013 (UTC)[reply]

Tracking category for missing inflected forms

Feel free to let me know if there is a better way of doing this already in place, but an idea struck me recently upon seeing red links in inflection lines. I think that we should have a system to track these links, since they are either valid missing entries for inflected forms of lemma entries or incorrect inflections being displayed on entries (for example, words lacking plurals or different feminine forms where the editor has not changed the template's default behavior). In both cases, they should be actively dealt with, either by creating pages for missing inflected forms or correcting the inflection templates. This seems like low-hanging fruit, since it is simple work and a motivated editor could do dozens of these in a sitting, or far more with acceleration. It would be relatively simple to use the ifexist parser function so that pages with red links in their inflection templates are put in a maintenance category recording that, so that editors can come along and address them.

As an example of what I am talking about, I made an edit to {{es-adj}}, so that it now puts entries with red-linked feminine singular forms in inflection templates into Category:Missing Spanish feminine adjectives. Have a look at that category to see what I mean. There are 884 of these (as of now) being detected, which means potentially 884 missing entries just in looking at Spanish singular feminine adjective forms alone. Ideally, I think this kind of category could be useful across all of the inflection templates and all of the inflected forms they output, but I wanted to raise the idea here for comment. We may want to have broader categories than the "Missing Spanish feminine adjectives" one I created; maybe all entries with missing inflected forms should go in a single big maintenance category. Is this a useful idea? Dominic·t 07:02, 16 April 2013 (UTC)[reply]

There is one major difficulty with that. To check whether a given page exists is considered "expensive" by the MediaWiki software, and we're limited to about 100 of those checks per page. Once a page reaches that limit, any remaining checks will return "does not exist". So, we can't use this too much on pages because there is a danger that it will break the page if overused. —CodeCa t 12:43, 16 April 2013 (UTC)[reply]

Agreed. This is a bot job; we just need to convince somebody like SB to take it on. —Μετάknowledge^{discuss/deeds} 13:55, 16 April 2013 (UTC)[reply]

I would be more afraid of false positives from people not changing the inflection template defaults if we just created them all at once than I would be of pages which will hit the limit of parser function checks from adding this new one. Do we have any reason to think there would be many, or any, pages that would break? I am fairly sure the limit is actually 500 calls, not 100. That's a lot of inflection templates for one page. Also, once the limit is reached, it does not make the functions return false, creating false positives. It actually just refuses to expand the templates after the limit. Dominic·t 14:53, 16 April 2013 (UTC)[reply]
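
The bot-based alternative suggested above sidesteps the parser-function limit altogether, because the existence check happens offline. The following is only a rough sketch of that idea, assuming the bot already has a set of existing page titles (for example from a dump) and a mapping from lemmas to the forms their templates display. The function name and the sample data are hypothetical.

```python
def missing_forms(inflections, existing_titles):
    """Map each lemma to the inflected forms that have no page yet.

    inflections: dict of lemma -> list of forms shown by its template
    existing_titles: set of page titles known to exist
    """
    missing = {}
    for lemma, forms in inflections.items():
        absent = [form for form in forms if form not in existing_titles]
        if absent:
            missing[lemma] = absent
    return missing


# Hypothetical sample data, for illustration only.
inflections = {
    "rojo": ["roja", "rojos", "rojas"],
    "azul": ["azules"],
}
existing = {"rojo", "rojos", "azul", "azules"}

print(missing_forms(inflections, existing))
```

A list produced this way still needs human review, since a red link may mean either a missing entry or a template default that nobody corrected.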

암글

Can somebody delete 암글--wasn't sure how/where to ask? King jakob c 2 (talk) 20:47, 16 April 2013 (UTC)[reply]

Done. Thanks. Adding the {{delete}} template is enough. — Ungoliant ^(Falai) 20:58, 16 April 2013 (UTC)[reply]

Template term and lang parameter

I oppose template {{term}} requiring the "lang=" parameter, showing "???" before the term if the lang parameter is not provided. This change seems to have been introduced to the template today or yesterday by CodeCat (talk • contribs). An example of use of template "term" without lang parameter: physics. --Dan Polansky (talk) 08:07, 20 April 2013 (UTC)[reply]

Something like this seems to have been discussed at Template_talk:term#lang. People should not use such obscure pages to discuss significant changes! --Dan Polansky (talk) 08:09, 20 April 2013 (UTC)[reply]

I feel the same way. Ƿidsiþ 08:52, 20 April 2013 (UTC)[reply]

Why do you oppose it exactly? Not specifying the language leaves many problems: the link does not link to the correct section, the script template is not applied, and the word is marked in HTML as English (which creates usability problems). I wonder what justification there can be for ignoring those problems. —CodeCa t 12:33, 20 April 2013 (UTC)[reply]

This change breaks many, many, many discussion pages. -- Liliana • 12:36, 20 April 2013 (UTC)[reply]

I don't think displaying a small notification really breaks anything. It's just a friendly reminder that something is missing and needs to be corrected. I don't know how to make it less obvious without making it so unobvious that nobody sees it. —CodeCa t 12:39, 20 April 2013 (UTC)[reply]

Others' posts should never be edited, even in case of incorrect syntax and such. At best, this should be restricted to the main namespace. -- Liliana • 12:41, 20 April 2013 (UTC)[reply]

We've edited or broken people's posts in the past. Whenever a template is deleted, if that template is used in a past post, deleting it will break the page, but we do it anyway. In some cases we've replaced the template with an equivalent, but in other cases the pages remain broken. For example look at the transclusions of {{hr}}; some were replaced by "sh" but some still remain. Similar with {{zh}}. This isn't really any different. We can't always guarantee backwards compatibility, and indeed we shouldn't try to go too far out of our way for it. —CodeCa t 12:52, 20 April 2013 (UTC)[reply]

@CodeCat: Naturally, I am not opposing using "lang=" for non-English languages to add script, and whatnot. I am opposing making "lang=en" mandatory for English. What you wrote does not seem to apply to English terms without lang=: "the link does not link to the correct section, the script template is not applied, and the word is marked in HTML as English". What I am saying is, if there is no lang=, let "term" template assume the term is English, as it did before your edits. --Dan Polansky (talk) 13:09, 20 April 2013 (UTC)[reply]

I think you are a bit mistaken. It always has been mandatory, because specifying lang=en has never, in the history of the template, been equivalent to specifying no language. So it never assumed that the term is English, not before my edits and not after them. That is one of the biggest flaws in this template in particular, which others (which do default to English) never had because they were created properly from the beginning. The result is that we now have thousands of entries that use this template both for English and for many other languages, without specifying which. Simply changing the template so that English is the default is therefore not an option, because it would not be correct for the many thousands of non-English words that lack a language. The only option that I know of is to mark lack of a language as an error so that it be corrected. I am currently running a bot to correct some of the most obvious ones (uses where the {{term}} template is preceded by {{etyl}}, which allows the bot to figure out the correct language), but there are still many many more that need to be fixed. —CodeCa t 13:19, 20 April 2013 (UTC)[reply]

Re: "It always has been mandatory, ...": That seems incorrect. If lang= really were mandatory, the template would complain of a missing parameter. The parameter could only have been "mandatory" in a sense that I do not know. --Dan Polansky (talk) 13:24, 20 April 2013 (UTC)[reply]

What I meant is that the template doesn't do what it should do if the language is left out. The correct behaviour, when lang=en is given, is to use Latn as the script, "en" as the language, and link to the English section. But when no language is given, it uses None as the script, "" as the language, and links to no section. Therefore, to correctly link to English terms, the language is mandatory. —CodeCa t 13:29, 20 April 2013 (UTC)[reply]

All of that is irrelevant. This is one of our most heavily-used templates, especially by our less-template-sophisticated editors. Changes that significantly affect its behavior should be discussed thoroughly in an appropriate venue before being implemented. Most of the people who use it aren't going to have a clue what the ??? means, and a good many won't know where to go to find out. There should have been some steps taken to educate people before implementing it. Chuck Entz (talk) 16:24, 20 April 2013 (UTC)[reply]

There is a help message when you hover the cursor over it. That may not be entirely obvious, but actually writing the message out would look really bad and would have made even more people angry. The real "education" has been in {{term}}'s documentation, which I presume is the proper place to put it. —CodeCa t 17:03, 20 April 2013 (UTC)[reply]

Support the change.

No one has changed any discussion pages, but if you want your talk posts to continue looking the same, don’t leave live templates in them. Use subst. —Michael Z. 2013-04-20 23:24 z

Totally support making lang= obligatory, but we should wait until the bot run is over before displaying the ???s, and not display them at all outside the content namespaces. — Ungoliant ^(Falai) 23:53, 20 April 2013 (UTC)[reply]

The bot doesn't really have anything to do with the ??? either; it works from a category that can be added or removed independently of the question marks. But the way the bot is running now, it's not really making a serious dent in the number of pages. It is making the occasional change but it's skipping most of the pages in the list without doing anything (because it sees no change it can make). There were around 45 thousand pages in the list when it started, and I expect it won't be able to get rid of more than a few thousand of them currently; it's at 41 thousand now. —CodeCat 00:14, 21 April 2013 (UTC)[reply]

But in this revision of warlock the term lie has ???s, and after the bot edit it doesn’t. — Ungoliant ^(Falai) 01:01, 21 April 2013 (UTC)[reply]

That's true, but that's only because the bot has changed something that happened to both remove the ??? and remove it from the category. What I am saying is, the bot works from the category, and the ??? doesn't influence that. If we removed the ??? the category would still be there, and we could also put in ??? and remove the category. —CodeCa t 01:06, 21 April 2013 (UTC)[reply]

But the bot does influence the ???s. What I was saying is that we should wait for the bot run to be over before displaying them, because there would be no benefit displaying something that makes our entries look bugged when it’s going to be automatically fixed soon enough. But I changed my mind, since the bot isn’t going to make a serious dent (unfortunately). — Ungoliant ^(Falai) 01:18, 21 April 2013 (UTC)[reply]

Support considering lang= obligatory (meaning only that it must be present: I think it's fine for it to be explicitly blank, but English should be lang=en), probably oppose whatever "bot run" Ungoliant and CodeCat are referring to (it doesn't seem like it was ever discussed or approved?), weakly support some sort of visual indication of missing lang= once that's rare (though I'd strongly support such a visual indication if it were visible only to admins and opters-in), and oppose distinguishing content namespaces from non-content namespaces in this respect, since that will just make it harder for editors to learn what they're supposed to be doing. —Ruakh_TALK 00:30, 21 April 2013 (UTC)[reply]

The bot run is adding lang= to uses of {{term}} in etymologies where it can use a preceding {{etyl}} template to determine the correct language. Basically, it's replacing {{etyl|xx|yy}} {{term|word}} with {{etyl|xx|yy}} {{term|word|lang=xx}}. It didn't seem like a very controversial change. —CodeCa t 00:35, 21 April 2013 (UTC)[reply]
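
The replacement described above can be sketched with a regular expression. This is not the bot's actual code, only an illustration of the rule as stated: the pattern, the function name `add_lang`, and the choice to skip calls that already carry lang= or contain nested templates are all assumptions.

```python
import re

# An {{etyl|xx|...}} call, optional whitespace, then a {{term|...}} call.
# Group 1: the etyl call plus whitespace; group 2: the source language code;
# group 3: the arguments of the term call. Nested templates are not matched.
PATTERN = re.compile(
    r"(\{\{etyl\|([^|{}]+)(?:\|[^{}]*)?\}\}\s*)\{\{term\|([^{}]*)\}\}"
)


def add_lang(wikitext):
    """Add lang=xx to a {{term}} that directly follows {{etyl|xx|...}}."""

    def repl(match):
        prefix, lang, args = match.group(1), match.group(2), match.group(3)
        # Leave the call alone if it already specifies a language.
        if re.search(r"(^|\|)\s*lang\s*=", args):
            return match.group(0)
        return f"{prefix}{{{{term|{args}|lang={lang}}}}}"

    return PATTERN.sub(repl, wikitext)


print(add_lang("From {{etyl|la|en}} {{term|terminus}}."))
```

A {{term}} with no preceding {{etyl}} is left untouched, which matches the observation in this thread that the bot skips most pages on its list.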

Ah, O.K., that's fine, then. :-) (I mean, I still think it should have been proposed in the BP first. But I agree with you that it probably wouldn't be controversial.) —Ruakh_TALK 02:29, 21 April 2013 (UTC)[reply]

Perhaps we should have a class of error messages that are hidden from readers but displayed for all logged-in editors. —Michael Z. 2013-04-21 01:18 z

That might both be a good idea and a detrimental one. {{nl-noun}} shows "error" messages when some of its parameters are missing, and calls on the viewer to provide them. Since those messages were added to the template, I have seen quite a lot of editors - IPs, newly registered and experienced alike - take the messages to heart and provide the forms. We even have an editor, User:DrJos, who registered specifically to provide the forms and has now made it his life's work to fix them all. :) So I would say that's first-hand evidence that this kind of notice not only works, but it even gets IPs to lend a hand. So if we decide to hide these requests from IPs, we will be losing some of the editors who might help out. —CodeCa t 01:26, 21 April 2013 (UTC)[reply]

Don't forget that we also serve a lot of site visitors who don't edit and have no idea what "lang=" is. Why should some 10-year-old doing his or her homework have part of the content replaced by ??? so you can send a wake-up call to someone else? Are we the "dictionary that anyone can edit", or "the dictionary that everyone has to edit"? Chuck Entz (talk) 02:49, 21 April 2013 (UTC)[reply]

Re: " […] part of the content replaced by ??? […] ": That's a straw man, since the version with ??? still has all the same content. (The ??? appears before the term, not instead of the term.) Maybe you meant to say that the 10-year-old would think that the ??? had replaced actual content? —Ruakh_TALK 03:43, 21 April 2013 (UTC)[reply]

My mistake. I had already forgotten what the actual effect was, having only seen it on one page. Although I obviously overstated the effect, it still seems a bit much to clutter the main body of the text used by non-editors with stuff aimed strictly at editors. It might indeed cause concern among non-editors that something was broken that they didn't know how to fix.

I'm not opposing the eventual implementation of such a change, just the massive scale of the change combined with the lack of effort taken to get consensus and to get feedback about what effect it might have, let alone to prepare people for it. Something that noticeably changes the appearance of a significant percentage of our millions of entries should require more than a general mention of the principle behind it here and there, followed by a discussion on the template talk page that only a very few would even know about. Chuck Entz (talk) 04:15, 21 April 2013 (UTC)[reply]

I have rolled back CodeCat's edits to {{term}} because currently, it seems that only CodeCat and Michael support the ???s, whereas Dan, Widsith, Liliana, Chuck, and I oppose some aspect of CodeCat's edits altogether, and Ruakh and Ungoliant do not support putting in the ???s until non-lang-specified uses become much more rare. That's only 22% of editors in support so far. This is why we need to have BP discussions before making sweeping changes to the interface as readers view it. —Μετάknowledge^{discuss/deeds} 02:03, 21 April 2013 (UTC)[reply]

I haven't read the whole discussion, but do we need '???'? Is there any way of making these stick out less like a sore thumb? This is a dictionary after all; readers come here for lexical information, not to correct wiki syntax. PS: there is a line in User:Mglovesfun/vector.js that converts {{term|foo}} into {{term|foo|lang=en}}. Mglovesfun (talk) 09:45, 21 April 2013 (UTC)[reply]

I think that so far, the majority of people in this discussion agree that it's a good idea to make sure {{term}} always has a language code. But that immediately brings up the question, how do we get there? Even if people want to add a language where it's missing, how can they do it? The reason why I added ??? was that it would make it obvious to editors that something needs fixing there. Making the problem visible and apparent is the first step towards fixing it, and that has been a real problem before. I also argued that showing a similar message on {{nl-noun}} has indeed helped to make the problem visible and therefore has led to more people fixing it. The bot I am running is helping, but it can only do so much; it has almost passed over all entries with a missing language but it has only managed to fix about 10% of the total (from 45 thousand to 40 thousand). A bot could never fix the majority of the entries that remain. So I suppose the real goal of this discussion is: if at least some of us agree that adding a language in all cases is a good thing, what can we do to make that happen and make it happen more quickly? If adding ??? to the entry is not the right way, then what is? —CodeCa t 12:06, 21 April 2013 (UTC)[reply]

If there isn't a bot solution for the remaining 90%, then I guess we'll just have to use MG's JS (or a modified form of it) on every page we're already editing. The reason why the ???s don't work is that instead of solving the problem, they create a new one. It looks messy and unprofessional, and users have to go for an unintuitive tooltip to find what's gone wrong. (Don't get me wrong, I love xkcd, but tooltips are not what people try first upon seeing a cryptic message.) This is not an acute crisis, so if a chronic solution is the best we have, so be it. —Μετάknowledge^{discuss/deeds} 14:39, 21 April 2013 (UTC)[reply]

What exactly does the script do? Blindly adding lang=en is not correct... if it were, we probably would have done that already. I think there is one approach that we could try in the long term. If we could weed out all the uses that are not English (which are presumably a minority) then it becomes more feasible to add lang=en to the remainder. Using Lua, we might be able to recognise some of the languages, and we can use other means as well. For example, anything with {{polytonic}} as the script is bound to be Ancient Greek (and that template even sets lang="grc" if nothing is provided), so adding lang=grc whenever sc=polytonic is present is safe. Adding lang=got where sc=Goth is also safe, and many other scripts are only used for one language so we can derive the language from the script. We can also look at the characters in the term being linked to. Templates can't recognise which characters a word consists of, but Lua can. So if a word contains, say, Hiragana or Cyrillic, we can be pretty certain it's not English. We could also separate out calls to {{term}} that use Latin characters that are not used in English, like å. Granted, none of those approaches is absolutely failsafe, but it would probably be right more than 99% of the time, and it would make it much easier to chip away gradually at the number until it becomes more manageable. And making a few mistakes (marking a link with the wrong language) is not serious, especially not considering that currently 40000 are marked with the wrong language (it can only get better!). —CodeCa t 14:55, 21 April 2013 (UTC)[reply]
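
The heuristics proposed above (derive the language from a single-language script code, otherwise inspect the characters) can be sketched as follows. The proposal is for a Lua module; this sketch uses Python only for illustration. The two script-to-language pairs come from the post above, while the function name, the return labels, and the use of Unicode character names to detect Latin letters are assumptions.

```python
import unicodedata

# Script codes that, per the discussion, imply exactly one language.
SCRIPT_TO_LANG = {"polytonic": "grc", "Goth": "got"}


def guess(term, sc=None):
    """Guess a language code or a rough bucket for a {{term}} lacking lang=."""
    if sc in SCRIPT_TO_LANG:
        return SCRIPT_TO_LANG[sc]
    for ch in term:
        if not ch.isalpha():
            continue  # ignore spaces, hyphens, digits and punctuation
        name = unicodedata.name(ch, "")
        if not name.startswith("LATIN"):
            # e.g. Hiragana or Cyrillic: almost certainly not English
            return "non-Latin script"
        if ord(ch) > 127:
            # e.g. å; a heuristic only, since English has a few such words
            return "Latin, non-English letter"
    return "possibly English"


for word, sc in [("ἄνθρωπος", "polytonic"), ("слово", None),
                 ("år", None), ("physics", None)]:
    print(word, "->", guess(word, sc))
```

As the post says, none of this is failsafe; the buckets are meant to sort pages into smaller maintenance lists for humans to check, not to assign languages blindly.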

I was imagining that most would be English, and then it would be easy for a human to scan it and fix the langcode if necessary. I don't know what percentage the script/character method can handle, but I'm sure it's noncontroversial for you to attempt it. —Μετάknowledge^{discuss/deeds} 14:59, 21 April 2013 (UTC)[reply]

I don't know how many there would be either, but I can add an invocation to a module (that needs to be made) which would add a category to the page when lang= is not present. That module can then decide to add the page to different categories depending on other factors like the script code or the characters in the word. The number of entries in each category would then be used to gauge what needs to be done. And even if one category contains only a few hundred entries, that's still a few hundred fixed and done. Every little bit helps, and we'll need to do this in little bits. :) —CodeCa t 15:05, 21 April 2013 (UTC)[reply]

How about an error message like the one after this term [! Editors: the preceding term template lacks a language code]. Visible to all, relatively unobtrusive, self-explanatory and ignorable. The copy could be made more accessible; it should convey that an improvement is needed but doesn’t affect the accuracy of the information. —Michael Z. 2013-04-21 15:57 z

How would that look if there is also a translation or a gloss? —CodeCa t 16:40, 21 April 2013 (UTC)[reply]

How about Greek όρος (óros, “term”) [! Editors: the preceding term template lacks a language code]. I think it belongs after the whole template, because it refers to that construction. If it were before the brackets, it would look more like it was referring to the term itself.

Uh-oh! That probably can’t work unless someone customizes or rewrites the javascript. Each collapsing element has to have a unique ID. In my browser, clicking any one of those examples expands them both. Even if we want that behaviour, duplicate IDs break the HTML. —Michael Z. 2013-04-21 23:04 z

If we wanted a bit more urgency and context, a background appearing on hover or expand could tie it all together like Greek όρος (óros, “term”) ⊕ Editors: this term template lacks a language code. —Michael Z. 2013-04-21 23:27 z

Lua script errors show a floating window when you click on them. Maybe you can have a look at how they work, and copy that? —CodeCa t 00:34, 22 April 2013 (UTC)[reply]

Can you point me to one, or tell me how to generate one? I remember something like that, but now when I try to save a module with a script error, I just see the big red box at the top of the page. —Michael Z. 2013-04-22 00:50 z

You can look in Category:Pages with script errors. —CodeCa t 01:03, 22 April 2013 (UTC)[reply]

Support making lang mandatory. It may also be possible to include automatic transliteration later. Perhaps rather than "???", it should say "which language???". --Anatoli (обсудить/вклад) 23:52, 21 April 2013 (UTC)[reply]

I don't like calling it an error. For one thing, it's beside the point and adds extra verbiage, but mostly, it gives the impression that things are falling apart. I would suggest following the lead of some of our rf- templates: "This term template is lacking a language code. If you know it, please add it as a lang= parameter". Still verbose, but it would only show on hover. The symbol should be something small and innocuous, like the one Michael suggested above, or maybe a bullet (•). Even the question marks might not be so bad- as a trailing superscript. Or how about: όρος (óros, “term”)^[→?] (I'm sure there are attributes that would make it look more like a live control, but you get the idea). Chuck Entz (talk) 02:45, April 21, 2013 (UTC)

I have no strong opinion on the message and the format of the warning; whatever the community decides is fine. But making "lang" mandatory is important; otherwise, just use square brackets or something. I also think {{etyl}} should have the second parameter mandatory as well. Otherwise, people just add to English loanwords, even if they mean another language. --Anatoli (обсудить/вклад) 02:56, 22 April 2013 (UTC)[reply]

There are a good number of uses of etyl with a null lang= parameter, as a way of standardizing the language name. I even used to do it myself, before I was aware of things like template overhead. I suppose they would be pretty easy to locate and subst out using a bot, though. Chuck Entz (talk) 03:30, 22 April 2013 (UTC)[reply]

I am aware of things like template overhead, and I still prefer to write {{etyl|foo|-}} when that is what I mean. I would oppose substing it away. —Ruakh_TALK 03:46, 22 April 2013 (UTC)[reply]

I support keeping lang mandatory (i.e., not defaulting to English) for now, fixing transclusions, and then defaulting to English. Or just keeping it mandatory. But I oppose any error message visible to not-logged-in users. This is a technical error, not a content error: it is a missing language parameter in the HTML, not a missing etymology or pronunciation. There's no need for visitors to see the error message.—msh210℠ (talk) 05:12, 22 April 2013 (UTC)[reply]

I oppose CodeCat's recent change to {{term}} for "keeping lang mandatory" (msh210) and for using ??? as "just a friendly reminder that something is missing" (CodeCat), as it is a little well- and a lot ill-done. What's that ill-done at all? First of all, void is "just a friendly reminder." Liliana wisely noted: "At best, this should be restricted to the main namespace." So did Chuck Entz.

CodeCat's change made all the past discussions look so ugly, freckled with so many ???, looking like subduing again the "global readership at CodeCat's disposal" (User:KYPark/mulberry, 16 April 2013), doing without due consensus again; again and again in my cases! ~~As I've discussed most for a year with {{term}} heavily used, mine must look ugliest so that CodeCat looks like aiming at me, megalomanically speaking.~~

Again, I'd attend to Liliana saying to CodeCat: "Others' posts should never be edited, even in case of incorrect syntax and such." This should be so because so vital and prior is the global readership of end users as end judges. Just unjust would be the interference or intervention of unequal, intermediary administrators with "others' posts" being arbitrarily edited. Delete could be the worst edit. My posts are supposed to suffer the worst blocking in effect. I wish CodeCat and others could learn a lesson from this happening.

--KYPark (talk) 12:46, 22 April 2013 (UTC)[reply]

I'm sorry if I can't take your arguments seriously if you're turning this into a personal vendetta against me. Please go and do something more useful. —CodeCat 12:52, 22 April 2013 (UTC)[reply]

How dare you accuse CodeCat of “aiming at” you? Whether the changes to {{term}} were ultimately good or not, she is just trying to improve Wiktionary. The world doesn’t revolve around you. — Ungoliant (Falai) 14:28, 22 April 2013 (UTC)[reply]

If you're legitimately arguing against the change to the term template (belatedly, since it's already reverted), it would be best not to bring your own disputes with CodeCat into it, since I don't think anyone agrees with your assessment of them- some may disagree with her methods, but I don't know of anyone who doesn't sympathize with her reasons for doing what she's been doing. If you're trying to use this discussion as a forum for complaining about that matter, please don't. You'll just get people annoyed at you for cluttering an already-too-long discussion with unrelated issues. Chuck Entz (talk) 14:31, 22 April 2013 (UTC)[reply]

If you put a template in discussion, you are inviting editors to edit your text without even reading it. There’s no other sane way to look at it. —Michael Z. 2013-04-22 14:58 z

Editors are always wanted to do their best, say, even in using so many hard templates. Such is simply an ideal, esp. of wikis, more or less away from the reality in theory and practice. Rational choice theory is a mere theory heavily counter-balanced by bounded rationality.

The better editorship, the better readership. Both go together in concert. Much easier is to interfere with editorship than readership, ill or well. Liliana would advise CodeCat not to interfere (too much or trivially) with editorship in discussion, and I would with readership. It is regrettable indeed if the past discussions remain freckled with so many a ???, only to be hardly corrected in response to the "friendly reminder".

If ''[[term]]'' is valid without adding #English, then {{term|term}} is valid as well without lang=en. For someone else to edit to add such additives, esp. in discussions, is overdone or ill-done, I fear, as far as I understand Liliana. Why? We can only talk more or less perfectly, hence either strength or weakness worth to be archived as given.

Anyway the technical phase of mess and fuss is over; the majority deny mandatory lang=. Yet, so ain't the moral phase behind that, to be taken seriously at least right here. I'd argue it is painfully arbitrary and immoral to ignore the priority global readership and do without the due community consensus, repeatedly. The validity of my argument should not be upset by such a "tail of speech" as taking my own examples, however double-barreled I may look.

--KYPark (talk) 05:04, 23 April 2013 (UTC)[reply]

Do you think you could sum that up in a simple sentence? —Michael Z. 2013-04-23 06:23 z

It depends on too many factors to sum up easily! I prefer to talk in detail, focuslessly, not always wisely, while Liliana may prefer the short cut. The shorter speech, the more penetrating, like the proverb. You could speak only of the first thing first. And we could wisely interpret what is implied below the tip of the iceberg. Anyway, Liliana made me a perfect, most impressive sense, I guess. Fair enough? --KYPark (talk) 06:42, 23 April 2013 (UTC)[reply]

I cam’t deny it. —Michael Z. 2013-04-23 14:01 z

I cam't define it. What is it precisely? --KYPark (talk) 04:01, 24 April 2013 (UTC)[reply]

What you wrote. I don’t understand it, so I can’t deny it. Cheers. —Michael Z. 2013-04-25 20:45 z

"Do you think you could sum [it] up in a simple sentence?" That is, for me to respond to three most unfriendly plus you unwittingly unleveling with them? Even omniscient and omnipotent God couldn't do so in human language of all imperfection, I guess. This ridiculous fuss was caused by making non-mandatory lang= mandatory, arbitrarily, as if the global readership and editorship should be at CodeCat's mercy! Originally, Z wished to do without that boring parameter, and suddenly CodeCat complicated it "horrible" (Z) for some reasons, said and unsaid. This is a genius for making one out of another at will. Such was the case with WT:Beer parlour/2013/March #Wiktionary:Etymology scriptorium/March 2013. Incidentally, the above three were most responsible for so doing. Assisted by Ungoliant, CodeCat did make a horrible ending out of Chuck Entz's unwitting beginning, remindful of the idiom make a mountain out of a molehill. This case implies too much for me to keep from saying much more, yet ...

--KYPark (talk) 07:38, 28 April 2013 (UTC)[reply]

Re: "Do you think you could sum that up in a simple sentence?". From all available evidence, no. Nothing will ever be said simply that can be bloated with tables, graphics, massive blocks of text taken from other pages, rambling discourses in poor English, etc. CodeCat started moving his more irrelevant topics from the Etymology Scriptorium to his user space, so now he's roaming through the discussion pages looking for any excuse to take potshots at her. Chuck Entz (talk) 15:43, 28 April 2013 (UTC)[reply]

The really sad thing is that I warned him some time ago against annoying everyone else so much that he'd lose sympathy, but it looks like he's doing just that. —CodeCat 16:05, 28 April 2013 (UTC)[reply]

Support the change, but "???" looks horrible, I prefer "^{[language code?]}". --Z 12:07, 25 April 2013 (UTC)[reply]

??? was removed a while ago. — Ungoliant (Falai) 14:13, 28 April 2013 (UTC)[reply]

Just a few lines above, you'd find a better or hotter place for you to answer nicely. I'm just afraid you miss that. --KYPark (talk) 14:39, 28 April 2013 (UTC)[reply]

lolwut? — Ungoliant (Falai) 14:54, 28 April 2013 (UTC)[reply]

Shall we speak English? --KYPark (talk) 14:59, 28 April 2013 (UTC)[reply]

Sure, just start saying things more relevant to the discussion instead of using every opportunity to vilify the contributors you don’t like. — Ungoliant (Falai) 15:06, 28 April 2013 (UTC)[reply]

This is not the right place for you to blame me, but perhaps over there. I am just staying here to inform you where you'd better respond and trivially to know from you what is "lolwut" at all in English. No reason to stay any more. --KYPark (talk) 15:18, 28 April 2013 (UTC)[reply]

lolwut is as English as the next term; deal with it. Also, I support the change. User:PalkiaX50 (talk to meh) 15:30, 28 April 2013 (UTC)[reply]

Oh really? Ungoliant looks like a genius for pushing WTF, lolwut, etc., to me of en-2 so as to embarrass me and perhaps to delay. Anyway why do you jump in on his behalf? I don't really care you support the change but the arbitrary change at CodeCat's mercy, as if the world should revolve around CodeCat. Do you like that way? --KYPark (talk) 16:04, 28 April 2013 (UTC)[reply]

Oh please, I never said (nor was I implying) that Ungoliant is a genius. Secondly, I am not specifically highlighting to you that I support the change; I just decided to say so, seeing as others have opposed and supported as well. User:PalkiaX50 (talk to meh) 16:22, 28 April 2013 (UTC)[reply]

I think I have a solution that would please everyone. With CSS, we can change the formatting of a word depending on whether it has a language or not. So, {{term}} could be changed so that it applies a CSS class to the text when no language has been specified. That way, everyone can decide individually how they want the "error" to appear to them, while the default would just appear as normal. So it would be opt-in and customisable for each user. Is that ok? —CodeCat 14:19, 9 May 2013 (UTC)[reply]
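
A minimal sketch of how that CSS-class idea might look, written in Python only to show the logic; the real {{term}} is a wiki template, and the class name `term-no-lang`, the `mention` class, and the function `render_term` are assumptions invented for this example.

```python
from html import escape
from typing import Optional

# Hypothetical class name; the real template might choose another.
MISSING_LANG_CLASS = "term-no-lang"


def render_term(term: str, lang: Optional[str] = None) -> str:
    """Render a mentioned term, adding a marker class when no language is given."""
    if lang:
        return f'<i class="mention" lang="{escape(lang)}">{escape(term)}</i>'
    # No language code: the output looks the same by default, but the extra
    # class lets a reader's personal stylesheet highlight it if they opt in.
    return f'<i class="mention {MISSING_LANG_CLASS}">{escape(term)}</i>'


# A user who wants to see the "error" could add a rule like this to their
# own stylesheet (again an illustrative assumption, not an existing rule).
OPT_IN_CSS = f".{MISSING_LANG_CLASS} {{ background: #ffe0e0; }}"
```

With no matching rule in the site-wide stylesheet, readers who have not opted in would see nothing unusual, which is the opt-in behaviour described above.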

Portuguese reflexive verbs

I have just added compadecer-se, but have no idea how to show its inflections. There is nothing in Wiktionary:About Portuguese and no obvious templates. The entry in Portuguese Wiktionary has no conjugation table. Any ideas? SemperBlotto (talk) 10:50, 21 April 2013 (UTC)[reply]

I don't know Portuguese, but in general, it is worth considering whether to direct the reader from compadecer-se to compadecer, along the lines of mračit se directing the reader to mračit. Nonetheless, as regards reflexive forms, different languages seem to use different approaches. Portuguese entry dirigir-se directs the reader to dirigir for conjugation, as does encaminhar-se. --Dan Polansky (talk) 14:55, 21 April 2013 (UTC)[reply]

What do you do in cases where the non-reflexive verb doesn't exist? Then there is no entry to direct the reader to. On the other hand, let's imagine dirigir didn't exist and was not attestable, only dirigir-se. Then dirigir-se would have to have a conjugation table. But what should it contain? Suppose that it contains forms with the reflexive particle attached, so that it has te diriges. Then that would violate "all words in all languages" because diriges gets no entry, and that would confuse users who don't realise that "te diriges" is one term. Suppose on the other hand that the table instead displays te diriges, linked separately. Then we're faced with another dilemma: what would the entry diriges contain? It can't say "second person singular present of dirigir-se" because that's not correct, "te diriges" is the second person singular of dirigir-se, not "diriges". On the other hand, it can't be "second person singular present of dirigir" either, because dirigir doesn't exist. —CodeCat 15:37, 21 April 2013 (UTC)[reply]

(after edit conflict) For Czech, I always create a non-reflexive entry even if all its uses are reflexive. Thus, for "mračit se", the definition is at mračit, where "se" is stated on the definition line; "mračit" is always used with "se". As for inflected forms, there would be e.g. mračila. Note that, in Czech, the reflexive particle se or whatever that is is separated from its verb, as in "pořád se na něho mračila", so I do not see it necessary to have mračila se as an inflected-form entry. --Dan Polansky (talk) 15:51, 21 April 2013 (UTC)[reply]

Yes, it's awkward, isn't it. In Italian, we hard code the pronoun in the inflection table (with no wikilink) and wikilink the inflected verb (even in the few cases in which the non-reflexive form doesn't exist (Hmm)). In French, we redirect the "pronoun + infinitive" to "infinitive".SemperBlotto (talk) 15:42, 21 April 2013 (UTC) (See lavarsi and se laver as typical of these)[reply]

In Dutch we don't have separate entries for reflexive verbs either. But that may not really be the best idea for all languages, because in some there is no space to separate the particle from the verb. Spanish and Portuguese are examples, but Catalan also has many pronouns that contract with the verb when next to a vowel (like in French). So Catalan might have adormir-se with the form m'adormo, and the imperative of acostumar-se is acostuma't. —CodeCat 16:46, 21 April 2013 (UTC)[reply]

My practice has been using:

====Conjugation====
See {{l/pt|compadecer}}.

Listing each combination would be too messy. A verb form like compadeceria can give se compadeceria, compadeceria-se and compadecer-se-ia. — Ungoliant (Falai) 17:09, 21 April 2013 (UTC)[reply]

Hmm, Czech and Dutch don't have entries for reflexive verbs but Polish and German do (not too many). Russian reflexive verbs are included because they are always spelled together, have variations in stress (на́чался or начался́), and the actual particle can differ: -ся after a consonant, -сь after a vowel. I think it would be beneficial to have entries for reflexive verbs in Portuguese and other languages, even if as a soft redirect. --Anatoli (обсудить/вклад) 03:26, 24 April 2013 (UTC)[reply]
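
The particle rule mentioned for Russian can be sketched in a few lines. This is a simplified illustration of the rule exactly as stated in the comment above (-ся after a consonant, -сь after a vowel); it does not try to cover any exceptions in real Russian grammar, and the function names are made up for the example.

```python
import unicodedata

RUSSIAN_VOWELS = set("аеёиоуыэюя")


def reflexive_suffix(verb_form: str) -> str:
    """Pick the reflexive particle by the simplified rule quoted above."""
    # Skip combining marks (such as a stress accent, U+0301) so that the
    # check looks at the final letter itself.
    letters = [ch for ch in verb_form if not unicodedata.combining(ch)]
    if not letters:
        raise ValueError("empty verb form")
    return "сь" if letters[-1].lower() in RUSSIAN_VOWELS else "ся"


def make_reflexive(verb_form: str) -> str:
    """Attach the particle directly to the verb form, with no space."""
    return verb_form + reflexive_suffix(verb_form)
```

For instance, `make_reflexive("начал")` gives "начался" and `make_reflexive("начала")` gives "началась", matching the point that the particle is written together with the verb.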

Should Wiktionary really include entries for characters?

Dictionaries are normally about words, and not the things that those words refer to. A definition on Wiktionary is therefore mainly concerned with giving enough information so that someone who is familiar with the referent knows that the word refers to it. So the goal of Wiktionary is not to describe in detail what the thing is that a word refers to. That's encyclopedic information, and belongs on Wikipedia. When you look at letters and other characters, it's really the same thing. When seen as a character in themselves, they are symbols and aren't really any different from, say, a triangle or a sine wave. They're concepts, not words. Presumably, Wiktionary has decided to include them because they form words, but I'm not sure if that is the best decision. It is definitely lexicographical to say C is pronounced /siː/ and has the plural Cs. But is it really lexicographical to say "C is the 3rd letter of the English alphabet"? I don't think it is, because that definition refers to the symbol C itself, not to its use as a lexical term. Definitions should say what something means, not what it is. C might indicate the third of a sequence, but that's what it means, not what the letter C actually is, so not quite the same. Similar for the etymology: describing where the shape of the letter C came from doesn't strike me as particularly dictionary-worthy. So I would like to ask whether this should be reconsidered? —CodeCat 20:14, 23 April 2013 (UTC)[reply]

Amusingly, WT:CFI doesn't specifically include alphabetic characters. It mentions "characters used in ideographic or phonetic writing" but no mention of syllabic or alphabetic characters is ever made. Do what you want with this bit of trivia. -- Liliana • 20:18, 23 April 2013 (UTC)[reply]

I agree. In addition to not being lexicographic information, I see the following issues:

Because scripts like Latin and Cyrillic are used in many languages, entries for characters in those scripts end up being excessively large.
Who is the target audience of character entries? My best guess is people who are starting to learn a language. In that case, it is much better to have per-language appendices containing “entries” for every character of a language.

— Ungoliant (Falai) 21:09, 23 April 2013 (UTC)[reply]

I'm not sure if my suggestion would really make the pages a lot smaller. In every language, letters still have a pronunciation, which is definitely material for a dictionary. —CodeCat 21:15, 23 April 2013 (UTC)[reply]

I’d move the pronunciations to the appendix pages I suggested as well. — Ungoliant (Falai) 21:19, 23 April 2013 (UTC)[reply]

I don't mind including characters like letters and punctuation that are used linguistically. But I really don't see any lexicographic value in having entries for things like →, ∟, ┌, ▒, and ☺. I once removed ⍾, ⎙, and ⎆ from WT:Wanted entries because they aren't words, but I got reverted. I still don't think they're dictionary-worthy, though. —Angr 22:11, 23 April 2013 (UTC)[reply]

There is a relatively small universe of such terms. With respect to those that are actually used as components of words, that seems to me to be lexical significance enough to keep them - even if most did not have additional meanings capable of being reported in a dictionary. bd2412 T 01:45, 24 April 2013 (UTC)[reply]

I don't see why we shouldn't. They are generally included in single language dictionaries, and they are lexical information, not encyclopedic. Even the non-letter characters I find useful, convenient to be able to look up like words.--Prosfilaes (talk) 08:48, 24 April 2013 (UTC)[reply]

I would rather keep entries for letters of Latin alphabet. Even for →, it is kind of nice to find the Unicode code point for the symbol in Wiktionary. So I would rather keep all Unicode codepoints. --Dan Polansky (talk) 20:00, 24 April 2013 (UTC)[reply]

Yes, dictionary entries are for terms, including names, but not for things. Most of the letter entries would remain, because in English, at least, A is the name of the letter A, among a few other senses or subsenses.

But punctuation marks and diacritics certainly aren’t words, nor are mathematical and logical symbols. Just look at any professional dictionary, and see what is included as entries, and what appears in tables and appendices.

A “Unicode code point” isn’t even a character, it is an encoded representation of a character. We don’t have an entry for the code point U+0041, any more than we should have one for Morse code “dot-dash” or for the signal flag, or the CDC 1604 key-punch card code 31 – these are all ways to encode the letter A, and not lexical entities in themselves. —Michael Z. 2013-04-25 20:39 z

We aren't a printed dictionary; we're a computerized dictionary. There's no concern about how people are supposed to look up →, whether it should go before A or after Z or under arrow, since we don't have to worry about order. We also don't have to worry about space or many other things; we can worry about what people want to look up.

I don't get your point about Unicode code points; I don't think Dan Polansky wants us to add U+0041, but for A and → and 倀 and ─ and the rest. "A" may be a string of bits referring to Unicode, etc., but for our purposes we can just call it a character or word.--Prosfilaes (talk) 11:03, 26 April 2013 (UTC)[reply]

I don’t get your point about dictionaries. Professional dictionaries don’t refrain from “defining” symbols like arrows (→) because they can’t be printed or sorted – they certainly can – they omit them because they are not words.

This is exactly my point about Unicode. We include words (technically, “terms” or “lexical items”). Some editors think that having a code point in the ultimate encoding scheme makes a thing a lexical item, but it does not. Or that it proves that these characters are significant lexical entities, because each has a code point: { → ⇾ ➙ ➔ ➛ ➝ ➞ ➡ ➤ ➧ ➨ ➫ ➯ ➱ ➺ ➻ ➼ ➽ ⟶ ￫ }. (They are arguably not even characters in the sense of writing. I can “encode” another three dozen such “characters” with a pen on paper, but that doesn’t make them dictionary items any more than having a Unicode code point does.) No matter how great it is, Unicode is merely one way of representing text, and does not define language. —Michael Z. 2013-04-26 22:30 z

I think anything used to convey meaning in human language is good. Don't ask me to give a robust definition of that because I can't. Mglovesfun (talk) 22:50, 26 April 2013 (UTC)[reply]

Didn't we have this discussion before, about encoding things like Mʳ that we could actually find encoded that way?

We don't include words; we include strings of letters or code points. color and colour are different pages, for example, and yet we combine /kɑt/ (caught) and /kɔt/ (caught) and separate /kɑt/ (cot). I don't see any reason to get overpure here; Unicode is the substrate for our system and we should rely on that and use it.

From another direction, we are a part of Wikimedia; Unicode code points is not something that any other project covers in depth, and thus we should stretch our ambit so that Wikimedia covers everything.--Prosfilaes (talk) 23:40, 26 April 2013 (UTC)[reply]

Michael Z's argument makes a bit more sense to me than Prosfilaes's. While Wiktionary is encoded in Unicode, it's not tied to Unicode; we shouldn't be making editorial decisions based on the encoding we use, we should be independent of it. From a lexical/typographical point of view, "Mʳ" is a capital M with a superscript small R, and that's the way that Wiktionary should treat it as well. I think Michael Z's point about encoding your own codepoints by drawing little doodles on a paper is very interesting, because it makes it clear how detached Unicode can be from the written reality that we are actually trying to document. In our modern society, we've become almost enslaved to our computer's capabilities and what we write is determined in many ways by what a computer is capable of producing. But just 50 years ago, that wasn't the case, and people happily made up new characters and used them in their works. Esperanto (late 19th century) introduced a whole new set of letters with diacritics, and APL (a programming language of all things!) made up a whole set of characters that nobody else used. Going back even further, you see that people made letter types so that they could print what they wrote, even going so far as to make up ligatures that mimicked handwriting. In medieval times the situation was still further removed from Unicode's "reality", where people would happily stack characters on top of each other, write little lines and squiggles all over the text, and made up all kinds of abbreviations which would use whatever formatting they found useful. So if Wiktionary's task is to document usage, then we can't let Unicode decide what to document because it's clear that Unicode is quite far from an accurate representation of the usage our CFI wants us to record and cite. If Unicode and its characters are the lexical reality, then I guess the sky must be made out of 5 megapixels. :) —CodeCat 23:53, 26 April 2013 (UTC)[reply]

From a modern typographical point of view, "Mʳ" is U+004D U+02B3. Both in the computer, and in the way that it's written, it's not a superscript small R, and the font maker will have to deal with that.
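
For anyone who wants to check the encoding claim, a short Python sketch using the standard `unicodedata` module lists the code points and their official character names for a string. The helper names `code_points` and `describe` are made up for this example.

```python
import unicodedata
from typing import List, Tuple


def code_points(text: str) -> List[str]:
    """Return the U+XXXX notation for each character in the string."""
    return [f"U+{ord(ch):04X}" for ch in text]


def describe(text: str) -> List[Tuple[str, str]]:
    """Pair each code point with its Unicode character name."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for ch in text
    ]


# The string under discussion, spelled with an escape so the second
# character is unambiguous in the source.
mr = "M\u02b3"
print(code_points(mr))
print(describe(mr))
```

The names reported for the two characters are the ones the Unicode character database assigns, which is the level at which this part of the discussion is arguing.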

(Esperanto wasn't new letters, then or now; both the typography of the time and Unicode have no problem with arbitrary combinations of accents on existing characters.)

As I mentioned in my post, we don't handle the language that a computer is capable of recording and playing back; neither /kɔt/ nor, more accurately, the audio recording itself (Audio (US): file) is handled by Wiktionary. As a practical thing, being able to say or play a word into your phone and have it come up with a definition and spelling would be worlds more useful and used than anything based on medieval handwriting.

In any case, whether or not we should try and handle all the non-Unicode stuff strikes me as irrelevant to the question of whether we should handle the Unicode stuff. Whatever they did in the past is irrelevant to the fact that Unicode is the dominant system today, and no matter what you can imagine creating, people are likely to select Unicode characters and enter them into Wiktionary and not random doodles.--Prosfilaes (talk) 10:51, 27 April 2013 (UTC)[reply]

Unicode’s development follows language, imperfectly, and not the other way around. Unicode is also designed to represent non-linguistic writing, like typographical ornaments, mathematical equations, computer code, and UI elements. We are limited by Unicode in how we can represent the language. But we are a dictionary, not a code book. Our subject is written language. —Michael Z. 2013-04-27 14:45 z

☞ Re: Mʳ: this is an encoding error. It contravenes the Unicode standard: "The fact that the latter two letters contain the word “superscript” in their names instead of “modifier letter” is an historical artifact of original sources for the characters, and is not intended to convey a functional distinction in the use of these characters in the Unicode Standard. ¶ Superscript modifier letters are intended for cases where the letters carry a specific meaning, as in phonetic transcription systems, and are not a substitute for generic styling mechanisms for superscripting of text, as for footnotes, mathematical and chemical expressions, and the like."[15]

We attest spelling and usage, not technical errors. A purposeful misspelling like pr0n is not the same thing as a misapplication of HTML and Unicode in encoding M^r as Mʳ. We may as well start considering OCR errors as attestations. —Michael Z. 2013-04-27 18:57 z

☞ Re: We don't include words; we include strings of letters or code points. color and colour are different pages. Yes, and labors is a form of labor, but labour and Labour are independent terms. No matter which way you look at it, our organization is inconsistent and needs improvement.

Prosfilaes, this is still a dictionary, that aims to define all words in all languages. Sure, we use strings of text to represent words and web pages to organize them as well as we can, but don’t extrapolate and abstract that into something other than a dictionary. —Michael Z. 2013-04-27 19:23 z

I don't see why you see a difference between someone willfully using a number for a letter or a phonetic letter for another letter; "pr0n" is as much an encoding error as "Mʳ". In any case, the point was not to restart the argument, just to remind you that we'd had that discussion before.

Our organization is not inconsistent; each page is denoted by a string of characters. The occasional redirect is the only break from that.

Don't tell me not to do something; tell me why I shouldn't do it. Dictionaries frequently include a lot of stuff that's not just words; many dictionaries come with biographical, geographical, and scientific data. Storing Unicode codepoints is something that would have unique value for us, and it would no more make us not a dictionary than including entries on "George Washington" turned other dictionaries into not dictionaries.--Prosfilaes (talk) 22:46, 27 April 2013 (UTC)[reply]

If WT after all is to help readers resolve the semantic ambiguity anyway involved in speech and writing, then the punctuation marks (PM's) should necessarily come in as semantic functors. However, the trouble is that the main pages are designed or structured around words rather than PM's. Then we'd have a few options, say, as follows:

A main page for each PM in spite of inadequate design.
A main page for all PM's. In this and next cases, REDIRECT's may be well used.
WT:Punctuation marks, probably subpaged for each PM.

In addition to helping readers resolve ambiguity, WT should guide them to its own design and functions, including Unicode as an integral and vital element, to which we'd also apply the above options.

--KYPark (talk) 03:31, 27 April 2013 (UTC)[reply]

Why not? It should!

Why not? It should! Traditionally, lexicographically, and perhaps lexicologically.

Generally, it is quite desirable to review anything, esp. from the bottom up, as low as possible. Biased, however, you'd fall into the pitfall or vicious cycle of circular reasoning, as usual.

Say, you presuppose, to begin with: "Dictionaries are normally about words, and not the things that those words refer to." This may be enough for you to begin too wrong!

See first:

w: The Message in the Bottle #"The Delta Factor" (1975)

A most commonsensical fatal fallacy is such that the word in and of itself does refer or relate to the thing or referent, likely magically, remindfully of "word magic".

Unconvinced by my words, be convinced by:

q: Aldous Huxley #Words and Their Meanings (1940).

This includes the opening quotation of:

"BOOK ONE The Functions of Language"

S. I. Hayakawa (1949) Language in Thought and Action.

In a nutshell:

The old idea that words possess magical powers is false; but its falsity is the distortion of a very important truth. [...] Words are magical in the way they affect the minds of those who use them.

That is, "Words are magical" only to "the minds of those who use them." Put more precisely, cognitive minds are magical, rather than words, hence cognitive sciences since the late 70s! Recall "The Delta Factor" (1975).

This recognition was quite a queer revolution in sheer silence or sheer mystery.

All life comes back to the question of our ideas -- the medium through which we relate words to things, ill or well.

This is my parody of:

All life comes back to the question of our speech -- the medium through which we communicate. -- Henry James.

This is the first of the ten quotations that open:

C. K. Ogden & I. A. Richards (1923) The Meaning of Meaning: A Study of the Influence of Language upon Thought and of the Science of Symbolism

As far as my knowledge goes, you'd better believe me, this is the origin or center of the cognitive earthquake or revolution, vividly evolved since the late 70s but in sheer mystery!

All I'm saying is perhaps U're doing too wrong more often than not!

--KYPark (talk) 03:39, 24 April 2013 (UTC)[reply]

(This'd best be where CodeCat, etc., would respond to "Why not? It should!" above. Thanks. --KYPark (talk) 08:18, 26 April 2013 (UTC))[reply]

Request for comment on inactive administrators

(Please consider translating this message for the benefit of your fellow Wikimedians. Please also consider translating the proposal.)

Hello!

There is a new request for comment on Meta-Wiki concerning the removal of administrative rights from long-term inactive Wikimedians. Generally, this proposal from stewards would apply to wikis without an administrators' review process.

We are also compiling a list of projects with procedures for removing inactive administrators on the talk page of the request for comment. Feel free to add your project(s) to the list if you have a policy on administrator inactivity.

All input is appreciated. The discussion may close as soon as 21 May 2013 (2013-05-21), but this will be extended if needed.

Thanks, Billinghurst (thanks to all the translators!) 04:34, 24 April 2013 (UTC)[reply]

Distributed via Global message delivery (Wrong page? You can fix it.)

Looking at the other projects' policies, and at our own inactive admins, I'd like to propose that we have a policy vote about this. What do you think about removal of adminship from admins who make fewer than 10 mainspace edits in a year? That sounds reasonable (compare with our voting requirements, for example). —Μετάknowledge^{discuss/deeds} 05:40, 24 April 2013 (UTC)[reply]

Defectives do far more harm than inactives. --KYPark (talk) 06:40, 24 April 2013 (UTC)[reply]

As that proposal will override local consensus if it passes, I invite all Wiktionarians to oppose the proposal so we can stay independent and govern ourselves. -- Liliana • 08:04, 24 April 2013 (UTC)[reply]

The proposal will apply only to projects without an admin review process. Since we have one, it won't apply to us. —An gr 08:24, 24 April 2013 (UTC)[reply]

On the talk page, it says it applies to us too. -- Liliana • 08:26, 24 April 2013 (UTC)[reply]

You're right. I was thinking of Wikisource. —An gr 08:31, 24 April 2013 (UTC)[reply]

I've fixed that.—msh210℠ (talk) 07:20, 25 April 2013 (UTC)[reply]

The question would be if we can govern ourselves free from or of defectives and disruptives in disguise. --KYPark (talk) 13:56, 24 April 2013 (UTC)[reply]

Category:Hungarian nouns suffixed with -acs, et al

Why are these suffix categories sorted by PoS? It's especially confusing that Hungarian prefixes aren't. Would anyone object if I changed them to the same format as most (if not all) other languages? Ultimateria (talk) 14:46, 24 April 2013 (UTC)[reply]

I think we should wait, {{suffix}} allows for this sort of thing (not only {{hu-suffix}}) and last time I talked to User:Panda10 about it, he opposed deleting {{hu-suffix}}, {{hu-prefix}} and {{hu-affix}}. We should at least let some of our Hungarian editors comment; a couple of days is nothing. Mglovesfun (talk) 14:53, 24 April 2013 (UTC)[reply]

In many languages, the suffix is what determines the part of speech, so this isn't really all that strange. I oppose changing it, and support applying this to more languages. —CodeCa t 14:59, 24 April 2013 (UTC)[reply]

Please do not change it. In several cases, the same suffix will create a different PoS and it is best to keep these categories of words separately. --Panda10 (talk) 13:58, 1 May 2013 (UTC)[reply]

Multiple user pages using "/"

Greetings. I've noticed that some users have created multiple pages for themselves by creating pages with a slash after their user name. (E.g. User:[username]/1000EnglishEntries.) Is this normally accepted, and is there a limit as to how many pages you can have? Thanks. TeragR (talk) 17:16, 25 April 2013 (UTC)[reply]

If the content supports the work of Wiktionary, there is no limit that I am aware of. Any significant volume of content not related to the work of Wiktionary (including maintaining friendly relations useful for that work), whether or not on a subpage, is not permitted. DCDuring TALK 21:09, 25 April 2013 (UTC)[reply]

Normally accepted and no limit. WT:USERPAGE should cover this. Mglovesfun (talk) 21:46, 25 April 2013 (UTC)[reply]

It just says the same rules apply to subpages as to the main user page; that's enough, right? Mglovesfun (talk) 11:32, 27 April 2013 (UTC)[reply]

Yes, that is enough. Understood. Thanks! TeragR (talk) 03:35, 15 May 2013 (UTC)[reply]

Administrator communication

I'm having trouble with an administrator who often reverts legitimate revisions en masse and deletes entries without bothering to message the user or start a talk page on the matter. I wouldn't be so troubled about it if they were simply an editor, but as an administrator, I expect more from them. I've contacted them, but they deny any culpability. Is there anyone who can intervene and ask them to better communicate with others, in both initiating contact and conducting themselves in a cooperative manner? Thanks. --Victar (talk) 20:37, 25 April 2013 (UTC)[reply]

We do have the problem of having a very high ratio of pages to patrollers. This leads to curt interaction. You could try posting to the entry talk pages or visiting one of the pages like Wiktionary:About Frankish or Wiktionary:About Proto-Indo-European and leaving a message on a talk page there to determine what problem there may have been with your contribution. DCDuring TALK 21:17, 25 April 2013 (UTC)[reply]

Regardless, I think there is some standard to be maintained as an administrator. The changes themselves are not what are in question, but rather the manner in which the admin communicates, or the lack thereof. --Victar (talk) 23:33, 25 April 2013 (UTC)[reply]

Another thing you could try is recognizing the fact that CodeCat is a knowledgeable, experienced, and respected editor on this project, and you are still relatively new. Showing some politeness, respect, and dare I say even a bit of deference would get you a long way. Simply put, the administrators on this project have all had to put up with new editors who think they know better, which is a tiresome process, and one which is somewhat jading. As far as I can tell, CodeCat has good reasons for their reversions, has been reasonably professional in their conversation with you, and generally met English Wiktionary admin standards that I am comfortable with. -Atelaes λάλει ἐμοί 00:12, 26 April 2013 (UTC)[reply]

Again though, it isn't a matter of the quality of their revisions or even the way in which they communicate; it's the lack of communication that I find troublesome. If an admin is going to delete your work en masse without even discussing it with you, why would anyone want to contribute? If you need more people, this is not the way to attract them. --Victar (talk) 00:33, 26 April 2013 (UTC)[reply]

You don't realise the vast quantity of vandalism, idiocy, ill-informed edits, and good faith errors we have to clean up after and weed through. If we left a personalised message for everyone who made a mistake, it would take far too much time and (wo)manpower. We were all newbies once too, and we learned the inane template system and confusing structure as well. As long as you don't make it into a conflict, it almost certainly won't become one, and you can learn and move on. —Μετάknowledge^{discuss/deeds} 00:41, 26 April 2013 (UTC)[reply]

I understand and can appreciate both the quantity and quality of work they do, I really do. I just expected more collegiality from them, especially toward someone they had already built a rapport with. I'm disappointed. --Victar (talk) 01:02, 26 April 2013 (UTC)[reply]

Though you yell heeelp here in despair, few'd help you out. Maybe no moderation but arbitration. Individual editors are as powerless as slaves against the powerful united state of arbitrators, I fear. --KYPark (talk) 07:13, 26 April 2013 (UTC)[reply]

Victar is supposed to be concerning or complaining, as I used to, about both the administrator's moral(ity) and the editor's morale, mismatched, rather than the technicality of administration. No doubt, it is fatally self-defeating and immoral within the participatory wikis to discourage editors as if vandals anyway. --KYPark (talk) 03:43, 26 April 2013 (UTC)[reply]

Number/Numeral categories what's the story?

I'm just wondering since I remember discussion about the categories for numbers and/or numerals a while back. Did any decisions or anything of the like come out of it? I mean, for a given language, what number related categories should we have, and what shouldn't we have? Ever since I noticed the controversy or uncertainty about this issue months and months ago I made sure to ignore any I saw on WantedCats. But I'm just curious atm and probably won't be staying on wikt much longer for the moment today so I was wondering if someone could easily tell me what happened and perhaps even direct me to the relevant discussion if they feel the need. User: PalkiaX50 ^{talk to meh} 13:24, 27 April 2013 (UTC)[reply]

As far as I know, the decision was to use numeral when it represented a distinct part of speech, and to use some other part of speech when more appropriate. In particular, ordinal number words are almost never a "numeral" part of speech, they are usually adjectives. The cardinal and ordinal numbers are categorised in their own topical categories, which are based on meaning rather than on part of speech. —CodeCa t 13:35, 27 April 2013 (UTC)[reply]

Is German infinitive ending -en considered a suffix?

In German, every verb in its infinitive form ends in the morpheme -en (or in its much rarer variants -ern or -eln). Should it still be considered a suffix? Some users seem to think so. For example, the etymology of the verb vernetzen was recently changed from {{prefix|ver|Netz|lang=de}} to {{confix|ver|Netz|en|lang=de}}. I don't think that makes sense, since -en is just a grammatical morpheme that marks the infinitive rather than a lexical morpheme of word formation. Longtrend (talk) 11:35, 29 April 2013 (UTC)[reply]

I’d say it is. Inflectional suffixes are also suffixes. — Ungoliant ^(Falai) 11:47, 29 April 2013 (UTC)[reply]

I'd say it isn't. Our suffix categories are usually broken up into several subcategories based on usage, so there's Category:German noun-forming suffixes, Category:German verb-forming suffixes and so on. There's also Category:German inflectional suffixes, which would contain things like -em, -en, -te. While -en can be used to form new verbs, something like Category:German verbs suffixed with -en is horribly misleading. -en has to be present in a verb, so it's not true suffixation but really more like adapting the word morphologically into a verb. That's quite different. Latin -us is the same; it's not used as a way to form nouns, but rather as a way to make morphologically non-conforming nouns conform to the grammar of the language. —CodeCa t 12:54, 29 April 2013 (UTC)[reply]

But that doesn't belong in the etymology. That would mean using {{suffix}} or {{prefix}} for almost every entry in almost every inflected language. Does falar really have a different etymology from falo? One that's worth distinguishing in the etymology section? Chuck Entz (talk) 12:58, 29 April 2013 (UTC)[reply]

I understand your concerns, but when you say "adapting the word morphologically into a verb"....well that's done by means of a suffix, isn't it? When German invents a new verb based on a noun, it sticks the suffix -en on the end of it. So in my opinion, it is a suffix. (In my opinion also, all of these categories are a complete waste of time and energy, but that's a separate issue...) Ƿidsiþ 13:17, 29 April 2013 (UTC)[reply]

And vernetzt is ver + netz +t and vernetzte is ver + netz +te. The inflectional morphology is the result of the conversion to a verb, not the cause. In an analogous English case, would you describe a verb derived from a noun as noun + null ending, since our lemma is the unmarked form? Chuck Entz (talk) 13:55, 29 April 2013 (UTC)[reply]

No, and that is a silly analogy. A better one would be to consider if regular English past-tense forms are formed by adding a suffix -ed. In my opinion (and that of the OED), this is the case (although I see no value in putting them all in a category). Ƿidsiþ 14:40, 29 April 2013 (UTC)[reply]

No it wouldn’t. Falar is not derived from a noun + an inflectional suffix, vernetzen is.

Maybe it’s better to think of inflectional suffixes as sets, instead of single suffixes. For example, the Portuguese 1st conjugation has {-ar, -ando, -ado, -o, -as, -a, -amos, -ais, -am, etc.}, the 2nd has {-er, -endo, -ido, -o, -es, -e, -emos, -eis, -em, etc.} and the 3rd has {-ir, -indo, -ido, -o, -es, -e, -imos, -is, -em, etc.}. Consider the word monitorar, it is the noun monitor + the 1st conjugation paradigm; since the lemma of Portuguese verbs is the impersonal infinitive, the etymology should, in my opinion, display monitor + the 1st conjugation paradigm’s impersonal infinitive suffix (-ar). — Ungoliant ^(Falai) 13:27, 29 April 2013 (UTC)[reply]
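The "paradigm as a set of suffixes" view in the comment above can be sketched in a few lines. This is only an illustration under stated assumptions: the suffix lists are the partial sets quoted in the comment (the "etc." forms are left out), and the names `PARADIGMS`, `derive_verb`, `lemma` and `etymology` are hypothetical, not anything Wiktionary templates actually use.

```python
# Partial Portuguese suffix sets, copied from the comment above.
# The impersonal infinitive (the lemma suffix) is listed first in each set.
PARADIGMS = {
    1: ["ar", "ando", "ado", "o", "as", "a", "amos", "ais", "am"],
    2: ["er", "endo", "ido", "o", "es", "e", "emos", "eis", "em"],
    3: ["ir", "indo", "ido", "o", "es", "e", "imos", "is", "em"],
}


def derive_verb(base: str, conjugation: int) -> list[str]:
    """Attach every suffix of the chosen paradigm to the base word."""
    return [base + suffix for suffix in PARADIGMS[conjugation]]


def lemma(base: str, conjugation: int) -> str:
    """Return the form built with the paradigm's infinitive suffix."""
    return base + PARADIGMS[conjugation][0]


def etymology(base: str, conjugation: int) -> str:
    """Show the base plus the paradigm's lemma suffix, as proposed above."""
    return f"{base} + -{PARADIGMS[conjugation][0]}"


if __name__ == "__main__":
    print(lemma("monitor", 1))
    print(etymology("monitor", 1))
    print(derive_verb("monitor", 1))
```

On this view the derivation picks a whole set at once, and the etymology line merely displays whichever member of that set the dictionary happens to use as the lemma.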

Not in Portuguese, but it ultimately comes from fabula. My point is that it's not the sticking of the inflectional ending on it that made it a verb. Because it became a verb, the inflectional ending was added. Chuck Entz (talk) 13:55, 29 April 2013 (UTC)[reply]

That "inflectional ending" can be considered a suffix though. Ƿidsiþ 14:40, 29 April 2013 (UTC)[reply]

Not really, they are separate things from a formal point of view. In Indo-European at least, much derivation involves adding a suffix to extend the basic stem, which is distinct from the inflectional ending that comes after it. Indo-European words are formed as root + one or more suffixes + inflectional ending. This has become somewhat more muddled in later languages, because many languages have "zero endings" which are inflectional endings that are empty, and also because the distinction between root and suffix is no longer as apparent. So for modern IE languages, it's easier to just consider stem + ending, and treat the stem as the more "invariable" part. In this view, -en is definitely not part of the stem in German, and neither is -ar in Portuguese (although the -a- on its own might be). In German in particular, creating a verb from a noun is often a so-called "zero derivation" where both stems are the same. So there is really no suffixation involved, just a change of endings from the noun set (genitive -s, plural -e(n), dative plural -en) to the verb set (infinitive -en, 3sg present -t and so on). The reason that it appears as suffixation is because the lemma form (nominative singular) of a noun stem generally has a zero ending, whereas the lemma form of a verb stem has an overt ending. But if it had been the other way around (say, nouns had -en in their nominative singular, and the infinitive had a zero ending) then would it still appear as suffixation? I don't think it would. And if you look at Latin or (to a lesser extent) Portuguese, there are few forms that have no ending, so something like "replace -us (la) / -o (pt) with -ar(e)" can hardly be called suffixation by itself. Rather, the real suffixation is adding the verbal derivation morpheme -a- (first conjugation) to the noun stem, which then creates a verbal stem that requires -r(e) as the infinitive ending. —CodeCa t 14:52, 29 April 2013 (UTC)[reply]

Re: “although the -a- on its own might be”: -a- is the suffix -ar’s thematic vowel.

Re: “"replace -us (la) / -o (pt) with -ar(e)" can hardly be called suffixation by itself”: it can, because suffixes are usually added to the stem, not the whole word. Cabeludo is cabel- (stem of cabelo) + -udo; here the suffix is added to the stem and, similarly, verbs formed from nouns have the conjugation suffixes added to the noun’s stem.

Even if “suffixation” and “suffix” aren’t the correct terms used by linguists, our etymology sections don’t use those terms. — Ungoliant ^(Falai) 15:16, 29 April 2013 (UTC)[reply]

To me, cabeludo is formed by adding -udo to the stem cabel-, but -udo is itself -ud- (suffix) + -o (ending). So the stem of the new word is cabelud-, with the added note that it follows the -o/-os paradigm (see below). —CodeCa t 15:30, 29 April 2013 (UTC)[reply]

Irrespective of whether the suffix was added because it became a verb, or whether it became a verb because the suffix was added (and how can you tell?), a new word appeared and this new word is a previous word + a set of suffixes. You could just say that monitorar came from monitor, but then why isn’t it *monitorer or *monitorir? That’s because it’s monitor + {-ar, -ando, -ado, etc.}, not monitor + {-er, -endo, -ido, etc.} nor monitor + {-ir, -indo, -ido, etc.}. The word was derived with a specific set of suffixes, and this set’s lemma suffix should be added to the etymology. — Ungoliant ^(Falai) 15:16, 29 April 2013 (UTC)[reply]

That's true, but that works only if the lemma form actually shows this distinction in paradigms. In the case of German, the infinitive suffix doesn't show the verb paradigm. Of course, the paradigm is included in the derivation process, but you don't see it. —CodeCa t 15:30, 29 April 2013 (UTC)[reply]

How about verbs that are formed from a noun? As in Dutch schaats (a skate) ==> schaatsenrijden ==> schaatsen (verb). Couldn't you say in that case that affixing -en is a (productive) way of generating new verbs? A productive suffix? Think of faxen or sms'en. Jcwf (talk) 15:40, 29 April 2013 (UTC)[reply]

I'm not saying it's not productive, I'm saying that -en isn't the suffix used to perform the derivation. The verb is more than just its infinitive... alongside schaatsen there is also schaats, schaatste, and so on. How would you say schaats (the verb form) is derived from schaats (the noun)? To say that people first create the infinitive and then replace the infinitive ending with a zero ending would be silly. Treating the infinitive as the lemma form is only a lexicographical convenience but not the reality; verbs can exist without their infinitives, and choosing one of the forms as the lemma is arbitrary. What if Dutch verbs were lemmatised as the 1st person singular? How would we denote the etymology then? When schaatsen is derived from schaats, there is no suffixation involved (or rather zero-suffixation). Rather, we just change the lemma form of one part of speech to the lemma form of another, but the actual derivational process is completely independent of which lemma form you choose, so deriving schaats from schaats is not only a valid alternative, it's the exact same thing. —CodeCa t 15:52, 29 April 2013 (UTC)[reply]

I think "schaats" (Dutch noun stem) and "schaats" (Dutch verb stem) aren't the same thing and that we can still see the etymology because "schaats" (noun: runner or blade) still rules out "schaats" (noun: the act/result of skating), while we do have the noun "loop" (a competition in "lopen"; a track to "lopen"). --80.114.178.7 00:18, 4 May 2013 (UTC)[reply]

That said, if "-en" is a suffix, it is a suffix to the verb "schaats" to make the infinitive (and 1st person plural of the present tense &c.), not a suffix to transform a noun into a verb. Perhaps it would be good to have a header "Stem" (like we have "Noun" and "Verb"), if only to have terminology and/or a place to link to (do we need "Noun stem"/"Verb Stem"?). --80.114.178.7 00:18, 4 May 2013 (UTC)[reply]

I agree with Widsith and Ungoliant, it's a suffix. I'm surprised that there's debate on this point. I find it no more or less useful to categorise all German verbs suffixed with -en together than to categorise all English past tense forms suffixed with -ed together, but -en and -ed remain suffixes. (And I can conceive of such categorisation being at least slightly useful, in that there are other suffixes—wandeln in German, dreamt in English—and someone might want to find only -en / -ed words.) - -sche (discuss) 20:21, 29 April 2013 (UTC)[reply]

I definitely don't agree with making categories based purely on allophonic grounds. The ending of wandeln isn't somehow a different one from the -en that most other verbs end in. It's the same thing, just with a different shape depending on the stem. And I am surprised that you think there is no debate. What arguments do you have against the ones I've raised? How does one get schaats (“I skate”) from schaats (“a skate”) by suffixing -en? That doesn't make any sense to me at all. —CodeCa t 20:48, 29 April 2013 (UTC)[reply]

If one forms words in Dutch the way one forms them in German, then it seems one takes schaats (“a skate”, noun), suffixes -en to get a verb, and conjugates that verb like other verbs that end in the suffix -en, resulting in forms like schaats (“I skate”). That seems as obvious to me as your analysis seems to you, so I don't know if we'll be able to do anything but agree to disagree... - -sche (discuss) 23:24, 29 April 2013 (UTC)[reply]

Your analysis depends on treating the infinitive form as the basis from which other forms of the verb are derived. But it doesn't work like that in reality. The fact that we choose different lemmas in different languages is a reflection of that. For example, in Latin or any of the Balkan languages, would you suggest that people first add a suffix to create the first person singular, and then conjugate that? I'd say that it's more realistic to say that when deriving a new verb, people create the whole verb and the complete set of its forms, and then select the one they need in that particular situation. Seen that way, the process of word derivation is what creates one paradigm from another, rather than just one lemma form from another. That is why I think it's misleading to treat -en as a suffix: it doesn't actually create new lemmas. The true derivational part is the zero suffix, which is attached to a noun paradigm in order to form a verb paradigm. That the lemma of the noun paradigm has no ending while the lemma of the verb paradigm has -en isn't relevant; this could easily change just by selecting another lemma, since it's arbitrary which lemma you choose. —CodeCa t 23:44, 29 April 2013 (UTC)[reply]

For the record, I fully agree with CodeCat and Chuck Entz here. Longtrend (talk) 17:54, 1 May 2013 (UTC)[reply]

Note that the definition of desinence considers that a desinence is a kind of suffix. Lmaltier (talk) 20:07, 1 May 2013 (UTC)[reply]

I agree that endings like -en are suffixes from the linguistic point of view, but from the lexicographical point of view it's no help to anyone to have Category:German words suffixed with -en. That category currently has just 48 words, all of them infinitives, but in principle it could have the vast majority of German infinitives, plus all 1st and 3rd person plural preterite forms, plus all 1st and 3rd person plural past subjunctive forms, plus all the plural nouns in -en, plus all the dative plurals in -en, plus the -en form of every single German adjective. No one would be able to use a category like that for navigation. —An gr 21:22, 1 May 2013 (UTC)[reply]
It could be useful if it had sub-categories like Category:German infinitives ending with -en, Category:German 3rd person plural preterite forms ending with -en and so on, and just a few words which don't fall into those categories. But probably almost all words in the base category would just have to be moved to (often several) subcategories. --80.114.178.7 00:26, 4 May 2013 (UTC)[reply]

[en] Change to wiki account system and account renaming

Some accounts will soon be renamed due to a technical change that the developer team at Wikimedia are making. More details on Meta.

(Distributed via global message delivery 03:31, 30 April 2013 (UTC). Wrong page? Correct it here.)

For the lazy... Ƿidsiþ 07:46, 30 April 2013 (UTC)[reply]

The developer team at Wikimedia Foundation is making some changes to how accounts work, as part of our on-going efforts to provide new and better tools for our users (like cross-wiki notifications). These changes will mean users have the same account name everywhere. This will let us give you new features that will help you edit and discuss better, and will allow more flexible user permissions for tools. One of the pre-conditions for this is that user accounts will now have to be unique across all 900 Wikimedia wikis.

Unfortunately, some accounts are currently not unique across all our wikis, but instead clash with other users who have the same account name. To make sure that all of these users can use Wikimedia's wikis in future, we will be renaming a number of these accounts to have “~” and the name of their wiki added to the end of their accounts' name. This change will take place on or around 27 May. For example, a user called “Example” on the Swedish Wiktionary who will be renamed would become “Example~svwiktionary”.

All accounts will still work as before, and will continue to be credited for all their edits made so far. However, users with renamed accounts (whom we will be contacting individually) will have to use the new account name when they log in. It will now only be possible for accounts to be renamed globally; the RenameUser tool will no longer work on a local basis - since all accounts must be globally unique - therefore it will be withdrawn from bureaucrats' tool sets. Once this takes place, it will still be possible for users to ask for their account to be renamed further here on Meta, if they do not like their new user name.
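The naming scheme in the announcement above (old name, then "~", then the wiki's name) can be sketched as follows. This is a rough illustration, not the developers' actual tooling: the function names are hypothetical, and the clash check naively assumes that every same-named account on a different wiki belongs to a different person, which ignores accounts that are already unified. The announcement does not say which of the clashing accounts keep their name, so the sketch only flags clashes.

```python
from collections import defaultdict


def renamed_account(username: str, wiki: str) -> str:
    """Build the post-rename name: old name + "~" + the wiki's name."""
    return f"{username}~{wiki}"


def find_clashes(accounts):
    """Map each user name found on more than one wiki to those wikis.

    `accounts` is an iterable of (username, wiki) pairs.
    Assumption: same name on two wikis counts as a clash.
    """
    wikis_by_name = defaultdict(set)
    for username, wiki in accounts:
        wikis_by_name[username].add(wiki)
    return {
        name: sorted(wikis)
        for name, wikis in wikis_by_name.items()
        if len(wikis) > 1
    }


if __name__ == "__main__":
    sample = [
        ("Example", "svwiktionary"),
        ("Example", "enwiki"),
        ("Solo", "enwiktionary"),
    ]
    print(find_clashes(sample))
    print(renamed_account("Example", "svwiktionary"))
```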

Oh, Christ, am I going to become Equinox~enwiktionary because of that prior Equinox who made one edit on Wikipedia back in 1843? Equinox ◑ 12:54, 30 April 2013 (UTC)[reply]

The Internet was steam-powered in those days. Ƿidsiþ 13:02, 30 April 2013 (UTC)[reply]

Is there a way to avoid becoming Astral~enwiktionary? I'm pretty sure there are other Astrals scattered throughout various Wikimedia projects. Perhaps by renaming my account before May 27? Astral (talk) 02:35, 8 May 2013 (UTC)[reply]

To both Equinox and Astral: Yes, if there is anyone on any other Wikimedia project with the same username, you will both be automatically renamed. You can seek renaming here now (or seek to have the existing other account(s) usurped in your favor now) or have this done at Meta after the change. bd2412 T 03:07, 8 May 2013 (UTC)[reply]

[en] Change to section edit links

The default position of the "edit" link in page section headers is going to change soon. The "edit" link will be positioned adjacent to the page header text rather than floating opposite it.

Section edit links will be to the immediate right of section titles, instead of on the far right. If you're an editor of one of the wikis which already implemented this change, nothing will substantially change for you. Scripts and gadgets depending on the previous implementation of section edit links will have to be adjusted to continue working, but nothing else should break even if they are not updated in time.

Detailed information and a timeline are available on meta.

Ideas to do this date back to 2009 at least. It is often difficult to track which of several potential section edit links on the far right is associated with the correct section, and many readers and anonymous or new editors may even be failing to notice section edit links at all, since they read section titles, which are far away from the links.

(Distributed via global message delivery 18:21, 30 April 2013 (UTC). Wrong page? Correct it here.)

I see this has gone live. I think it's bad for usability. Finding an edit link now requires horizontal scanning, since its position is relative to the header's text length. It used to be easy: absolute far right. Equinox ◑ 18:27, 1 May 2013 (UTC)[reply]

I like this change; I find it easier to find the "edit" links when they're next to their headers; when they floated right, it took me longer to sort out which edit link went with which section on pages that had multiple immediately adjacent headers, e.g. pages with an L2 immediately followed by an empty Etymology 1 section immediately followed by a POS section, especially if only some of those headers were indented by right-floating Wikipedia boxes. (I've also had experience with this leftist placement for a long time, due to de.Wikt using it.) - -sche (discuss) 19:15, 1 May 2013 (UTC)[reply]

This looks like an improvement for new users who still read the headings. Also, I believe, for users of screen readers, because the edit link now follows the heading text in the document flow, instead of being in the previous section. It’s about time. —Michael Z. 2013-05-02 14:17 z

Okay, I've fixed (hopefully all of) the resulting breakages in TabbedLanguages, DefSideBoxes, AddDefinition, RhymesEdit, and VisibityToggles. Did anything else break that anyone's aware of? --Yair rand (talk) 19:17, 1 May 2013 (UTC)[reply]

Accelerated plurals/inflections (the green links) weren't working properly (instead of e.g. ===Noun=== you would get a lot of formatting gibberish in there), but now I can only see red links, and acceleration isn't available at all... Equinox ◑ 19:20, 1 May 2013 (UTC)[reply]

Fixed now? --Yair rand (talk) 19:24, 1 May 2013 (UTC)[reply]

Yep, thanks. Equinox ◑ 19:37, 1 May 2013 (UTC)[reply]

@Yair: Nope, I'm still getting the crap Equinox reported. I can provide diffs if you want. —Μετάknowledge^{discuss/deeds} 03:40, 8 May 2013 (UTC)[reply]

I don't know if it is just coincidence, but vandalism marked as (Mobile Edit) has gone up alarmingly since this change. SemperBlotto (talk) 14:58, 2 May 2013 (UTC)[reply]

I now see section edit links in the mobile view. Whatever was hiding them before doesn’t seem to work with the new HTML.

Have good mobile edits increased as well? —Michael Z. 2013-05-06 16:16 z

Would it be possible for me to reverse this change? I don't think I like it all that much, it makes pages appear messier. —CodeCa t 02:11, 3 May 2013 (UTC)[reply]

Add .mw-editsection {float: right;} to Special:MyPage/common.css. --Yair rand (talk) 03:41, 3 May 2013 (UTC)[reply]

It seems net beneficial to me personally as I formerly made lots of errors clicking on the wrong section link and often found the edit link hidden by project link and similar boxes. It should be easier for newbies too. DCDuring TALK 15:31, 3 May 2013 (UTC)[reply]

Would it be possible to make this turn-off-able via WT:PREFS? Ƿidsiþ 15:37, 3 May 2013 (UTC)[reply]
It is possible: Someone just has to create a gadget with the code above. Dakdada (talk) 16:11, 3 May 2013 (UTC)[reply]