Wiktionary:Beer parlour/2013/February



Template Tiger

The famous Template Tiger has now been updated to include English Wiktionary. This is a tool on the German Toolserver, which allows you to analyze the usage of templates. For example, the most used templates here are {{t+}} (used in 371,300 places), {{t-}} (319,175) and {{term}} (79,627). For the latter, the most commonly used parameters are positional parameter 0 (used in 79617 places), 1 (38947) and 2 (30680) and named parameters lang (67763) and tr (8958). Some less commonly used parameters (perhaps spelling errors?) for this template are leng (9 places), ang (8), lanf (4) and lanh (2). Template Tiger helps you to find those places. Take some time to check it out. --LA2 (talk) 14:16, 1 February 2013 (UTC)
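
As a rough illustration of how the rarely used parameter names quoted above could be flagged as probable misspellings, here is a minimal Python sketch. It is not how Template Tiger itself works; the function name likely_typos and the thresholds (common_min, rare_max, cutoff) are assumptions chosen for this example, and only the counts come from the post.

```python
from difflib import get_close_matches

# Usage counts for {{term}} named parameters, as quoted in the post above.
PARAM_COUNTS = {
    "lang": 67763,
    "tr": 8958,
    "leng": 9,
    "ang": 8,
    "lanf": 4,
    "lanh": 2,
}


def likely_typos(counts, common_min=1000, rare_max=100, cutoff=0.7):
    """Map each rarely used parameter name to the common name it resembles.

    The thresholds are illustrative assumptions, not values taken from
    Template Tiger.
    """
    common = [name for name, uses in counts.items() if uses >= common_min]
    suspects = {}
    for name, uses in counts.items():
        if uses <= rare_max:
            # Fuzzy-match the rare name against the well-established names.
            match = get_close_matches(name, common, n=1, cutoff=cutoff)
            if match:
                suspects[name] = match[0]
    return suspects


print(likely_typos(PARAM_COUNTS))
```

With the counts above, all four rare names are paired with lang, which fits the guess that they are spelling errors. A real check would still need a human to confirm each case.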

Very interesting. — Ungoliant (Falai) 14:46, 1 February 2013 (UTC)
This is really helpful for finding template errors. Thanks a lot! -- Liliana 14:56, 1 February 2013 (UTC)
Awesome! BTW, the links relating to {{t+}} are broken, but you can correct the URLs manually by changing + to %2B; for example, the parameters report is at http://toolserver.org/~kolossos/templatetiger/template-parameter.php?template=t%2B&lang=enwiktionary. —RuakhTALK 15:14, 1 February 2013 (UTC)
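
The manual fix described above (changing + to %2B) is ordinary percent-encoding of the query string. A minimal Python sketch, assuming the report URL keeps the shape shown in the comment; the helper name parameter_report_url is made up for this example:

```python
from urllib.parse import urlencode

# Base address copied from the comment above.
BASE = "http://toolserver.org/~kolossos/templatetiger/template-parameter.php"


def parameter_report_url(template, wiki="enwiktionary"):
    """Build the parameter-report URL with the template name percent-encoded."""
    # urlencode escapes "+" as "%2B", so {{t+}} is not misread as "t ".
    return BASE + "?" + urlencode({"template": template, "lang": wiki})


print(parameter_report_url("t+"))
```

For t+ this reproduces the corrected URL given in the comment.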
Actually, a few things seem to be broken, but they're mostly workaroundable, so it's still very helpful. (And we can presumably contact the owner with requests for fixes.) —RuakhTALK 15:17, 1 February 2013 (UTC)
There’s something wrong. It says there are 468 transclusions of {{pt-verb form of}}, but [1] shows that there are thousands. — Ungoliant (Falai) 16:37, 1 February 2013 (UTC)
How exactly does this relate to Special:MostLinkedTemplates? Is it that Tiger only has first-level, direct uses of each template and MostLinkedTemplates includes indirect links? If so, what matters for exercising caution is MostLinkedPages. DCDuring TALK 17:10, 1 February 2013 (UTC)
Yes, the current numbers are much too low. The data will be updated. See this discussion. --LA2 (talk) 09:59, 2 February 2013 (UTC)
Doesn't Template Tiger work on the unexpanded wikitext of the XML dump? DCDuring TALK 11:46, 2 February 2013 (UTC)

A bug in the script has been fixed, and the data for en.wiktionary has been imported again. It works now. --LA2 (talk) 17:52, 20 February 2013 (UTC)

Lucifer is back

User:LightningNightling is evidently Luciferwildcat, Gtroy, Acdcrocks, etc. i.e. another block evasion. Equinox 00:47, 2 February 2013 (UTC)

  • Yes, I blocked him as being WF - then realised who he really was. Still blocked though. SemperBlotto (talk) 08:08, 2 February 2013 (UTC)
(talkcontribs)? - -sche (discuss) 09:18, 7 February 2013 (UTC)
Could very well be. LW has used (talkcontribs), (talkcontribs), (talkcontribs), and (talkcontribs) (I keep a database for such trivia), which also show on Geolocate as AT&T Internet Services in Pleasanton. This IP missed a couple of things here and there that LW would know, so I'm not absolutely certain- but it's easy to overestimate his memory for details. Chuck Entz (talk) 15:42, 7 February 2013 (UTC)
Kinda sad that somebody so passionate about editing has been blocked. I haven't followed this saga; is he profoundly inept? Does he add intentionally bogus entries? Is he just an unmitigated ass who creates busywork and unhappiness for others? Curious, -- Eiríkr Útlendi │ Tala við mig 16:50, 7 February 2013 (UTC)
The same could be said of a certain anonymous aficionado of all things supernatural and/or Japanese... Still, it's pretty murky. Yes, he's somewhat inept, but not profoundly. The main problem is that he has really bad instincts when it comes to lexicography, accompanied by impatience with details and a prolific output. When he's careful, he edges up into the mediocre range.
He's spent so much time in rfd and rfv that he's learned ways to game the system. They don't accomplish much in the long run, but they do really drag things out and annoy people. He's been the target of some rather abusive treatment by one or two of the admins, and specializes in taboo/offensive subject matter- both of which have seriously clouded the issue. In some ways, the quality of the entries isn't the worst part- it's the wasted time, aggravation and divisiveness involved in dealing with them. Chuck Entz (talk) 10:12, 8 February 2013 (UTC)
Said aficionado of all things supernatural is also proficient in Mandarin. I don't know Mandarin so to all the editors who do, please be on the lookout for Mandarin entries from IPs that trace to London. Sometimes there are WP links or interwiki WT links that do not have corresponding pages. --Haplology (talk) 10:26, 8 February 2013 (UTC)
I've started blocking him as soon as I recognize him, which is usually within a day or two of the first edits. He's been so consistent in the nature of his edits, in the face of explanations, warnings, reverts, deletions and blocks over almost 2 1/2 years, that I don't see the point in waiting to confirm that he hasn't cleaned up his act. That way we don't have to make the blocks for as long and risk excluding others with the same ISP who might get assigned the same IP later on. Chuck Entz (talk) 22:56, 9 February 2013 (UTC)
The problem with LWC is that he didn’t/doesn’t give a shit about citability nor formatting. — Ungoliant (Falai) 05:30, 9 February 2013 (UTC)
User:AVerSiMeDejan. Anyone confirm? — Ungoliant (Falai) 05:30, 9 February 2013 (UTC)
I'm practically certain. Bad Spanish translations; adding to entries that LightningNightling (another Lucifer) edited recently before being blocked; and general obsession with penis. Equinox 05:33, 9 February 2013 (UTC)
Why can't he understand that he is simply unwanted? @Chuck: Geolocate is often a bit off in my experience. From finding out what appears to be his true identity, I believe that he actually lives in San Francisco, a city somewhat near Pleasanton according to Wikipedia. —Μετάknowledgediscuss/deeds 14:36, 9 February 2013 (UTC)
I specifically mentioned Geolocate for that very reason. The fact that Geolocate traces back to the same place is often significant, though- whether that place matches the actual physical location or not. Chuck Entz (talk) 22:33, 9 February 2013 (UTC)
Just to add a smidgen of information from my own experiences here, Pleasanton seems to be a major AT&T network (former Pacific Bell/SBC) routing area for San Francisco Bay Area internet traffic. In the real world sense, the two cities are at far ends of the greater Bay Area region. Bumm13 (talk) 13:21, 15 March 2013 (UTC)
  • Can we start a list of all the obvious Lucifer socks, please? If for no other reason than some have been screwing with other projects? Purplebackpack89 (Notes Taken) (Locker) 21:50, 9 February 2013 (UTC)
    • In general, that's a bad idea. We deleted such a list of WonderFool socks because it gave undeserved recognition and was missing a few hundred, anyway. LW has little in common with the typical notoriety-motivated vandal, though, so it might be ok in this case. I know of 15, including IPs Chuck Entz (talk) 22:33, 9 February 2013 (UTC)
Back again: User: Equinox 21:19, 15 February 2013 (UTC)

Removing restoree from rollback message.

Our current default rollback message is:

Reverted edits by $2, restoring last version by $1. If you think this rollback is in error, please leave a message on my talkpage.

Rukhabot has gotten a few messages from people complaining about rollbacks that restored Rukhabot-edited versions. These editors are apparently confused by the edit-summary, and mistakenly believe that Rukhabot has reverted their edits. This confusion seems needless; it's useful for MediaWiki:Rollback-success to mention the restoree, in case it's not who the rollbacker thought it was (e.g., in cases where two different IP addresses make bad edits, it's easy to accidentally roll back only the second, thinking that we're rolling back to an earlier version than we actually are), but I don't see the benefit of MediaWiki:Revertpage doing so.

Any thoughts?

RuakhTALK 06:50, 2 February 2013 (UTC)

I think it's because it says "by". Is there another way we could phrase it so that it's less ambiguous? I wouldn't object to removing it altogether but if we can rephrase it instead I think that's preferable. —CodeCat 11:40, 2 February 2013 (UTC)
Unneeded, sometimes helpful to the rollbacker. If it's causing any confusion, I'd support removing that bit. —Μετάknowledgediscuss/deeds 17:11, 2 February 2013 (UTC)
Support. In addition to the problem you mention, the message is too long. — Ungoliant (Falai) 17:12, 2 February 2013 (UTC)

Periphrastic forms in inflection tables

This has come up a few times before but I'm not sure if it has ever been discussed thoroughly. Many languages form certain tenses, moods and voices of verbs using auxiliary verbs and other additional words. All Germanic languages do this, using have to form the perfect or past, become to form the passive, shall to form the future and so on. Romance and Slavic languages also frequently use such constructions. For some languages, the conjugation tables include such forms (Serbo-Croatian and Slovene kupovati, Finnish ostaa), while others focus only on the individual word forms (Dutch kopen, Swedish köpa). Some take a compromise approach, listing only the "formula" for producing the compound tenses but not each form individually (French acheter, Latin emo). I think both approaches have advantages, but in the case of the Slovene verb, the only two finite verb forms that are synthetically formed are the present and imperative; all others are analytical. This leads to a table that is full of forms that don't actually contribute much to the user's knowledge of that verb in particular, and act mostly as a distraction from the more important details (the word forms). Personally I would prefer it if descriptions of such analytical forms are kept to grammars (in an appendix or on Wikipedia) and that the tables only focus on words. What do others think about this? —CodeCat 17:05, 2 February 2013 (UTC)

My preference is that analytical forms be kept in the table if they don't cause too much trouble, like the past tense at זײַן (zayn). Especially semi-analytical forms, where it is two separate words, but it might be difficult to explain how one of them is chosen (like the subjunctive at weli). The Latin solution is an excellent one because the compound passive tenses would take up so much room, but are so easily explained. With Slovene, you might find that the most successful way to present compound verb forms is to link to an explanatory appendix, but I don't mind leaving them in by what I've seen at kupovati. —Μετάknowledgediscuss/deeds 17:18, 2 February 2013 (UTC)
I agree. And I think that tenses traditionally shown in conjugation tables in the language should be kept. But not tenses never shown in such tables, such as, in French, être en train de + infinitive, or être sur le point de + infinitive. Lmaltier (talk) 19:03, 2 February 2013 (UTC)
I'm not sure if that is very helpful. It's rare to find agreement on such things in all materials, and many languages don't even have much material to base such a tradition on. Besides, blindly following other authorities isn't always in the best interest of Wiktionary. Sometimes we can do better. —CodeCat 19:23, 2 February 2013 (UTC)
But people familiar with the usual treatment of that language ought to feel comfortable with our treatment of it when it comes to inflection. Lmaltier is quite right. I assume that Slavic standards are somewhat codified in the literature, although I'm not personally aware of it. —Μετάknowledgediscuss/deeds 03:00, 3 February 2013 (UTC)
A good example of following what's expected is the choice of which form to make the lemma. Latin and Greek references go with first-person singular present indicative, while modern Romance languages tend to prefer the infinitive- which is what we do, also. When we have reasons to depart from the norm, we do (for example, omitting "to" in English verbs), but mostly we stick with the consensus of modern works on the specific languages. We might as well do the same when it comes to inflection tables- in cases where such a consensus exists, of course. Chuck Entz (talk) 03:23, 3 February 2013 (UTC)

Smart quotes in Template:l

Smart quotes were replaced with straight quotes in {{term}} in this diff per Wiktionary:Grease pit/2012/July#Template talk:term#Smart quotes. Ruakh said

Don't be bold. I, too, support straight ASCII quotes, but this also raises the question of other templates that use curly quotes ({{l}} and {{onym}}, obviously; also reference-templates), as well as the many entries that use curly quotes in their own wiki-text. Are we establishing a policy of using straight quotes? A policy of using straight quotes except in citations? Should we have a bot modify all existing curly quotes? I don't think it makes sense to modify {{term}} without thinking these through.

Quite right too, {{l}}, {{onym}} and all the new l templates like {{l/en}} should use 'straight' quotes too. But I don't think the previous discussion is enough for a clear, irrefutable mandate on this. So... here I am talking about it again. Mglovesfun (talk) 18:31, 3 February 2013 (UTC)

  • Jeezum crow, don't we have better things to worry about around here than whether our quotation marks are straight or bent? Support not giving a flying fuck. —Angr 19:30, 3 February 2013 (UTC)
I'd prefer that we consistently use the straight quotes, because it facilitates better formatting of alternative glosses, allowing each separate gloss to appear in its own enclosing pair of straight quotes. Whether this comes up as often with {{l}} and {{onym}} as with {{term}} I don't know, but consistency helps. term (gloss1", "gloss2) looks better than l (gloss1", "gloss2), though my eyesight is poor enough that I can barely tell the difference. I for one vastly prefer the ease of typing straight quotes to the aggravation of getting the right edittools character set open and using it. DCDuring TALK 20:36, 3 February 2013 (UTC)
Just a note... I don't think {{l/en}} should have any quotes at all. —CodeCat 20:44, 3 February 2013 (UTC)
I oppose having templates generate straight quotes. Curly quotes make the entries look more professional. I don’t mind if they are changed, as long as they are properly tagged with spans so I can edit my CSS to fix them. Oppose having a bot modify all existing curly quotes. Support giving a few flying fucks. — Ungoliant (Falai) 20:51, 3 February 2013 (UTC)
I agree with Ungoliant. It's not very important when you look at the screen, but it's more important when you print the page. Why deliberately try to look less professional? However, if people prefer straight quotes for their own edits, they should not be rebuked. On fr.wikt, we routinely use «  », and this is not an issue. Lmaltier (talk) 21:03, 3 February 2013 (UTC)
I support straight quotes, but more than anything else I support creating an entry for flying fuck (as I write this, it's a redlink). —Μετάknowledgediscuss/deeds 21:09, 3 February 2013 (UTC)
Redirected to the only phrase it's ever used in. —Angr 22:34, 3 February 2013 (UTC)
I see we're about as likely to come to consensus about this as about our logo... - -sche (discuss) 22:31, 3 February 2013 (UTC)
@Codecat: I don't know where quotes come up in templates besides in glosses. As it stands now {{l/en}} doesn't support glosses and is not intended to, right?
@Ungoliant: I like the ease of input, but am by no means opposed to curly quotes for appearance. Once the mass (How big?) of existing straight quotes was converted, I suppose any costly intelligent parser operations wouldn't be a problem as they would not be run very much. Would conversion require "costly intelligent parser operations"? DCDuring TALK 22:39, 3 February 2013 (UTC)
@DCDuring: Exactly, but Mglovesfun did add glosses and genders to some of the templates, which I don't really agree with. —CodeCat 01:23, 4 February 2013 (UTC)
Re: "where quotes come up in templates besides in glosses": One case is reference templates for citing other dictionaries (and whatnot), many or most of which wrap the entry-title in quotation marks. And it wouldn't surprise me if some quotation templates wrap titles of short stories, news articles, and so on in quotation marks. —RuakhTALK 02:25, 4 February 2013 (UTC)
Are there uses of those that can involve something like the multiple glosses, which can easily occur within {{term}} in etymologies, where it is sometimes informative to know that, say, the Latin etymon had several of the modern meanings for the English derived term? DCDuring TALK 03:45, 4 February 2013 (UTC)
Let's be honest, no one is going to print out dictionary entries with the kind of structure we have (it's sad, but true). So for what it's worth, it makes no difference whether we use straight quotes or smart quotes. And given that the straight quotes are infinitely easier to type in, I'd prefer using these. -- Liliana 05:25, 4 February 2013 (UTC)
The “let’s be honest” argument implies that anyone who disagrees with you is necessarily lying. Also a pleasure to see it followed up with at least one self-evident fallacy stated as fact. Michael Z. 2013-02-04 17:32 z
Sorry for my glib response. I’ll address your assumptions and assertions.
  • Typographic apostrophes and quotation marks are clearly discernible in the desktop and mobile computers that I use. I have been using them in my web publishing work for over a decade, and now increasingly in my routine email, discussion, etc.
  • Wiktionary and other large open-source projects do get re-purposed in print, and in many other forms that we cannot predict. We’re building a database of knowledge, not just a series of web pages.
  • No one is making you do anything. If entering typographical characters is hard, just use neutral apostrophes and quotes, and allow someone to improve the presentation in the future. You can also enter Sampa instead of IPA, or sound-out respelling, or omit the pronunciation until someone else improves an entry. Willfully dumbing down the typography in our project is contrary to our collaborative operating principles of constant and perpetual improvement.
  • Some editors have taken the care and time to use correct typography. They are willing to edit your entries and bring them to higher standards of writing, spelling, grammar, punctuation, etc. Deciding to undo this work, and to use templates and bots to bring the quality down, is a slap in the face for these editors.
  • Using good typography is not difficult, certainly not “infinitely” harder. Modern browsers support smart-quotes typing in text input fields (it’s built in to Mac OS and Safari, there are extensions for Firefox and Chrome). Typographic marks and diacritics are accessible from the default Mac English keyboard layout, and sort-of usable on Windows and even DOS. Or you can paste text in from a word processor. Or you can make a custom keyboard layout for yourself on Mac OS or Windows.
  • Correct typography doesn’t have to hurt searching. Typographic marks don’t interfere with search in Google or on the page in Safari (searching for Let's finds let’s). Strangely, Chrome is fine with quotation marks but bamboozled by apostrophes. Firefox has some catching up to do – its search isn’t even case-insensitive (should we stop using capital letters?). Smarter use of search terms will always get better results anyway.
  • Professional-quality typography makes a difference. Directional quotation marks are clearer and easier to read, especially in nested quotations. Our examples should be good examples. Craftsmanship, care, and professionalism give the reader confidence in the source, just like other aspects of writing, grammar, and spelling.
  • We are going to use standard English typography anyway, according to our guidelines for reproducing quotations.
 Michael Z. 2013-02-07 19:58 z

Excuse me. Where is the policy favouring neutral typewriter-style quotation marks over normal typographical quotation marks, and what is the justification for it?

(Also, if we are to adopt the quaint pre-ASCII conventions of manual typewriting, why aren’t we also using the figure 1 to represent the Latin lowercase letter l? Perhaps we should use the combining underscore in place of italics, because my cousin reads Wiktionary on a teletype and everything but the italics displays perfectly for her.) Michael Z. 2013-02-04 17:02 z

  • I'll add my voice to those who vastly prefer the more professional look of regular ‘bent’ quotation marks, as opposed to the straight ones. Ƿidsiþ 16:54, 7 February 2013 (UTC)
I’d like to design this dictionary on the basis of reason and evidence, not just personal preference. There are a number of reasons to allow typographical apostrophes and quotation marks, and also some more reasons to prefer them. It confounds me that participants in an English-language project which uses Polytonic Greek and cuneiform, and has rejected Sampa in favour of IPA on every page, can’t tolerate the use of standard English orthography!
If after a proper discussion we decide that the advantages of typewriting outweigh those of typesetting, then I will abide by it.
But it’s completely unacceptable to me that despite the lack of any policy, an ad hoc majority is using templates and other powerful technologies to dumb down the basic English in this project, after some of us have put in the care and the labour to write and edit text to a good standard. Michael Z. 2013-02-07 18:34 z
I would very much like it if we could make it policy that each individual gloss be enclosed in its own quotes. There are numerous cases where multiple glosses are appropriate, indeed almost essential, in etymologies. I find the lack of such separation to be not just a matter of appearance, but one of avoiding user confusion. A bot that inserted quotes, smart or dumb, wherever there was a comma in a gloss within {{term}} and left a {{rfc}} of some kind would be quite handy for correcting the deficiency and would address the smart quote problem that I personally have. To me this matter trumps the "smart quotes" matter as it applies to {{term}}. No one has yet identified any other templates that are subject to this internal quotes usage. DCDuring TALK 18:55, 7 February 2013 (UTC)
These quotation marks are typographic furniture surrounding the glosses themselves. Wouldn’t it be better to add parameters gloss2, gloss3, &c? Then the data fields contain only data, and formatting is applied mechanically and consistently. Michael Z. 2013-02-07 20:09 z
Sure. It'll just make contributors think twice before adding multiple glosses. We don't want to encourage that kind of thing. And we certainly need our widely transcluded templates to be rendered more complicated and slower, by providing for more parameters. It worked out so well for {{context}}. DCDuring TALK 23:33, 7 February 2013 (UTC)
Uh huh, because all of our templates are so accessible right now, and the newbs are just clamouring for multiple glosses. Mixing North-American–style punctuation in with data is a small price to pay to keep the UX so simple and easy. Michael Z. 2013-02-08 15:50 z
I have to agree with Mzajac. Manually inserting your own double-quotes inside gloss= is a terrible idea. Instead of your bot that adds quotation marks inside the gloss, I'd almost rather create a bot that leaves angry messages on talk-pages of people who put quotation marks inside the gloss. :-P   And anyway, I don't see a problem with (say) either « ציפה (tsipá, to expect, predict; to coat, cover) » or else « ציפה (tsipá), which can mean either “to expect, predict” or “to coat, cover” ». —RuakhTALK 02:37, 8 February 2013 (UTC)
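
The idea raised above, of keeping each gloss as plain data and letting the template apply the quotation marks mechanically, can be sketched in Python as follows. This is only an illustration, not the actual {{term}} or {{l}} implementation; the function format_mention and the style names are hypothetical.

```python
# Quotation-mark pairs for the styles discussed in this thread.
QUOTE_STYLES = {
    "curly": ("\u201c", "\u201d"),   # typographic double quotes
    "straight": ('"', '"'),          # neutral typewriter quotes
    "single": ("\u2018", "\u2019"),  # typographic single quotes
}


def format_mention(term, glosses, style="curly"):
    """Render a term followed by its glosses, each in its own pair of quotes.

    The glosses are passed as separate data items (in the spirit of
    gloss, gloss2, gloss3), so no quotation marks are stored in the data.
    """
    open_q, close_q = QUOTE_STYLES[style]
    if not glosses:
        return term
    quoted = ", ".join(f"{open_q}{gloss}{close_q}" for gloss in glosses)
    return f"{term} ({quoted})"


print(format_mention("term", ["gloss1", "gloss2"], style="straight"))
```

Because the quotes are applied at render time, switching the whole site (or one reader) from one style to another would mean changing a single setting rather than editing every entry.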

CSS solution

If you want neutral quotation marks in {{l}}, just add the following to your user CSS (e.g., in User:Mzajac/vector.css for me). Easy-peasy.

.mention-gloss-double-quote { display: none; }
.mention-gloss:before, .mention-gloss:after { content:'"'; }

Caveat: won’t work in pre-2009 versions of MSIE.[2]

For British-style single quotation marks, add this instead:

.mention-gloss-double-quote { display: none; }
.mention-gloss:before { content: '‘'; }
.mention-gloss:after { content: '’'; }

By the way, anyone remember this 2008 discussion about redundant quotation marks in {{term}}: Misuse of markup and CSS? This is the solution for anyone who wants their quotation marks in the style of the Americans, the British, or the typewriters. We can correct and simplify the content and HTML, while giving users choice. Easily turned into a WT:PREFS gadget too, I suppose. Michael Z. 2013-02-14 22:20 z

Even Americans use single quotes when glossing foreign words, at least in linguistics. —Angr 06:40, 15 February 2013 (UTC)
True, and I would be okay with single quotation marks as the default form. It wouldn’t be the same as choosing an international style though, because WT:QUOTE#How to format a quotation mandates double quotation marks, and {{term}} and some other templates now sport neutral quotation marks. Michael Z. 2013-02-15 18:02 z


It just occurred to me that I benefit from Wiktionary so much more than I can ever return. Therefore I want to say thanks to every single person who is and has been working on this great project. Your work is highly appreciated. Caudex Rax ツ (talk) 13:57, 4 February 2013 (UTC)

WOTD (old)

We're running dangerously low on Word of the Day nominations. Quick, everyone suggest something! (PS, Astral, if you get tired of setting WOTD, let us know.) - -sche (discuss) 19:18, 4 February 2013 (UTC)

WP links

WP linking was recently brought to my attention over at ニゴロブナ (a type of carp). I'd added WP links to both the JA and EN Wikipedia articles, in imitation of other JA entries I'd seen, and on the assumption that users landing on the ニゴロブナ page here on the EN WT could arguably be assumed to be English readers interested in Japanese. Another editor removed the EN WP link, and commented that EN WP links should only go in EN entries.

However, this raises some issues.

In usability terms, linking from a JA entry to an EN WP article provides the user a clear and easy way to get more information about the JA term in an English encyclopedia article. Including EN WP links only on EN term entries a) increases barriers to finding relevant information, b) requires that the relevant EN entry exists, and c) requires that users know to click through to that EN entry to find the EN WP link, or alternately to click through to the JA WP entry and then find the EN WP link from there.

Case in point, there is no single English term that corresponds to JA ニゴロブナ -- this fish is the w:Carassius auratus grandoculis, a type of carp, for which we have no Carassius auratus grandoculis entry here on the EN WT.

Is there any strong objection to including EN WP links in entries for other languages? If so, why? Granted, in cases where an entry in another language shares the same page as the EN entry, it does make sense to have the EN WP link only in the EN entry. But for cases where the script of the lemma guarantees that there will never be an English entry on that page, such as with almost all Japanese lemma forms, I'm quite in the dark as to why including EN WP links would be frowned upon. I can see no harm in doing so, so long as the links are correct, and including such links makes the entries more complete and makes relevant information easier to find. Discoverability is a key concept in UI design, and I think we might overlook that sometimes.

Curious, -- Eiríkr Útlendi │ Tala við mig 20:52, 4 February 2013 (UTC)

As an interim and possibly less contentious measure in the narrow case of taxonomic name entries, which are supposed to be Translingual and often appear in running text in languages that do not use Roman script, let me recommend {{taxlink|Carassius auratus grandoculis|subspecies}}. In-line links to sister projects would also work.
I also have not understood why it should be essential that users be compelled to go to an entry in an encyclopedia which they are likely not to understand. Is this supposed to be punishment for not yet knowing Japanese or not wanting to? This is particularly odd where the language in question does not have as good an entry or indeed any entry at all. The "prohibition" can most charitably be viewed as a consequence of a particular assumption about who the user of the entry might be and a patronizing assumption of what would be best for them. DCDuring TALK 21:12, 4 February 2013 (UTC)

It seems obvious to me that links to WP are bridges to the encyclopedic world. And that, here, they should preferably be to the EN WP, wherever possible, as this is the language chosen by the reader of the en.wikt page. And I disagree with "in cases where an entry in another language shares the same page as the EN entry, it does make sense to have the EN WP link only in the EN entry." Each language section should be considered in isolation. Lmaltier (talk) 21:43, 4 February 2013 (UTC)

Do you guys want to know about the Japanese word ニゴロブナ or about the English term nigorobuna? What punishment? What are you people talking about? --Anatoli (обсудить/вклад) 01:13, 5 February 2013 (UTC)
Ah, and your comment reminds me that some folks use tabbed languages -- in which case, each entry appears as a separate "page" for UI purposes. So yes, I agree that "each language section should be considered in isolation." -- Eiríkr Útlendi │ Tala við mig 00:11, 5 February 2013 (UTC)
Note that if there isn’t an entry for a taxonomic name, {{taxlink}} should be used, which links it to Wikispecies. — Ungoliant (Falai) 22:36, 4 February 2013 (UTC)
I tend to agree with Eirikr regarding WP links. - -sche (discuss) 22:37, 4 February 2013 (UTC)
I was the second person who removed the WP links and I left an edit summary Eirikr was referring to. I strongly object to linking foreign language entries to English Wikipedia. This would be a bad precedent when all foreign language entries may get English Wikipedia links. Besides, in this case we have an English entry with an English Wikipedia link and the Japanese Wikipedia page has an English page linked. In any case, people can find encyclopaedic info in another language (here English) if they want to, why should Wiktionary be concerned about it? There's plenty of linguistic info, which already makes entries look cluttered or encyclopaedic to many users. The entry should be concerned with the target language info, IMHO. If the current definition is confusing or insufficient, it can be expanded, there's also "usage note", if just a definition is not enough, example sentences, whatnot. --Anatoli (обсудить/вклад) 01:08, 5 February 2013 (UTC)
Oh, this is my fault, eh? :) Well, I think Anatoli said it much better than I ever could, and that combined with DCDuring's taxlink template, we should be fine. Lmaltier's assumptions are rather faulty, I'm afraid. —Μετάknowledgediscuss/deeds 02:22, 5 February 2013 (UTC)
No-no, sorry, if I made it sound like it was all your fault :). I meant that you deleted the WP link, Eirikr reverted it, I put it back with an edit summary because I thought your edit was correct. --Anatoli (обсудить/вклад) 02:39, 5 February 2013 (UTC)
We should make an exception for when the English WP has an entry about the word itself (w:Wissenschaft, w:Saudade, w:Sehnsucht, w:Hiraeth). — Ungoliant (Falai) 02:48, 5 February 2013 (UTC)
I agree with Ungoliant for terms like w:Nemawashi or w:Wabi-sabi, and in general I don't think it's so bad to use WP links as long as they are used very sparingly. In rare cases, some terms have a cultural context that you need to know about in order to understand how the word is used, like 法螺貝 which I think benefits from a link to w:Horagai (although the clutter at 法螺貝 simultaneously illustrates the dangers of adding WP links.) You could put that in a usage note, but it's already at WP, and the link takes just as much space or less. Sometimes the WP page just has really helpful information that would be a little challenging to find otherwise, like w:Radical 94 which I added to けものへん, although maybe it belongs at 獣偏 if anywhere. --Haplology (talk) 03:32, 5 February 2013 (UTC)
I would agree to having links to WP, if there is no English word for it (or an entry), like in the case of けものへん (yes, it's better to move it to 獣偏). Our strict policy often disallows important terms, even linguistic ones to be created and kept safely (or there is disagreement on what SoP and word means). Perhaps real, not numeric names of Chinese character radicals should be allowed - e.g. kemonohen, dog radical or radical 94. If not, then it should be OK to have WP links in the Japanese entry or any other external reference to this term. Disagree with Ungoliant about linking to WP if they describe the word. We could link on the loanword, couldn't we? Why should we link to the English WP on Wanderlust#German or 空手#Japanese, if we could do the same on wanderlust#English or karate#English? --Anatoli (обсудить/вклад) 04:59, 5 February 2013 (UTC)
Of the four examples I gave, only Sehnsucht was loaned into English. — Ungoliant (Falai) 05:09, 5 February 2013 (UTC)
So, if there is an English article about this loanword, one can link Sehnsucht#English or sehnsucht#English to the English WP, not Sehnsucht#German, no need to link to English WP articles about longing, yearning, etc. --Anatoli (обсудить/вклад) 05:20, 5 February 2013 (UTC)
I see what you mean. By the way my example about horagai was wrong. It seems that the shell was named after boasting, not the other way around, and the expression is literally "blowing a boast" but "blow" could mean "boast" so it's "boast a boast"... which all gets very confusing. There are moments when writing an entry that it just feels too short, as with defining 侘び as "refined simplicity," which reminds me of describing humans as "mostly harmless," lol. I'll leave the WP links on the English entries whenever possible. --Haplology (talk) 07:55, 5 February 2013 (UTC)

Reading and re-reading this thread, I am left still uncertain about quite what the objections are to including EN WP links in entries for other languages. As best I can tell, the objections seem to boil down to:

  1. clutter
  2. a concern that links to content in English somehow do not belong in an entry for a term in some other non-English language
  • Regarding clutter, this is a concern I can begin to understand, but I think it's moot in this case. WP links as I've been adding them are aligned on the right of the page, same as images, in a block 250px wide. The text of the entry itself is all left-aligned. This layout is visually pretty clear -- read the entry on the left, look at the stuff on the right for additional information. Haplology mentions the Japanese 法螺貝 entry. This includes more than one EN WP link as the JA term itself has more than one meaning: either an aquatic mollusc, or the mollusc's shell as used as a horn. Both senses are distinct, and there is quite a bit of encyclopedic information about each that does not belong in the WT entry -- that's what the Wikipedia articles are for, hence the links.
  • Regarding links to English content in non-English entries, I confess I remain baffled as to why this matters. This is the EN WT; ostensibly, our target audience consists of English readers, who are also ostensibly interested in the terms and/or languages in the entries themselves. We may link to the WP article in the source language to show the term in context in that language; we may link to the WP article in English, our target language, to provide the user with more information in English, i.e. the language that we can safely assume that the user reads.

Usability is important to me, and poor usability reflects poorly on Wiktionary. Many of our feedback comments from frustrated users ultimately boil down to poor usability -- where it is not obvious to the user where to go for the information they are seeking.

Anatoli comments, "Besides, in this case we have an English entry with an English Wikipedia link and the Japanese Wikipedia page has an English page linked. In any case, people can find encyclopaedic info in another language (here English) if they want to, why should Wiktionary be concerned about it?" As I initially described, requiring the user to click through various other non-obvious links in order to find information is poor usability. Why not just include the EN WP link within a given entry? This way, it is obvious to the user where they can go to read more. Moreover, assuming that the user does click through to the JA WP article, if they don't read JA, they might be overwhelmed or intimidated and fail to notice the "English" link on the lower left. Again, poor usability, where we wind up frustrating users instead of providing them with, and clearly leading them to, relevant information in formats they can absorb.

Wiktionary is ostensibly about providing users information regarding the entries we editors include here. To that end, entry completeness should also be one of our goals as editors. Anatoli asks, "Do you guys want to know about the Japanese word ニゴロブナ or about the English term nigorobuna?" I suspect that many users look up terms in a dictionary in search of a definition. If a user stumbled across the Japanese term ニゴロブナ and looked it up here in Wiktionary, and all they found was "1. nigorobuna (Carassius auratus grandoculis)" and a link to just the JA WP article, as in this former incarnation of that entry, they would be right to be frustrated. There's no mention here that this is even a fish, let alone what kind or where it's found or what it looks like. Given that this EN WT user is looking for a definition written in English, providing only a link to the JA WP article is less than helpful. Given also that this term is Japanese, and ultimately about a fish only found in Japan, it's a bit strange and unintuitive to expect the user to click through to the English entry to find out what this word means. Again, poor usability.

Another of Anatoli's comments suggests that perhaps he's concerned about *all* entries requiring EN WP links ("this would be a bad precedent when all foreign language entries may get English Wikipedia links"); please note that that is not what I'm proposing at all. Requiring this would mean a substantial amount of additional work, and I suspect that many WT terms have no corresponding EN WP articles.

However, if EN WP links are optional, I fail to see what possible harm there could be even if all entries in all languages included EN WP links (again, this assumes that the links are correct and that the articles are relevant). Providing an easy way to link to related information was one of the core reasons for inventing HTML in the first place. Effectively erecting barriers to finding relevant information by removing such links strikes me as completely antithetical to the underlying reason for Wiktionary's very existence: helping users find information. I'm deeply concerned here that practicality is being sacrificed for ideology.

Apologies for the tome; I felt I needed to articulate my concerns. -- Eiríkr Útlendi │ Tala við mig 18:23, 5 February 2013 (UTC)

PS: To clarify, Anatoli, please do not take this as any personal attack. I think very highly of you as an editor, and I appreciate your work. I just disagree with you about EN WP links.  :) -- Eiríkr Útlendi │ Tala við mig 18:34, 5 February 2013 (UTC)

That's OK. I've lost interest. I have expressed my opinion. --Anatoli (обсудить/вклад) 02:16, 7 February 2013 (UTC)
What could be the objective of WP links? Certainly not the objective of giving complementary information about the word, as we should provide this information here, and Wikipedia does not provide information about words, anyway. The only possible objective is to provide a bridge to the encyclopedic world, as Wikipedia is not about words, but is an encyclopedia organized by topic. And the preferred language of en.wikt readers should be assumed to be English. This is why links to EN WP make sense. Lmaltier (talk) 18:47, 5 February 2013 (UTC)
Not sure where to put this: it's in response to the original question. I roughly agree with Lmaltier's "objective is to provide a bridge to the encyclopedic world, as Wikipedia is not about words, but is an encyclopedia organized by topic. And the preferred language of en.wikt readers should be assumed to be English". However, I think it would be useful for those who do read the foreign language to be able to read its WP article on the topic. So I don't see anything wrong with linking to either or both, such as exist, and some benefit. That said, I think two {{wikipedia}} boxes will be ugly, so putting the links as bullets (or even combined into one bullet) sub "External links" seems the way to go.​—msh210 (talk) 04:04, 6 February 2013 (UTC)
  • I disagree with the categorical statement that Eirikr quoted (or paraphrased?), that "EN WP links should only go in EN entries"; but certainly the bar should be much higher for enWP links than for FL WP links, and the former should go in ===External links=== sections, not in right-floating boxes. (Setting aside inline links, of course, which are another matter entirely.) The entry for French chien (dog) should link to w:fr:Chien, but should not link to w:Dog. This is because, firstly, part of the reason for linking to Wikipedia is to cover uses that we would never include (such as the name of a certain video game character), and obviously only w:fr:Chien is relevant to that; and secondly, no one needs the link to w:Dog. English-speakers already know about dogs. The link to w:fr:Chien offers more cultural context (does it describe the chien as a wild animal or a domesticated one? as a pet or as food?), more images of chiens (compare the images at w:fr:Chien with those at w:ja:イヌ), more sentences that use the word chien (as well as chienne and chiot), and so on. A link to w:Dog would offer none of these things; it would actually discourage the recognition that the French word is not perfectly equivalent to the English translation we've provided.
    In the case of [[ニゴロブナ]], where the referent is a Japanese fish (to the point that even enWP gives the Japanese name for it), I think the link to enWP is very helpful, and should be included in an ===External links=== section, or, since the enWP article's title is the scientific name, perhaps we could simply linkify that in the def. But again, not in a right-floating box.
    RuakhTALK 05:33, 6 February 2013 (UTC)
    • That makes a good deal of sense.​—msh210 (talk) 05:44, 6 February 2013 (UTC)
      • Yes, that makes sense, especially for imperfect entries. But providing a WP link as the single bridge to the encyclopedic world (and excluding all other encyclopedic external links) also makes sense. Some readers may get a Wiktionary page while they actually look for encyclopedic information. I don't think that providing both links is a problem, but I would avoid floating boxes too. Lmaltier (talk) 06:56, 6 February 2013 (UTC)

I do prefer the right-aligned floating box myself, but I'm happy to admit that that's mostly a subjective aesthetic perspective. For consistency's sake, though, I think we should put any and all WP links in the same place and in the same format -- I'm concerned that using the floating box for the source-language WP link, while putting the target-language (i.e. EN) WP link at the bottom, would be potentially confusing, especially for longer entries where the EN WP link might be off the bottom of the screen. Since WP is part of this whole MediaWiki / WikiMedia thing, ===External links=== doesn't seem quite right, but I'd be fine with ===See also=== instead. Would this version of the 煮頃鮒 (nigorobuna) page be acceptable? -- Eiríkr Útlendi │ Tala við mig 17:46, 6 February 2013 (UTC)

"External links" is where to put WP links, not "See also", per policy.​—msh210 (talk) 21:04, 6 February 2013 (UTC)
  • Thanks msh210, I'd missed that, and the only places I can recall seeing * {{pedia|something}} used have been all in ===See also=== sections. I've changed the header in the 煮頃鮒 entry accordingly. -- Eiríkr Útlendi │ Tala við mig 21:30, 6 February 2013 (UTC)


I have reformatted this page as a table, and added some information that I couldn’t find anywhere in one place. Please check my work, especially the language & script codes we use for Chinese and Korean languages. Thanks. Michael Z. 2013-02-04 23:57 z

I checked all the codes and automated all of the script codes that were feasible to do so. Should we also list indices for appendix-only conlangs here? We have quite a few, in various states of completion. —Μετάknowledgediscuss/deeds 02:51, 5 February 2013 (UTC)
I would segregate them into their own section at the bottom of the page, but yes, I think it's OK to have them in this centralised index of indices. - -sche (discuss) 05:37, 5 February 2013 (UTC)
To be clear, if others want to integrate them into the main list, I'm OK with that, too. - -sche (discuss) 05:33, 6 February 2013 (UTC)

uncountable noun and countable noun - get deleted, can we include such terms in CFI?

In my opinion, we should not delete these terms but "deletionists" seem to be more active in this period. I'm not so worried about my own input in these entries, in my opinion, they should be kept and RFD and CFI should be reviewed to allow terms included in other authoritative dictionaries. I suggest keeping grammar terms, medical terms (something else?), which are defined elsewhere (make a list of dictionaries recognised as important), so that SoP rule would not be applied to such entries. Does someone care to set up rules and a vote? --Anatoli (обсудить/вклад) 02:13, 7 February 2013 (UTC)

Are you suggesting a rule that presence in other dictionaries trump our current attestation rule also, or our current idiomaticity rule only?​—msh210 (talk) 06:04, 7 February 2013 (UTC)
Only idiomaticity if there is a dispute. I'm not suggesting here to include any unattestable words. --Anatoli (обсудить/вклад) 11:26, 7 February 2013 (UTC)
I don't think these terms would pass anyway. They aren't the usual terms in English; the usual terms—the fixed phrases—are mass noun and count noun. "Countable" and "uncountable" are more often used as predicates, as in "the noun dog is countable; the noun milk is uncountable". Using them as attributive adjectives just isn't idiomatic. —Angr 15:23, 7 February 2013 (UTC)
I can't see how to do this. Also I don't see any advantages. At some point I think you have to give the reader some credit. If they know what a lung is and what cancer is, they know what lung cancer is. Mglovesfun (talk) 23:14, 7 February 2013 (UTC)
But is there a technical definition of lung cancer that is accepted by most major medical dictionaries? One that involves more detailed criteria than just "cancer of the lungs?" It's kind of like outer core. At face value, this would seem to mean simply "the outer part of the Earth's core," but it has a specific, narrow meaning in geoscience, which one couldn't gather from simply looking up outer and core. I think that "all words in all languages" should include technical terminology/jargon, and that pedantic adherence to the letter of the SoP rule is currently limiting the inclusion of such terms, and ultimately doing Wiktionary a disservice. I'm in agreement with Anatoli's proposal: we could compile a list of medical dictionaries, etc., considered authoritative, and use them as the basis for determining which "jargon" terms warrant inclusion. Astral (talk) 00:18, 11 February 2013 (UTC)
I am reasonably confident that "lung cancer" is applied to a variety of different cancers that have little in common aside from being cancers and being in the lungs. —RuakhTALK 01:40, 11 February 2013 (UTC)
The comment above wasn't intended to apply to lung cancer specifically, but to raise a general point. Astral (talk) 02:41, 11 February 2013 (UTC)

I agree that we should trust the most competent people about each technical terminology. If a term is included in a specialized dictionary, this means that it was felt by this specialist as a technical term worth inclusion. If the term cannot be found in any specialized dictionary, then we must reflect about it. Lmaltier (talk) 06:35, 11 February 2013 (UTC)


The labels count noun and mass noun would be much less confusing for readers than the opaque and seemingly contradictory countable and uncountable. Michael Z. 2013-02-10 16:57 z

I've seldom seen "mass noun", and practically never seen "count noun". I think of them as new-age American terms. On the other hand, "(un)countable" is very familiar, and I don't think the meaning is that opaque. This, that and the other (talk) 09:58, 11 February 2013 (UTC)
"New age" because they've only been in use for 75 years or so? "Mass noun" and "count noun" are the standard terms, each getting orders of magnitude more hits on Google Books than "uncountable noun" and "countable noun" respectively. —Angr 14:44, 11 February 2013 (UTC)
When an unfamiliar reader sees “count noun and mass noun,” she can tell that these are two qualities or roles of this noun, and perhaps infer their precise meaning. When an unfamiliar reader sees “countable and uncountable,” he can only say WTF? That was my experience. Michael Z. 2013-02-11 16:58 z
Are we talking about what to display in the {{context}}s of various nouns? I find {{countable}} and {{uncountable}} clear, and I find {{context|countable|and|uncountable}} a lot clearer than I would find {{context|mass noun|and|count noun}}. Like User:This, that and the other, I find "count noun" strange (even though I see it a lot), and even though count noun (noun) is markedly more common than countable noun (noun), I personally prefer for the {{context}} label to be countable (adjective) rather than either noun. - -sche (discuss) 21:33, 11 February 2013 (UTC)


All the entries needing treatment in User:Yair rand/uncategorized language sections/Not English#Hmong are Han characters, but Category:Hmong language says it uses Latin and Pahawh Hmong scripts (no mention of Han). Mglovesfun (talk) 13:32, 8 February 2013 (UTC)

Wikipedia doesn't mention Han either. No idea where the entries came from. -- Liliana 13:58, 8 February 2013 (UTC)
Wikipedia: "Since the end of the 19th century, linguists created over two dozen Hmong writing systems, including systems using Chinese, Lao, Russian, Thai, and Vietnamese characters and alphabets." Mglovesfun (talk) 14:07, 8 February 2013 (UTC)

Replacing e.g. {{l|ca|…}} with {{l/ca|…}} whenever possible.

Do we want to replace, for example, {{l|ca|…}} with {{l/ca|…}} whenever possible (i.e., whenever the language-specific subtemplate exists, and no gloss= or whatnot is specified)? That is, do we want to reserve {{l}} for the special cases where either no language-specific subtemplate exists, or else, we want to add additional information not supported by the language-specific subtemplates?

If so, then — do we still want to use the names l/ca and so on?

RuakhTALK 05:10, 11 February 2013 (UTC)

I do (and every other language). It’s past the time we should be paying attention to template efficiency. — Ungoliant (Falai) 05:17, 11 February 2013 (UTC)
Should the language-specific templates be used widely? Sure. Should they be called {{l/ca}}? Eh, I wouldn't mind renaming them {{l-ca}}, etc, but I don't mind their current name, either. - -sche (discuss) 07:41, 11 February 2013 (UTC)
What uncertainties remain about the net benefit of such a replacement? If there are significant ones, then we should be rolling out the conversion in a limited fashion. If not, then we should be ready for rollout to, say, all languages not in Latin script.
Should the names conform to the convention of language code first, ie, {{ca-l}}? DCDuring TALK 10:59, 11 February 2013 (UTC)
I think we should use these new templates whenever possible, because every bit helps and there is no advantage to using {{l}} in these cases. I don't think there are many uncertainties about the benefits; we've already had some direct experience with the improvements, which are quite substantial. The uncertainties are mostly about the maintainability of such a large number of near-identical templates, but that also applies to other templates of which we have one per language (language code templates). We can apply the same principles here too: require them all to be in a standard format, and document the templates that differ (for example, Gothic currently links to its transliterations, while Serbo-Croatian does not support transliterations). —CodeCat 14:35, 11 February 2013 (UTC)
I support this, although it should be done in a less haphazard way. Bot-replacement of l's that obviously will suffer no loss or have editors who can check which features are needed should happen soon. —Μετάknowledgediscuss/deeds 19:39, 11 February 2013 (UTC)
I was already running such a bot, but Ruakh asked me to wait until it could be clarified. —CodeCat 19:44, 11 February 2013 (UTC)
I meant that {{l/ca}} exists but {{l/it}} does not because you care about Catalan more than Italian. It should be rolled out for all languages, ideally. —Μετάknowledgediscuss/deeds 19:47, 11 February 2013 (UTC)
Actually I was trying to work through the long list of replacements that already need to be done. Italian is a very "big" language on Wiktionary and has many entries, so I wanted to postpone it until the backlog had been cleared first. There's already about 50 thousand entries in the list... —CodeCat 19:49, 11 February 2013 (UTC)
@CodeCat: What is the empirically determined performance benefit of {{l/ca}} over {{l|ca}}? --Dan Polansky (talk) 20:22, 11 February 2013 (UTC)

For better context, here is the code of {{l/ca}}: <span lang="ca">[[{{{1}}}#Catalan|{{{2|{{{1}}}}}}]]</span>. It uses no templates and is very simple. By contrast, here is the code of {{l}}:

-->{{#if:{{{tr|}}}| (<span lang="">{{{tr}}}</span>)}}<!--
-->{{#if:{{{g|{{{g1|}}}}}}| {{{{{g|{{{g1|}}}}}}|{{{g2|}}}|{{{g3|}}}}}}}<!--
-->{{#if:{{{gloss|}}}| (<span class='mention-gloss-double-quote'>“</span><span class='mention-gloss'>{{{gloss}}}</span><span class='mention-gloss-double-quote'>”</span>)}}<!--
-->{{#if:{{NAMESPACE}}{{{sc|}}}{{{tr|}}}{{{g|}}}{{{g1|}}}{{{g2|}}}{{{g3|}}}{{{gloss|}}}||{{#switch:{{{1}}}|be|bg|br|ca|cs|csb|cy|de|dsb|en|es|fr|frm|fro|ga|gd|got|grc|gv|hsb|kw|lv|mk|my|nb|nl|nn|no|oc|pl|pt|ru|rue|sga|sk|sl|sv|tr|uk=[[Category:l eligible for subtemplate]]}}}}<!--

--Dan Polansky (talk) 20:31, 11 February 2013 (UTC)
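
For readers who do not work with wikitext, the one-line {{l/ca}} code quoted above can be mimicked in a few lines of Python. This is only an illustrative sketch: the function name `l_ca` and the sample word are hypothetical, and the real template is expanded by MediaWiki, not by code like this.

```python
def l_ca(page, display=None):
    """Mimic the quoted {{l/ca}} wikitext: a language-tagged link to the Catalan section."""
    # {{{2|{{{1}}}}}} means: use parameter 2 if it was passed, otherwise fall back to parameter 1.
    text = display if display is not None else page
    return f'<span lang="ca">[[{page}#Catalan|{text}]]</span>'


# Hypothetical usage: one positional parameter, then two.
print(l_ca("gat"))
print(l_ca("gat", "gats"))
```

There is no conditional logic and no nested template call, which is the whole argument for its being cheaper than the generic {{l}}.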

WT:Grease pit/2013/February#Can't save large page, and also the Category:Proto-Slavic appendices, all of which are now significantly faster just by replacing the templates. —CodeCat 20:36, 11 February 2013 (UTC)
Can you quantify "significantly faster"? Have you measured how much faster they are? --Dan Polansky (talk) 20:39, 11 February 2013 (UTC)
Does it matter? —CodeCat 20:50, 11 February 2013 (UTC)
So you have started replacing things without having first measured the achieved performance benefit, right? --Dan Polansky (talk) 21:53, 11 February 2013 (UTC)
That's one full use of Xyzy (which is a large scary template with its own subtemplates, multiple uses of urlencode, and a whole bunch of parser functions), a language template, three #ifs and 8 parameters. This could be simplified. If Xyzy was simplified as I proposed at Template talk:Xyzy#Proposal: Remove script template call, the call to it could be removed entirely from l. The three #ifs could also be merged. If we did all this, the resulting code could be:
--><span class="{{{sc|{{Template:{{{1}}}/script}}}}}" lang="{{{1}}}">[[{{{2}}}#{{{{{1}}}}}|{{{3|{{{2}}}}}}]]</span><!--
  -->{{#if:{{{tr|}}}| (<span lang="">{{{tr}}}</span>)}}<!--
  -->{{#if:{{{g|{{{g1|}}}}}}| {{{{{g|{{{g1|}}}}}}|{{{g2|}}}|{{{g3|}}}}}}}<!--
  -->{{#if:{{{gloss|}}}| (<span class='mention-gloss-double-quote'>“</span><span class='mention-gloss'>{{{gloss}}}</span><span class='mention-gloss-double-quote'>”</span>)}}<!--
...taking it down to a script name template, a language template, one #if, and 8 parameters. l/ca and such would require only one template call per language link, no #ifs, and 2 parameters. And a whole bunch of editor-side confusion about when to use what, and how to maintain these. That said, I have no idea how much speed difference we're talking about here, so it's really hard to tell whether this matters or not, or which version would even be fastest. Does anyone have any idea how to measure the speed of these things? --Yair rand (talk) 21:05, 11 February 2013 (UTC)
I just want to make sure, nobody is advocating deprecating {{l}} are they? I think not. A language-specific template may help on large pages that use {{l}} a lot, but the difference in small pages will be microscopic (see Yair rand's comment directly above this one). Mglovesfun (talk) 13:29, 13 February 2013 (UTC)
I think it would be similar to the difference between the generic-but-complex-and-slow {{head}} and the leaner and more specific {{en-noun}}, {{nl-verb}} and so on. We use the specific templates when we can, and fall back to the generic one when we need to. —CodeCat 15:00, 13 February 2013 (UTC)
If that's the point of the language-specific headword templates, someone may want to edit {{ga-noun}}, {{ga-verb}}, and probably many others, because they call {{head}} directly. —Angr 19:57, 14 February 2013 (UTC)
That is certainly not the point of those templates, and honestly, I'm not convinced it's even a point. If a template is only used once or at most twice on a typical transcluding page, then there's really no point trying to optimize it. (See w:Amdahl's Law.) That said, when I write a headword-line template, I never use {{head}}. {{head}} is wonderful, but it was designed to be very convenient when called directly from entries, and not necessarily to be convenient as a metatemplate for other headword-line templates, especially ones that have any degree of complexity or flexibility. (For example, I don't think {{head}} has any mechanism to use semicolons instead of commas, if you want to group your inflected forms into groups. And it's hard to fit optional inflected forms into the numbered-parameter framework.) But this is a matter of personal preference: if you're perfectly happy using {{head}} as the backbone of templates like {{ga-noun}}, then more power to you. —RuakhTALK 04:11, 15 February 2013 (UTC)
I used {{head}} in {{ga-noun}} and {{ga-verb}} because I'm not well enough versed in template writing, or any kind of coding, to do something more complicated. There's a reason I asked someone else (which turned out to be you) to write {{dsb-noun}}, namely I couldn't do it myself and I couldn't figure out how to get {{head}} to do it right. But anyway, while {{head}} may be used (directly or indirectly) only once or twice on most pages, on pages like [[a]] it is used a whole lot, and would be used even more if not for language-specific headword templates that don't call it directly. —Angr 06:33, 15 February 2013 (UTC)
Re: {{head}} on [[a]]: That's true, but w:Amdahl's Law still applies. [[a]] is so huge, and calls so many expensive templates so many times, that elimination of {{head}} still won't really help. (Also, this is neither here nor there, but — at this point, if all these expensive calls to {{head}} are offending our sensibilities, then the best approach is to wait till Monday and Luacize it, and then the calls won't be expensive anymore.) —RuakhTALK 06:59, 15 February 2013 (UTC)
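
The Amdahl's Law point can be put in numbers. The sketch below uses the standard formula, speedup = 1 / ((1 - p) + p / s), where p is the fraction of total work being optimised and s is how much faster that fraction becomes. The figures plugged in are made up for illustration; nobody in this thread has measured the actual share of rendering time that {{head}} takes on [[a]].

```python
def amdahl_speedup(p, s):
    """Overall speedup when a fraction p of the work is made s times faster."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if s <= 0:
        raise ValueError("s must be positive")
    return 1.0 / ((1.0 - p) + p / s)


# Hypothetical: the template accounts for 5% of page rendering time
# and is made 10 times faster. The page as a whole gains only about 5%.
print(round(amdahl_speedup(0.05, 10), 3))
```

Even an enormous improvement to a small share of the work barely moves the total, which is why optimising a template that is a minor part of a page's cost does little for that page.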
With the deployment of Scribunto coming, this measure may no longer be necessary. However, there is still something to look into. Scribunto is faster for large and complex templates, but is it also faster for many invocations of small and simple templates? If {{l/en}} is still faster than a Lua-ised version of {{l}}, then there would still be a good reason to use the former. But we'd have to know this for sure... who do we ask? —CodeCat 01:21, 14 February 2013 (UTC)
  • Since Scribunto is not Unicode-safe as-is (cf. bug 39646), and since an awful lot of the content here on Wiktionary depends on Unicode, we might want to 1) really put the brakes on implementing Scribunto site-wide until the module coders get strings figured out (complicated, sure, but also an embarrassing and crippling shortcoming); and 2) make sure that anyone working to implement a Scribunto-based processing flow is fully aware of this major flaw. -- Eiríkr Útlendi │ Tala við mig 21:45, 14 February 2013 (UTC)
  • I completely disagree. Scribunto makes module-writers do some of the heavy lifting when it comes to Unicode-safety, but it's really not a huge deal. (I mean, I agree with you that it's an embarrassing shortcoming, but I don't think it's at all crippling.) —RuakhTALK 04:11, 15 February 2013 (UTC)
Ruakh, you wrote a little bit about Unicode support on WT:LUA. Do you think you could rewrite some of it so that it's less of a technical description of the problem, and more practical to editors here? Something like a list of "situations/operations when to be careful" would probably be very helpful. The idea being that if editors allow themselves to be trained when to be cautious (like, say, if they write [ ] after a string) then they'd make fewer mistakes because they are more aware of the problem. —CodeCat 14:26, 15 February 2013 (UTC)
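
To make the Unicode concern concrete: Lua's standard string functions work on bytes rather than characters, so a UTF-8 string looks longer to them than it reads on the page. The sketch below uses Python only to show the gap between bytes and characters; it is not Scribunto code, and the sample word is simply borrowed from an entry discussed earlier on this page.

```python
word = "法螺貝"  # three characters

char_count = len(word)                  # counts Unicode code points
byte_count = len(word.encode("utf-8"))  # counts UTF-8 bytes, which is what a byte-oriented string library sees

print(char_count, byte_count)

# Taking "the first character" by byte position cuts a character apart.
first_byte = word.encode("utf-8")[:1]
try:
    first_byte.decode("utf-8")
    split_is_valid = True
except UnicodeDecodeError:
    split_is_valid = False

print(split_is_valid)
```

The operations to be careful with are therefore the ones that depend on position or length: indexing, slicing, counting, and pattern matching on single characters. As far as I know, Scribunto's `mw.ustring` library is the character-aware counterpart intended for those cases.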

the format of constructed languages' codes

Per WT:CFI, the following constructed languages are included in the main namespace: "Esperanto, Ido, Interlingua, Interlingue (Occidental), Lojban, Novial, Volapük". Accordingly, these languages' codes reside in the template namespace with no prefix, like other languages' codes.

Among the other (i.e. non-mainspace) constructed languages which have ISO codes, many have — as I believe(d) to be standard — codes prefixed with conl:. However, four have codes with no prefix: Template:afh, Template:igs, Template:avk, Template:rmv. Four are not given codes here at all, that I can find: the ISO's zbl, bzt, dws and neu. (There's also {{lfn}}, but it's a redirect...but it's used in Category:lfn:All topics and elsewhere. I'm not sure what should be done about it.)

I would like to standardise the situation by moving {{afh}}, {{igs}}, {{avk}} and {{rmv}} to {{conl:afh}}, etc, so that all non-mainspace conlangs which have codes are prefixed with conl:. (If zbl, bzt, dws and neu are ever created, I suggest they be prefixed with conl:, too.) Are there objections to this; would anyone like to defend the current non-uniform treatment/encoding of non-mainspace constructed languages, or propose a different way of standardising things? - -sche (discuss) 05:22, 11 February 2013 (UTC)

We already have some that way, so we might as well be consistent. The main concern would be the effect on existing templates, entries, etc, and on the people who are working with the language. How much of an investment do we have in the status quo? Chuck Entz (talk) 05:38, 11 February 2013 (UTC)
Sounds reasonable. — Ungoliant (Falai) 05:29, 11 February 2013 (UTC)
I agree. —CodeCat 14:36, 11 February 2013 (UTC)
Support unless it breaks anything. —Μετάknowledgediscuss/deeds 18:16, 11 February 2013 (UTC)
Done. I deleted the redirect {{lfn}}, and moved {{afh}}, {{igs}}, {{avk}} and {{rmv}}. (I did not move any of the lfn:-prefixed categories. I still don't know what to do with them; I suppose they're OK as-is.) The only pages affected are lists of codes, which bots will soon update, or which are gathering dust in userspace. - -sche (discuss) 21:59, 11 February 2013 (UTC)
The LFN topical categories are not OK. Appendix means "one per page language", not "one page per term". -- Liliana 22:14, 11 February 2013 (UTC)
@Liliana: What? I really have no idea what you mean; if I'm interpreting you correctly, what you're saying contradicts longstanding practice here. But I can't be sure, because your statement does not follow traditional English grammar as far as I can tell. —Μετάknowledgediscuss/deeds 22:51, 11 February 2013 (UTC)
Aaaa, what happened to my sentence? Basically, appendices on constructed languages should look like Appendix:Quenya, where all terms are collected on one page together with definitions. What we have in the case of LFN is that every term has its own entry, and is categorized into part of speech and topical categories just like normal entries are. That totally misses the point of them being appendix-only languages, because if they are formatted like mainspace entries, they may just as well be in the mainspace entirely. -- Liliana 23:07, 11 February 2013 (UTC)
To clarify: when I said I wasn't sure what to do with the lfn:-prefixed categories, I meant I wasn't sure if I should leave them as-is, move them to Category:conl:lfn:Countries, etc, or delete them. My understanding of the rules for their contents matches Liliana's: we don't want non-mainspace conlangs to have more than one appendix page. Of course, conlangs aren't the only class of thing formally restricted (sometimes by WT:VOTE) to one appendix page which nevertheless de facto ignore that restriction. - -sche (discuss) 04:54, 12 February 2013 (UTC)
I fundamentally disagree with that. The Quenya "solution" makes listing etymologies, related terms, etc considerably more sloppy or impossible. Conlangs should have the same rights as other languages as long as they keep to the appendices when required to by policy. —Μετάknowledgediscuss/deeds 05:49, 12 February 2013 (UTC)

WT:CFI and constructed languages[edit]

WT:CFI's section on constructed languages is out of date. It says: "At present another 12 of the 7000 languages in the ISO 639-3 list are constructed languages." However, the ISO has added several more constructed and natural languages since that was written. I thought of rewriting it "There are a few other constructed languages in the ISO 639-3 list", but that would require the entire paragraph and the two following paragraphs to be restructured. It's probably better to update WT:CFI to note how we handle the new languages, anyway (even if "there is currently no consensus regarding..."), rather than merely patch the old wording. So, how can the section be reworked? And would anyone like to argue for the inclusion in the main namespace of any conlang Wiktionary currently excludes, or for the exclusion of any currently-included conlang? - -sche (discuss) 05:22, 11 February 2013 (UTC)

codes the ISO added that we missed[edit]

DTLHS kindly imported the codes the ISO most recently added, but I notice we're missing quite a few other codes the ISO has added over the years. In some cases, that's by consensus (e.g. it was decided to treat {{hbo}} as {{he}}), but in many other cases, it seems we all just failed to notice that the ISO had added things (e.g. {{yga}}). And in some cases, it seems sets of codes were not imported simply because the members of the set had the same name, disambiguated parenthetically (e.g. etn "Eton (Vanuatu)" vs eto "Eton (Cameroon)"). We can and do distinguish lects parenthetically, though, so parentheses should be no impediment to our importation of those codes. (And we can rename all the ones we find attested alternative names for.)

I've compiled a list of three-letter ISO codes (A–M) that we're missing, here. (I'll list N–Z later, or some other industrious person can.) I've struck ones that I knew Wiktionary had intentionally excluded. Please look over the list and strike any other codes you see on it that shouldn't be (re)added, and feel free to add comments, if you think "this should be called...", "this is just a dialect of...", etc. In a week or so, I think we should import these codes that have hitherto been missed. - -sche (discuss) 06:20, 11 February 2013 (UTC)

... and put the remaining list somewhere more visible, so the consensus on excluding certain ISO codes is documented. This, that and the other (talk) 09:54, 11 February 2013 (UTC)
Great idea. Wiktionary:Languages (WT:LANGCODES) seems like a good place to note specifically-excluded codes. - -sche (discuss) 21:49, 11 February 2013 (UTC)
The list of three-letter ISO codes (N–Z) that we're missing is here. - -sche (discuss) 22:27, 12 February 2013 (UTC)

Quotes in WT:CFI[edit]

Can someone please revert this edit to CFI? I believe it is not based on consensus. The edit introduces typographic quotes. "Any substantial or contested changes require a VOTE." A change like this has been contested, as evidenced in Wiktionary:Votes/pl-2008-12/curly quotes in WT:ELE, so the edit should not have been made. The fact that the vote pertains to WT:ELE rather than WT:CFI is immaterial. By the way, I still think that any edit to WT:CFI should require a vote, but I am in a tiny minority, as per Wiktionary:Votes/pl-2012-03/Vote requirements for policy changes. --Dan Polansky (talk) 18:55, 12 February 2013 (UTC)

I wanted to revert it as soon as I saw it but was afraid of starting an argument with Krun. Mglovesfun (talk) 21:19, 12 February 2013 (UTC)
The change was not substantial, but it is being contested. You asked for it: a VOTE. Perhaps this can set a precedent to avoid 1,000 other such VOTES in the future: Wiktionary:Votes/pl-2013-02/Disallow typographic punctuation in policies. Michael Z. 2013-02-15 16:27 z
Done. (This doesn't mean I'm voting for such changes' being forbidden or requiring a vote: it means only that right now they do, so should not be made without one, so I reverted the incorrect edit.) Striking this section. —msh210 (talk) 19:15, 15 February 2013 (UTC)
FYI, I have created another vote: Wiktionary:Votes/2013-02/Typographic vs ASCII punctuation in policies. --Dan Polansky (talk) 23:17, 15 February 2013 (UTC)

Old Cyrillic script in headwords[edit]

The Old Cyrillic script (Cyrs) is used for Old Church Slavonic and Old East Slavic entries. When it is bolded in headwords, it is almost unreadable, on my machine. Historically, boldface and italic did not exist for this script. For emphasis in different situations, manuscript and printing would use an ornamental-capitals font, all capitals, or letter spacing.

Current headword display. See Appendix:Old Cyrillic script for suggested fonts.

дѣвка (děvka)

каплꙗ (kaplja) f

{{Cyrs}} already sizes this script by 125%. I’d like to display the headword with this CSS:

font-weight: normal; 
letter-spacing: 0.1em; 
font-size: 120%; /*  125% * 120% = 150% of base font-size */

To render thus, on your machine.

дѣвка (děvka)

каплꙗ (kaplja) f

Comments or objections? Michael Z. 2013-02-12 19:53 z

On my computer, that looks much, much better than the status quo. —Angr 20:00, 12 February 2013 (UTC)
I just checked – our suggested OCS fonts have no bold weights, so our browsers have been artificially fattening the letters. Definitely needs to be changed.
I’m still experimenting. Maybe the font-size shouldn’t be changed, in which case these headwords will stand out from all others. Here’s a bit more letter-spacing instead:

дѣвка (děvka)

каплꙗ (kaplja) f

 Michael Z. 2013-02-12 20:51 z
The bolded letters are only marginally less readable than normal, but the wide spacing looks very ugly. I think the status quo is fine, at least for me. —CodeCat 21:29, 12 February 2013 (UTC)
I prefer the second version above to the wide spacing. —Angr 21:46, 12 February 2013 (UTC)
I think I would prefer the original size, but maybe without bolding. The second version also looks odd to me. I see Old Cyrillic and normal Cyrillic in the same font, except the former is normally bigger. Which is kind of odd. —CodeCat 21:51, 12 February 2013 (UTC)
Sounds like your font might not cover some of the newer Unicode characters. If and ꙗ look the same to you, then you need to install a newer font. I also just found an experimental font that has support for some of the newest Unicode.[3]
I see the format shouldn’t look too far out, for compatibility with regular Cyrillic fonts on systems without old Slavonic fonts. I will give it another try when I have time. Michael Z. 2013-02-12 22:18 z
They look the same except one is bigger. Why would I need to install a newer font if it already looks good? I actually want them to look the same, it's more consistent. The name of the font is DejaVu Sans. —CodeCat 22:20, 12 February 2013 (UTC)
The yat (ѣ) has an ascender, if that’s what you mean. The w:pre-Petrine Cyrillic alphabet has its own graphical conventions, and has no roman, italic, bold, serif, or sans-serif fonts. You are seeing it in a generic sans-serif typeface. Raises the question of how we should be trying to present it to readers, especially now that web fonts might be an option. Michael Z. 2013-02-13 01:02 z
This is what I see: File:cyrillic fonts demonstration.png. CodeCat 01:19, 13 February 2013 (UTC)
That’s as I expected. The letters have the correct form for modern Cyrillic sans-serif. For comparison, here’s what I see, with Hirmos Ponomar font installed, and a line in my CSS that puts it at the front of the font list for Cyrs: File:Cyrillic fonts demonstration, Hirmos Ponomar.png. The article w:Early Cyrillic alphabet has images of typical individual letters, and scans of manuscripts.
In use, Old and modern Church Slavonic and Old East Slavic always use Slavonic fonts and writing conventions. In historical or linguistic analysis, it is sometimes presented in modern type, but then it’s usually also simplified, rewritten, transcribed into a modern language, or romanized. I think our convention is to present words in their native form, so it is appropriate to prefer a native font when possible. Michael Z. 2013-02-13 16:43 z
We certainly don't usually present words in old languages in native form. Latin, Old English and Old Norse are all normalized to some degree. I think doing OCS in a native font is a bit silly... it would be like the convention (which existed 100 years ago) to write German in blackletter font. Are we supposed to start doing that for older German on Wiktionary too? Or start putting Irish in uncial script? I really can't support this. —CodeCat 16:54, 13 February 2013 (UTC)
This isn’t a dead script. This is the way Church Slavonic is written today. Writing CS in sans-serifs and bolding it in headwords is alien to the language. There’s also the fact that most or all faces supporting the full set of OCS characters lack a bold font altogether. Michael Z. 2013-02-13 18:58 z
Surveying reference works, it appears that Church Slavonic is typically reproduced with a native font (as is Greek):
  • August Leskien 1886, Der Altbulgarischen (Altkirchenslavischen) sprache.[4]
  • Preobrazhensky 1910–14, 1951 Etymological Dictionary of the Russian Language.[5]
  • Max Vassmer (Maks Fasmer) 1964–73, Etimologicheskii slovar' russkogo iazyka.[6]
  • N.M. Shanskii 1963–99, Etimologicheskii slovar' russkogo iazyka.[7]
  • George L. Campbell 1997, Routledge Handbook of Scripts and Alphabets, pp 84–85.[8]
  • Horace G. Lunt 2001, Old Church Slavonic Grammar. See pp 1, 4 in the “look inside.”[9]
 Michael Z. 2013-02-13 20:49 z
Yes, though I have an older edition of Lunt's Grammar that uses a modern Cyrillic font. —Angr 21:00, 13 February 2013 (UTC)
I still think it's silly and pedantic though... —CodeCat 21:08, 13 February 2013 (UTC)
It’s not. Fonts for modern Cyrillic, like your DejaVu, have letter shapes and capital/small distinctions that don’t exist in Old Slavonic writing. Some are subtle, but some are significant.
  • а А/а А and ꙗ Ꙗ/ꙗ Ꙗ don’t have a two-storey form with an overhanging arm
  • б Б/б Б doesn’t have a distinctly-shaped lowercase with an ascender
  • е Е/е Е doesn’t have a closed bowl
  • є Є/є Є has a lowercase form that is significantly larger than the е
  • і I/і I doesn’t distinguish the lowercase letter by a dot
  • и И/и И and н Н/н Н have subtly different angles on the crossbars
  • щ Щ/щ Щ has a centred descender, not a small tail on the right
The modern Cyrillic w:civil script has innovations and Latin type influences that don’t occur in Old Slavonic writing. A modern Cyrillic е glyph with a closed bowl isn’t a Church Slavonic е, any more than a Greek epsilon ε is an English e. Cyrl and Cyrs are not just style differences, but two different language scripts, used mainly for different languages. They have different orthographies, different letter-case distinctions, and different letter shapes.
We don’t have to choose fonts that replicate medieval manuscript. But text in old Slavic languages should try to represent the typical characteristics of Slavonic script, not those of 21st-century Latin script. Michael Z. 2013-02-13 23:53 z


User:Jakemartin206 added cartoon illustrations to two entries (abet and adverse possession). I removed them on the grounds that it just doesn't seem like something we do (and, for the latter cartoon, because it seemed to me to miss some elements of the definition). This is not something we do, right? bd2412 T 21:29, 15 February 2013 (UTC)

The cartoons show hypothetical cases. The closest feature we have is probably the Examples box (see e.g. split infinitive), though that isn't based on pictures. Since the cartoons require reading caption text anyhow, they are not useful in the same way as normal pictures (to illustrate the thing, like apple). Equinox 21:34, 15 February 2013 (UTC)
I second that last sentence. There's nothing in either cartoon not as well depicted in a usex. —msh210 (talk) 21:51, 15 February 2013 (UTC)
I think these are the most charming thing I’ve ever seen on Wiktionary. By making the entry pleasurable to read, they are helpful in the same way that very good writing is. I know I’m late to the game, but I vote to restore these. Michael Z. 2013-04-12 02:20 z
If we have cartoons, next thing you know we'll have users. And that's where the trouble begins. DCDuring TALK 03:23, 12 April 2013 (UTC)
I don't object to illustrations in definitions, but the cartoons in question were not illustrative of the concepts for which they were being provided. For example, merely using a piece of land for fifty years does not make this use adverse possession, since one can not adversely possess one's own property; the land must be legally owned by another person to be adversely possessed. As for the other cartoon, it appears to only illustrate the obsolete meaning of the word "abet" (to encourage), and does not illustrate the modern meaning which requires providing actual aid in furtherance of an action. The cartoon provided is therefore not a "visual definition" of anything, since it offers no clarification even to that first sense, beyond what already appears in the definition line. Aside from possibly using a cartoon depicting a visual pun to define "pun" or something like that, I don't see how cartoons with text boxes can add any value to the entry. bd2412 T 15:14, 13 April 2013 (UTC)
It’s not our way to toss stuff in the bin merely because it needs improvement, or because it has an inaccurate heading over it. It might be easier to handle cartoons if the text were in captions instead of embedded in the graphics.
Usage examples reinforced by illustrations that embody narrative could be a very powerful tool for conveying definitions. Michael Z. 2013-04-14 01:48 z
File:Geneva mechanism 6spoke animation.gif
I can think of no way to illustrate a Geneva mechanism without an illustration like this one. However, the first two senses of "cartoon" in our own entry indicate humor or satire, properties which are more likely to confuse or merely detract from an entry than add anything useful. bd2412 T 02:42, 14 April 2013 (UTC)
Surely a satirical cartoon is a type of cartoon, not a separate definition (IMO). Mglovesfun (talk) 15:18, 14 April 2013 (UTC)
The etymological definition of cartoon, still in use, is an artist’s preliminary sketch. Many dictionaries prefer drawings to photographs for illustration, because of their ability to distill the relevant elements of a definition and omit distracting, over-specific detail.
Compare the advantages of a drawing, or “cartoon,” over a photo. This could be applied as well to concepts like abet or adverse possession, as to a concrete noun like escapement. Michael Z. 2013-04-14 16:12 z
If we are referring to drawings or sketches being used to illustrate the physical appearance of an object, then I have no objection. However, I would object to a comic strip-style cartoon showing a character with a word balloon or a caption merely rephrasing the concept of the defined term, or describing an action that falls within this definition. This is redundant and potentially confusing. For example, the cartoon that was provided for "abet" could be presented with text alone as an example sentence under the appropriate definition. I would note again that in this case, no indication was given as to which sense the cartoon was intended to illustrate, and the sense that seemed most like what was depicted was obsolete.
On a different note, why is there a picture of a chimpanzee in the "picture dictionary" section of fish? Is this intended as a joke? bd2412 T 14:26, 15 April 2013 (UTC)

About Hmong[edit]

As you may know, Hmong is a large, complex cluster of widely divergent lects which we have had 29+ separate codes for, of which we have deprecated one. (We furthermore never added the ISO's 30th code, {{blu}}, to begin with.) If you have input on how we should handle Hmong, it is solicited on Wiktionary talk:About Hmong. - -sche (discuss) 22:02, 15 February 2013 (UTC)

Interwiki on 404's?[edit]

Hello, why doesn’t Wiktionary automatically display interwikis for 404's? For any word which is not in a given Wiktionary, there might be other Wiktionaries which do have an entry on it. As the interwiki format seems to be unchangeable, and all data is in databases anyway, it should not be impossible to link to existing interwiki entries for 404 pages. -- Stratoprutser (talk) 15:06, 16 February 2013 (UTC)

What do you mean by "404"? Are you talking about red links? Of course we could put interwiki links on red-linked pages, only then they wouldn't be red-linked anymore, and people wouldn't realize they had no content except the interwiki link. —Angr 15:48, 16 February 2013 (UTC)
I'm not meaning bot-wise but template-wise. So the page remains empty, but for your convenience, WT already provides the interwikis. -- Stratoprutser (talk) 16:02, 16 February 2013 (UTC)
We already do that. Go into a translation table that has a redlink, and if the foreign-language Wiktionary has an entry for it, there will be a little blue interwiki link. —Μετάknowledgediscuss/deeds 16:46, 16 February 2013 (UTC)
A translation table? Looking for one at pannenkoeken? --Stratoprutser (talk) 19:01, 16 February 2013 (UTC)
Feel free to add a translation table to the singular form. SemperBlotto (talk) 19:07, 16 February 2013 (UTC)
That's not an English entry, so it can't have a translation table, AFAIK. Chuck Entz (talk) 19:39, 16 February 2013 (UTC)
It was an English entry when the comment was posted. CodeCat seems to have changed it to a Dutch entry for some reason. —Μετάknowledgediscuss/deeds 19:44, 16 February 2013 (UTC)
I removed the English entry for now because it seemed unlikely that the singular and plural had a different stem. I created a discussion at the Tea Room about it. —CodeCat 19:48, 16 February 2013 (UTC)
Huh? Sorry that's way over my knowledge of WT. Anyway I coined my question first on Dutch WT, and as I don't really know if there is a meta for wiktionaries, I copied it to EN as NL is not alive enough to get some decent feedback. However, this feature suggestion is way more relevant for WTs with fewer entries than this one, example [10]... -- Stratoprutser (talk) 20:42, 16 February 2013 (UTC)
  • Yes, we absolutely should do this: if [[asefasefawe]] does not exist, but [[fr:asefasefawe]] does, then http://en.wiktionary.org/wiki/asefasefawe should mention this. Unfortunately, I'm not sure how to accomplish that. It can't be done server-side, since en.wikt has no information about what page-names exist on other projects, so it would have to be done client-side. But we certainly don't want http://en.wiktionary.org/wiki/asefasefawe to trigger a hundred different HTTP requests (one to each Wiktionary), so we'd have to create a script somewhere on a separate server; but this runs afoul of trust issues, because changes to off-wiki scripts are not indicated on-wiki. Maybe we can use Wikimedia Labs? —RuakhTALK 20:58, 16 February 2013 (UTC)
Thanks, yes, that's what I meant. I was thinking of just one big (but small) db with all entries of all WTs, and then each 404 page makes a query to that db to get the interwikis, if they exist. But it's something for developers, yeah. Maybe Wikidata? They're into interlinks anyway. -- Stratoprutser (talk) 21:17, 16 February 2013 (UTC)
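As a rough illustration of the "one db with all entries of all WTs" idea in the comment above, here is a minimal Python sketch. It assumes the per-wiki title lists have already been extracted from the dumps; the function names and the sample data are hypothetical and not part of any existing tool.

```python
from collections import defaultdict


def build_title_index(titles_by_wiki):
    """Map each page title to the sorted list of wiki codes that have it."""
    index = defaultdict(set)
    for wiki, titles in titles_by_wiki.items():
        for title in titles:
            index[title].add(wiki)
    return {title: sorted(wikis) for title, wikis in index.items()}


def interwikis_for(index, title, local_wiki):
    """Return the other wikis that have `title`, excluding the local one."""
    return [wiki for wiki in index.get(title, []) if wiki != local_wiki]


# Hypothetical sample data, not real dump contents.
sample = {
    "en": ["water", "sloth"],
    "fr": ["water", "washandje"],
    "nl": ["water", "washandje"],
}
index = build_title_index(sample)
print(interwikis_for(index, "washandje", "en"))  # ['fr', 'nl']
```

A lookup like this is a single dictionary access per missing page, which is the point of keeping all titles in one place instead of asking every wiki separately. Keeping the index in sync with the wikis is the harder part and is not shown here.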
Ah, yes, Wikidata may be a better hope than Wikimedia Labs. —RuakhTALK 22:17, 16 February 2013 (UTC)
As far as I can tell, the originator of this discussion recommends that we have entries that have no content other than interwiki links (or am I mistaken). As a test, I have created such an entry at acarofobia. This is a disaster. The entry shows up as a blue link rather than a red one, so nobody will add a definition/translation because it appears that we already have some content. Seriously bad idea. SemperBlotto (talk) 09:41, 17 February 2013 (UTC)
Yeah, you're mistaken. A 404 is supposed to remain a 404. This would be a modification which could NOT be implemented by regular users (or bots) but on the template level, or whatever it's called, exactly to avoid the issue you just raised. -- Stratoprutser (talk) 10:01, 17 February 2013 (UTC)
You have totally lost me. I have no idea what you are proposing. SemperBlotto (talk) 16:04, 17 February 2013 (UTC)
So where would it be seen and noticed by users? (I've filled in [[acarofobia]] now so it's a proper entry.) —Angr 15:41, 17 February 2013 (UTC)
Maybe... in the translation tables! (Note my comment above that {{t}} already does this.) —Μετάknowledgediscuss/deeds 16:11, 17 February 2013 (UTC)
Yeah, I know that, but that doesn't seem to be what Stratoprutser is talking about. And what if it's an English word that other Wiktionaries have that we don't? Then there won't be a translation table for it. Or if it's a word in another language that doesn't have a simple and obvious English translation, like Geisterfahrer. —Angr 16:46, 17 February 2013 (UTC)
Perhaps this is referring to the screen one gets when one follows a regular full-URL link to a non-existent entry. It says "Wiktionary does not yet have an entry for" the term. For instance: http://en.wiktionary.org/wiki/bleem. It wouldn't seem to make sense for redlinks. As for {{t}}, that only links to the language of the translation, not to entries for that same spelling in other Wiktionaries, e.g. German or French Wiktionary entries for a Latin term, as opposed to just the Latin Wiktionary. Chuck Entz (talk) 17:00, 17 February 2013 (UTC)

Ok I try again from the beginning. Imagine there is a word, and I have no clue which language it is, example: jklasdkjh.
Chances are, that if it is an existing word, it will be in one of the wiktionaries. If I try the entry in a wiktionary which does not have an entry on jklasdkjh, I would get a default 404 (page not found)-error message with in the sidebar "links to this page", but without interwiki-links.
After my proposed modification, the "page not found"-error message remains (when no content has been created), however the error page does show, in the same section as the "links to this page", interwiki links to wiktionaries which do have content for the jklasdkjh entry (note that before I used pannenkoeken and a colleague acarofobia as example, but others have since created those words, defeating their purpose as examples.... very annoying...but funny...).
This modification seems to me to be particularly useful for small wiktionaries and for wiktionaries of languages which are relatively similar.
I hope I made myself clear? -- Stratoprutser (talk) 17:14, 17 February 2013 (UTC)

  • Just a thought: could Wikidata (or the Toolserver, or perhaps even the main www.wiktionary.org) offer a function that would search all Wiktionaries for a given string, the way our search bar searches en.Wikt? Then, instead of or in addition to interwikis, we could configure MediaWiki:Noarticletext so that instead of the line "Try searching Wiktionary: [search bar]" it had two lines, "Try searching this English Wiktionary: [search bar]" and "Try searching all Wiktionaries: [search]". We could optionally fill the latter search bar, or both, with the name of the page the person was on. Once it existed, the search-all-Wiktionaries bar could also optionally be added to our own search results page. How (in)feasible is this? - -sche (discuss) 18:18, 17 February 2013 (UTC)
No, if you manually enter http://en.wiktionary.org/wiki/jklasdkjh you do not get a 404 error message. I haven't experienced a 404 message using Internet Explorer, Firefox or Chrome. What browser are you using? Mglovesfun (talk) 18:24, 17 February 2013 (UTC)
Maybe it's not an 'official' 404, but I mean a non-content page. In fact, I wasn't aware of the existence of MediaWiki:Noarticletext; it seems to me that, together with all the dumps (as available at eg. for English), it would be a relatively small modification of that page with a call to a db of all WT-titles. So it really only needs all page titles in one database and a script to keep the db up to date... -- Stratoprutser (talk) 18:53, 17 February 2013 (UTC)
It is an official 404. The server responds with a 404 Not Found message. Mglovesfun just doesn't realize this, because the custom 404 message doesn't look like a 404 page, and he's never inspected it with a tool that lets you see what's actually passed to the browser, such as Firefox's Web Console. (More broadly — I think your initial message made perfect sense. I don't understand why everyone seems to have been confused by it. Thanks for remaining patient.) —RuakhTALK 02:04, 18 February 2013 (UTC)
Ah, so it's a 404 :) Something tells me this should be possible using the API; however, I can't find a way to query all Wiktionaries at once, and a non-existing entry gives an empty return. But as soon as content exists, we get something useful: http://en.wiktionary.org/w/api.php?format=xml&action=query&titles=sloth&prop=langlinks. If no interwiki query is possible, this would need a script which polls the 50 or so largest Wiktionaries until one content page is found? -- Stratoprutser (talk) 11:46, 18 February 2013 (UTC)
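The langlinks query in the comment above can be sketched in Python roughly as follows. This is only an illustration: it requests format=json rather than the format=xml used in the quoted URL, adds lllimit=max, and parses hand-written sample responses shaped like the API's legacy JSON output as I understand it, not real server replies. The helper names are hypothetical.

```python
import json
from urllib.parse import urlencode


def langlinks_url(title, wiki="en"):
    """Build an action=query&prop=langlinks URL for one title on one Wiktionary."""
    params = {
        "format": "json",
        "action": "query",
        "titles": title,
        "prop": "langlinks",
        "lllimit": "max",  # assumption: ask for as many links as the API allows
    }
    return f"https://{wiki}.wiktionary.org/w/api.php?" + urlencode(params)


def parse_langlinks(response_text):
    """Return the sorted language codes listed as langlinks in a response."""
    data = json.loads(response_text)
    pages = data.get("query", {}).get("pages", {})
    langs = []
    for page in pages.values():
        for link in page.get("langlinks", []):
            langs.append(link["lang"])
    return sorted(langs)


# Hand-written samples (assumed shape): an existing page and a missing one.
existing = (
    '{"query": {"pages": {"1234": {"pageid": 1234, "ns": 0, "title": "sloth",'
    ' "langlinks": [{"lang": "fr", "*": "sloth"}, {"lang": "de", "*": "sloth"}]}}}}'
)
missing = '{"query": {"pages": {"-1": {"ns": 0, "title": "jklasdkjh", "missing": ""}}}}'

print(langlinks_url("sloth"))
print(parse_langlinks(existing))  # ['de', 'fr']
print(parse_langlinks(missing))   # []
```

The missing-page sample shows the limitation raised in the comment: a page that does not exist locally carries no langlinks, so the local wiki alone cannot say where else the title exists.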
This has been discussed in the last six months (somewhere) and I'm broadly for it, but implementing it seems to be the hard bit. Mglovesfun (talk) 12:14, 18 February 2013 (UTC)
What about the script at http://pastebin.com/mLAdDpwA? Run it as http://localhost/interwiki.php?query=washandje and be surprised at what the frenchies have... -- Stratoprutser (talk) 13:28, 18 February 2013 (UTC)
  • Now that I understand what the proposal is, I wonder how useful it really would be considering how rarely people actually land on our 404 pages. If I click a red link like grzlfu, I don't land on the 404 page, I land on an editable page waiting for me to start the entry. If I type "grzlfu" into the search box I wind up on the search results page. The only way I can even link to a 404 page is by typing the full URL http://en.wiktionary.org/wiki/grzlfu. So how often does someone wind up on those pages anyway? —Angr 18:31, 18 February 2013 (UTC)
    • I can see two possibilities: On occasion I've edited the URL in my browser by hand to go to a different entry- when the system's slow, it saves waiting for the search. The main source, though, would be links from offsite. Setting aside other Wiktionaries, which would presumably already have the information we'd be displaying here, I can see how careless editors at different wikipedias might throw in a Wiktionary link without checking whether there was an actual entry. Not the most common occurrence, perhaps, but judging by how often it happens going the other way, not insignificant, either. Chuck Entz (talk) 19:11, 18 February 2013 (UTC)
      • Yeah, that's true. And typing [[wikt:grzlfu]] at Wikipedia does take you to the 404 page, not to the editable page or the search page. —Angr 19:24, 18 February 2013 (UTC)
        • And besides: if we can put these links on the 404 page, then we can also put them on the redlink-target-page, on the search-results-page, and so on. —RuakhTALK 19:26, 18 February 2013 (UTC)
          • Fair 'nuff. What would be really cool is if the interwiki links automatically appeared in the edit box when you clicked a red link. —Angr 19:39, 18 February 2013 (UTC)

Yahoo Pipe for 404s[edit]

  • Just to keep you posted: I'm trying to build something with w:Yahoo Pipes, as it seems it can handle many queries simultaneously, so maybe I can make it return the words which exist as w:json. It just takes some time to get all 170 Wiktionaries in there... -- Stratoprutser (talk) 08:41, 22 February 2013 (UTC)
    • OK, demo version here (query:water), For now it queries fr, en, lt, tr, zh, ru, vi, io, pl, fi, and returns a json-object with the wiktionaries which have a page on water.... feedback? -- Stratoprutser (talk) 17:43, 23 February 2013 (UTC)
  • I see two problems, and not to be a pessimist, but I suspect that both of them are insurmountable.
    Problem #1: The same-origin policy means that a page on en.wiktionary.org can't retrieve the contents of that URL as a string; the best we can do is run the contents of that URL as JavaScript. So the URL can't just be a JSON object, it would have to actually call a JavaScript function (a callback) with that object as an argument. And the problem is, how do we know that we can trust the URL not to do anything untoward, such as steal users' passwords and send them to some other site?
    Problem #2: Do I understand correctly that that URL causes a separate query to be run against each Wiktionary every time it is called? If so, and we started calling that URL from redlinks, then I don't think it would be very long before the WMF powers-that-be started blocking those requests.
    RuakhTALK 18:27, 23 February 2013 (UTC)
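To make Problem #1 concrete, the difference between a plain JSON reply and a reply that "calls a JavaScript function with that object as an argument" (commonly called JSONP) can be sketched as below. This is a Python illustration of what a server would have to emit, not a description of how Yahoo Pipes actually works; the callback name showInterwikis and the payload are made up.

```python
import json
import re

# A conservative pattern for a plain JavaScript identifier (assumption:
# ASCII-only names are enough for a callback).
_IDENTIFIER = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*")


def to_jsonp(callback, payload):
    """Wrap a JSON-serialisable payload in a call to `callback`."""
    if not _IDENTIFIER.fullmatch(callback):
        raise ValueError("callback must be a plain JavaScript identifier")
    return f"{callback}({json.dumps(payload)});"


payload = {"water": ["en", "fr"]}
plain = json.dumps(payload)                  # data only
wrapped = to_jsonp("showInterwikis", payload)  # executable JavaScript

print(plain)    # {"water": ["en", "fr"]}
print(wrapped)  # showInterwikis({"water": ["en", "fr"]});
```

The wrapped form is a script, so whoever controls the remote URL controls code that runs in the reader's page. That is the trust question raised above, and validating the callback name on the server does nothing to answer it.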
Hi, regarding same origin policy, yeah it'd need some client-side jquery or so to parse the json-object. The advantage is then also that we don't depend on the response-time of Yahoo. I don't see a risk for an XSS as the Pipe-URL would be hard-coded towards Yahoo. As for your 2nd concern: it won't be called for every red link, just for 404-hits, and those are probably fewer. And indeed, 170 polls is a lot; maybe limit it to the biggest twenty... Right now the pipe isn't querying for interwikis, but I can imagine that it would. And if there would be any alarm bells at WMF they could either route the concern into enabling api-calls to all wiktionaries at once or make the wikidata roll-out for wiktionary include non-existing pages... Also, don't worry about performance :) -- Stratoprutser (talk) 20:02, 23 February 2013 (UTC)
Re: your first two sentences: Maybe I'm just not understanding you, but — it sounds like you're not understanding me. Do you know what the same-origin policy is?   Re: third sentence: Maybe I just don't know enough about Yahoo Pipes — which would not be surprising, since I don't know anything about Yahoo Pipes — but it would have to be very restrictive in order for XSS to not be a concern. If it's possible for you to control the JS that's output, then we have a problem. (And if it's not, then we have a different problem.)   Re: "it wont be called for every red link, just for 404-hits": then I don't know if it's worth bothering to do this.   Re: last two sentences: WMF has a policy about automated queries. "Don't worry about performance" doesn't apply. —RuakhTALK 20:18, 23 February 2013 (UTC)
Oh, but re: "it wont be called for every red link, just for 404-hits": We may be miscommunicating. When I say "calling that URL from redlinks", I mean "calling that URL from the edit-page that you get to when you click on a red-colored link", not "calling that URL from every page that contains a red-colored link". Did you think I meant the latter? —RuakhTALK 20:26, 23 February 2013 (UTC)
I think you were right that I got confused about the SOP, but as Yahoo Pipes is a big thing, a little googling already gave me a few workarounds. Regarding 404's, I'm not sure if there's a template governing red links, like there is MediaWiki:Noarticletext? I thought to apply it only for http://en.wiktionary.org/wiki/direct_url_links. Regarding the servers, maybe it indeed then needs to use interwikilinks of the ten biggest wiktionaries, although I also wonder whether querying the API twenty times to check if certain words exist on different wiktionaries is really that expensive a call. -- Stratoprutser (talk) 13:17, 24 February 2013 (UTC)
Voila, proof of concept, only querying en:, fr: and mg:, the biggest wiktionaries. Pretty cool, I'd say :) -- Stratoprutser (talk) 16:30, 24 February 2013 (UTC)
I didn't doubt that there existed workarounds, I'm just saying that any workaround will necessarily involve running Pipes output as JavaScript code, and therefore that it will require that we trust that output.   See MediaWiki:Newarticletext. (Also, that's not what "template" means!)   Re: mg:: Well, that's the absolute worst and least useful Wiktionary, so I'm loath to use it for anything . . . but if your goal is just to extract interwikis from there, then maybe that's O.K. :-P   —RuakhTALK 17:37, 24 February 2013 (UTC)
I guess there's also a solution for parsing the wiktionary api inside wiktionary, but for now I stuck with Yahoo. I present my userscript User:Stratoprutser/404.js. The layout needs some tweaking, and in the end I didn't need access to the Mediawiki-templates. Yahoo now queries en, fr, nl, de & es (other suggestions?) and I still need to think of something to display language names instead of their abbreviations. Compare [11] and thermostatée. It should be easy to extend it also to the search page. This is just a proposal, feel free to improve the script :). -- Stratoprutser (talk) 23:53, 24 February 2013 (UTC)
  • Note that any script that loads data from or sends data to non-WMF servers is illegal per the privacy policy. I've asked Hoo_man to comment on this specific JS. --Nemo 08:06, 28 February 2013 (UTC)
Are you sure? Hoo man himself wrote hit count.js which is querying non-WMF servers in a similar vein. -- Stratoprutser (talk) 08:51, 28 February 2013 (UTC)
The difference between the tools is that mine only sets a link, while yours directly makes HTTP requests to a service run by a third party, which is unacceptable. In case you (or anyone else) turn it into a gadget or load it via the common.js, I would have to take action as is usually done for privacy policy violations. You can of course keep the user script (please add a notice that it's in disregard of the privacy policy and should only be used at one's own risk, though). Cheers, Hoo man (talk) 19:29, 28 February 2013 (UTC)
OK, I've put a notice. Seeing that it's possible to execute jQuery inside WT made me realize it'd be relatively easy to parse the API inside WT, but there hasn't been much supportive feedback, so I doubt anybody is really waiting for it :( -- Stratoprutser (talk) 09:15, 1 March 2013 (UTC)

So User:Stratoprutser/404_native.js now works totally inside Wiktionary, but I'll post it in the Grease pit, as this thread is stalled and in the wrong forum. -- Stratoprutser (talk) 13:26, 11 March 2013 (UTC)

I notice that you've added instructions for adding this to users' personal JavaScript to Help:Customizing your skin, without mentioning any issues with it. I also notice that your only reference in the documentation itself to any of the issues mentioned here is the comment line "See [[Wiktionary:Beer_parlour/2013/February#Yahoo_Pipe_for_404s]]", without any clue offered as to why anyone should follow that link. That doesn't seem appropriate to me, given what Hoo man (talk • contribs) requested above. Chuck Entz (talk) 23:53, 6 April 2013 (UTC)
After taking a second look, it may be that the offending code is no longer in the file, in which case, never mind- but I'd like someone who knows JavaScript to check, anyway. Chuck Entz (talk) 23:59, 6 April 2013 (UTC)
Yeah, like I said before, it's all inside WT now. -- Stratoprutser (talk) 08:21, 29 April 2013 (UTC)

Declension vs. conjugation for JA "i" adjectives

This is a minor nitpick, and is specific to JA entries. I've noticed that the "i" adjective conjugation table at {{ja-i}} uses the word "declension" to describe how the adjective changes. I'd always learned that nouns decline, while verbs conjugate. JA "i" adjectives are grammatically verbs (i.e. they can form the predicate of a sentence; they even have tense built in). JA "na" adjectives are admittedly closer to nouns, so I wonder if that's where the confusion arose. However, all the changes that occur for JA "na" adjectives are historically derived from verbal conjugations. I note too that the w:Declension article describes this process as affecting adjectives as well -- but in languages that have grammatical number, case, and gender, of which JA has none.

Would anyone object too strenuously to changing "declension" to "conjugation" at {{ja-i}} and {{ja-na}}, and fixing the relevant affected entries that use this table under a ===Declension=== header? -- Eiríkr Útlendi │ Tala við mig 20:16, 17 February 2013 (UTC)

Why not use "inflection" and avoid it altogether? —CodeCat 21:19, 17 February 2013 (UTC)
  • I'm fine with that.
Would anyone object then to changing all instances of "conjugation" and "declension" in JA entries to "inflection" instead? -- Eiríkr Útlendi │ Tala við mig 21:42, 17 February 2013 (UTC)
I think it's ok. I wouldn't mind changing it for other languages too. —CodeCat 21:58, 17 February 2013 (UTC)
Change the header of this adjective class, and even of all adjectives, sure. Why change the nouns' and verbs' headers, though? - -sche (discuss) 23:24, 17 February 2013 (UTC)
For languages like Latin, where declinationes and coniugationes are part of the way the language is traditionally taught, specific headers are to be preferred. Japanese and Korean adjectives conjugate more than anything else, but 'inflection' is probably less confusing, even to those linguistically oriented. —Μετάknowledgediscuss/deeds 01:28, 18 February 2013 (UTC)
I support using "inflection" for Japanese い-adjectives and all Korean adjectives. This is still called "declension" because adjectives in European languages don't behave like verbs.
Side note 1. With Korean adjectives, I think the best way in translations is to display attributive forms and link to the predicative ones. See detrimental - 해로운 (haeroun). "해롭다" is the "verb" or predicative form, 해로운 is an adjective in an attributive form.
Side note 2. With Chinese (Vietnamese, Thai, Lao, etc.) languages there is some confusion in translations of adjectives into English. For example, people want to translate Vietnamese đẹp (beautiful) as "to be beautiful". It's just the way adjectives are in some Asian languages. They are pure adjectives in attributive positions and "to be" + adjective in predicative. These words should be marked as adjectives, IMHO, but somehow users should be told that adjectives in these languages also mean "to be" + adjective. --Anatoli (обсудить/вклад) 01:39, 18 February 2013 (UTC)
  • FWIW, when I was studying Chinese in university, the materials we were working from described adjectival words as "stative verbs" instead.
Anatoli (or anyone else), I'm curious if you know whether Chinese, Korean, and Vietnamese allow for verbs and verbal phrases to directly modify nouns. In Japanese, for instance, you can have constructions like 道を走っている男 (michi o hashitte iru otoko), literally "the running down the street man", where English would require a relative pronoun to create "the man who is running down the street". This makes it a bit fuzzy about what exactly constitutes an "adjective", as the word is traditionally treated in PIE-derived contexts. -- Eiríkr Útlendi │ Tala við mig 02:12, 18 February 2013 (UTC)
In that, Japanese and Chinese grammars are similar. E.g. 跑路上的人 (pǎo lùshàng de rén) "the running on the road person" i.e. "the person who is running down the road" uses the same paradigm as Japanese, linking the verb and noun with 的 (de), which is common for multi-syllabic adjectives. In the same way, other Asian adjectives, which are equivalent to English adjectives, may be construed as "noun, which is ...": 漂亮的姑娘 (piàoliang de gūniang) - "a beautiful girl", 漂亮 - "beautiful" or "to be beautiful". --Anatoli (обсудить/вклад) 02:35, 18 February 2013 (UTC)
In Bantu languages like Zulu there is a similar situation. Zulu has only a handful of adjectives, anything else that is adjectival is called a "relative" rather than an adjective. Relatives take verbal prefixes rather than adjectival/nominal prefixes, so they behave like verbs in a way. Proto-Indo-European also had such verbs, although they were not as productive, nor were they really necessary to form adjective-like meanings because adjectives were an open and productive word class (in most languages they became perfect/past tenses). Esperanto has both adjectives and stative verbs, but both are productive and one can easily be derived from the other; in practice, the use of stative verbs has actually increased since the language was created. —CodeCat 03:06, 18 February 2013 (UTC)
In Vietnamese, yes, exactly the same as in Japanese. Vietnamese: "tôi yêu phụ nữ" - "the woman I love". In Korean, it's "사랑한 여자". The verb 사랑하다 takes the determiner form. --Anatoli (обсудить/вклад) 02:46, 18 February 2013 (UTC)
Vietnamese is "người con gái/phụ nữ/đàn bà tôi yêu", lit. "person girl/woman I love", same as English ("the girl/woman I love") but different from Japanese/Korean/Chinese, where it is "(I) love (..) girl/woman" (好いた女 / 내가 사랑하는 여자 / 我愛的女孩). "the person who is running down the road" in Chinese is 在路上跑的人 ("at-road-on run DE person"), slightly different from Japanese. Wyang (talk) 03:31, 18 February 2013 (UTC)
Oops, in Vietnamese adjectives are added AFTER the noun, I keep forgetting this, thanks for correcting. Sorry I made a small error in Chinese (跑路上 -> 在路上跑) but the paradigm is still the same (almost, as I said "的" is inserted). The phrase "在路上跑" "to run down the road" + "的" + "人" produces what in English and other European languages would be "(agent), which (does something)". The Vietnamese relative clauses behave like adjectives as well (phụ nữ tôi yêu - "woman" + "I love"), so it's quite similar to Chinese/Japanese, except for the expected word order difference. In Korean, you have only confirmed what I said (pronouns are usually omitted, as in Japanese). --Anatoli (обсудить/вклад) 03:46, 18 February 2013 (UTC)
The terms predicative form and attributive form Anatoli used are not quite right. They are rather non-relative form and relative form.
Korean | Japanese | Mandarin | Literal translation
예쁜 소녀 | 美しい少女 | 漂亮的姑娘 | a girl who is beautiful
눈이 예쁜 소녀 | 目が美しい少女 | 眼睛漂亮的姑娘 | a girl whose eyes are beautiful
In the lower examples, the “adjectives” are predicative rather than attributive because it is the eyes that are beautiful. — TAKASUGI Shinji (talk) 04:14, 22 February 2013 (UTC)
Thank you. The terminology used may be different, but I call 예쁘다 (yeppeuda) "predicative" and 예쁜 (yeppeun) "attributive". See also 좋은 (joh-eun) ("good"), which is flagged as "attributive" of 좋다 (jota) ("to be good"). Not my invention :). --Anatoli (обсудить/вклад) 04:40, 22 February 2013 (UTC)

So the conclusion of all this is that Japanese adjectives will be said to have inflections. I assume that it would be nice to have the inflection tables under an Inflection header, so it would be nice to change all the Declension headers. I changed a whole lot of those back when they were half Declension and half Conjugation, and it would be nice not to have to do it again for the 1000 or so entries, so is there anyone who can make a bot to do that? I would do it myself but I don't know the first thing about bots here. Are there any other tools that would be helpful? Should I make a request somewhere, like in the Grease Pit? Is there a Wanted Bots page? --Haplology (talk) 11:30, 15 March 2013 (UTC)

  • @Haplology, if you don't mind waiting a while, learning to build a bot has been on my back burner for some time now, and I'm finally at a point where I can really start getting into that. If you'd like to see movement in the nearer future :), maybe talk to CodeCat or Liliana -- I think Liliana, for instance, runs Kassadbot, which looks after entry headers across (I think) all languages, and she might be able to tweak the bot to replace ====Declension==== with ====Inflection==== relatively easily. -- Eiríkr Útlendi │ Tala við mig 23:31, 18 March 2013 (UTC)
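For anyone picking up the bot request above, here is a minimal sketch in Python of the text transformation only, not a working bot. It assumes entries are plain wikitext with level-2 language headers like `==Japanese==`; the function name `rename_inflection_headers` and its parameters are made up for illustration, and fetching and saving pages is left out entirely.

```python
import re

# Matches a level-2 language header such as "==Japanese==" on its own line.
L2_HEADER = re.compile(r"^==([^=\n]+)==[ \t]*$", re.MULTILINE)


def rename_inflection_headers(wikitext, language="Japanese",
                              old_names=("Declension",), new_name="Inflection"):
    """Rename level 3-5 headers named in old_names, but only inside the
    level-2 section for the given language. Hypothetical helper."""
    names = "|".join(re.escape(name) for name in old_names)
    header = re.compile(
        r"^(={3,5})[ \t]*(?:" + names + r")[ \t]*\1[ \t]*$", re.MULTILINE)

    sections = list(L2_HEADER.finditer(wikitext))
    pieces = []
    position = 0
    for index, match in enumerate(sections):
        if index + 1 < len(sections):
            end = sections[index + 1].start()
        else:
            end = len(wikitext)
        # Text before this language section is copied unchanged.
        pieces.append(wikitext[position:match.start()])
        section = wikitext[match.start():end]
        if match.group(1).strip() == language:
            # Keep the original header level (the run of "=" signs).
            section = header.sub(
                lambda m: m.group(1) + new_name + m.group(1), section)
        pieces.append(section)
        position = end
    pieces.append(wikitext[position:])
    return "".join(pieces)
```

Passing `old_names=("Declension", "Conjugation")` would cover the broader change floated earlier in the thread; the default only touches Declension headers, which is what the request above asks for.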

Changing "exceptional" language codes to comply with the HTML specification

At User talk:Ruakh we discussed the current use of "special" codes like gem-pro, many (but not all!) of which are currently used for proto-languages. They are listed at WT:LANG (but strangely the reconstructed languages are missing). These codes don't actually conform to the HTML (BCP 47) specification. Apparently, if you invent a non-standard code, then everything from the nonstandard part onwards has to be prefixed with -x-. So that would mean that all of our exceptional codes need to be changed (gem-pro to gem-x-pro and so on). Ruakh also thinks that proto-languages should end in -proto rather than just -pro. I don't expect that there will be a lot of opposition to this, but consensus is still good to have. :) Please say whether you agree with the change or not, and if so, whether you also agree with changing -pro to -proto. —CodeCat 05:00, 18 February 2013 (UTC)

Would this change mean we'd have to type {{etyl|gem-x-proto|de}} rather than {{etyl|gem-pro|de}}? Seems like a lot of extra typing for...what? What benefit accrues to human users of Wiktionary? - -sche (discuss) 07:45, 18 February 2013 (UTC)
So if anyone tries to process Wiktionary automatically, they won't have to construct a conversion table between Wiktionary's codes and the real codes?--Prosfilaes (talk) 08:33, 18 February 2013 (UTC)
Or so that browsers' delicate sensibilities aren't offended when they encounter <span lang="gem-pro"> and they don't know how to interpret it and suffer a nervous breakdown from a lack of confidence? —Angr 09:11, 18 February 2013 (UTC)
We've used gem-pro and roa-jer for years and our writers of scripts and bots to process dumps and live entries don't seem to have been impeded by them. Who would write a "Wiktionary processor" in such a way that it couldn't handle roa-jer (presumably triggering "unrecognised code"), but would find roa-x-jer (presumably triggering "made up code for a consequently unknown language") useful? Instruct the processor that any "unrecognised code" is a "made up code for a consequently unknown language" and voilà.
I'll argue the pro-x side: if we're making the codes up to begin with, why not make up codes that fit the HTML specs? But we should do that without making more work/typing for human users.
Could we put the x- at the front of the code and have templates recognise that {{etyl|roa-jer}} should call Template:x-roa-jer, just as they currently recognise that {{etyl|gem-pro}} should call Template:proto:gem-pro? It doesn't benefit browsers to know (from roa-x-jer) "this is a romance language but I don't know which one", it can't e.g. screenread it more correctly (unless girafe#Romanian /dʒi'ra.fe/ and girafe#French /ʒi.ʁaf/ sound the same). Or with Scribunto, could we add "has x- prefix" as a property of those languages which do, to likewise pass x-roa-jer on to the HTML while editors keep typing roa-jer?
I see no reason to change -pro to -proto. Actually, for proto-languages, Ruakh's suggestion of proto:gem (as in lang=proto:gem) seems neater than any of gem-x-pro, gem-x-proto, x-gem-pro or x-gem-proto. - -sche (discuss) 19:29, 18 February 2013 (UTC)
We've already discarded proto:gem as an option because that causes more problems than it solves. The main evil we are trying to rid ourselves of is {{langprefix}}. If we adopt proto:gem as the code, we'll end up in a situation that requires a "reverse langprefix" to remove the prefix in Lua before it can be put in HTML, and it would make such codes entirely unusable without Lua support (because templates can't remove parts of a name, only add to it). I certainly oppose such a measure. The easiest and most direct way to use the codes is if the code you type matches the code that ends up in the HTML. Anything else adds {{langprefix}}-like complexity, which is undesirable, and I would oppose that. Concerning the possibility of using "gem-pro" as the code as we do now... well I think I will let Ruakh answer it as he understands it better, but from what I've understood, "gem-pro" is to be interpreted as though the second part means {{pro}}, i.e. Provençal, which is not correct. Such errors might cause the browser to apply fonts that it thinks are applicable for such a language when in reality they are wrong. As for putting the x- in front... that doesn't matter all that much, but I do prefer putting it second rather than first. For some reason, though, Firefox thinks that anything containing "ine" (Indo-European) needs a different font, so "ine-pro", "ine-x-pro" and "x-ine-pro" all get a different font, which isn't right but is probably a bug in Firefox. —CodeCat 20:32, 18 February 2013 (UTC)
Re: "'gem-pro' is to be interpreted as though the second part means {{pro}}, i.e. Provençal": Well, there's no valid way to interpret gem-pro, since pro is not registered as an extended-language subtag. But if it were, then yes, I suppose gem-pro would be equivalent to pro, meaning "Old Provençal". (This may seem bizarre, but it makes sense when you look at real examples. For example, ar-aeb is allowed as a synonym for aeb "Tunisian Arabic"; the latter is recommended and preferred, but the former is allowed for cases where, for whatever reason, it is desired to tag the text as Arabic. I guess you could read ar-aeb as something like "Arabic, or more precisely, Tunisian Arabic". But something like fr-aeb "French, or more precisely, Tunisian Arabic" would not be valid: aeb is only a valid extended-language subtag when the primary language subtag is ar. And similarly for every other extended-language subtag; BCP 47 specifically requires this.) —RuakhTALK 16:00, 19 February 2013 (UTC)
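The extended-language rule described above can be sketched in a few lines of Python. This is only an illustration: the table below is a three-entry excerpt (aeb with prefix ar, cmn and yue with prefix zh), not the full IANA registry, and the function name `extlang_is_valid` is made up.

```python
# Tiny hand-copied excerpt of extended-language subtags and their
# required primary language subtag. Not the complete registry.
EXTLANG_PREFIX = {
    "aeb": "ar",  # Tunisian Arabic
    "cmn": "zh",  # Mandarin Chinese
    "yue": "zh",  # Cantonese
}


def extlang_is_valid(primary, extlang):
    """Return True only if extlang is in the excerpt above and is paired
    with the primary language subtag it is registered under."""
    required_prefix = EXTLANG_PREFIX.get(extlang.lower())
    return required_prefix is not None and required_prefix == primary.lower()
```

With this excerpt, ar-aeb passes and fr-aeb fails, matching the example above; gem-pro fails as well, because pro is not an extended-language subtag at all.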
This is neither here nor there, but if we do e.g. rename Template:foo-bar to Template:x-foo-bar, could we keep Template:foo-bar as a redirect which bots would automatically "fix"? If not, why not? - -sche (discuss) 22:30, 18 February 2013 (UTC)
Redirecting codes doesn't actually make the redirects work like codes, it just sends any requests for the old template to the new one. We've found out in the past that there are certain uses that are not updated with that. Topical categories are probably the most notable example. So the conclusion is that language templates can't be redirected. But with Lua coming, redirects aren't even an option because all the codes will be defined in Lua (presumably we'd want to deprecate the code templates to avoid inconsistencies). —CodeCat 22:36, 18 February 2013 (UTC)
Would the codes nds-de and nds-nl be affected by this proposal? On their own, both nds and de/nl are valid, but the combination nds-de is peculiar to us, and nds-nl is a WMF peculiarity.
What about the code sh? It was a valid code, but has been deprecated (and replaced with hbs).
What about nah? Is it a problem that it's an ISO 639-2/-5 code rather than a 639-3 code? - -sche (discuss) 23:59, 18 February 2013 (UTC)
I think that "nds-DE" is valid, I'm not sure about "nds-de". Ruakh, some help please? —CodeCat 00:17, 19 February 2013 (UTC)
Language tags are technically case-insensitive, so nds-de is synonymous with nds-DE, i.e. "Low Saxon as spoken in Germany". But I think nds-DE is preferable, as it makes its meaning clearer to human editors. —RuakhTALK 16:00, 19 February 2013 (UTC)
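Since tags are case-insensitive, a template could accept nds-de and still display nds-DE. A minimal Python sketch of the casing convention recommended in BCP 47 follows: everything lowercase, except that two-letter subtags are uppercased and four-letter subtags are title-cased when they are neither the first subtag nor located after a single-character subtag such as x. The function name `canonical_case` is made up, and the sketch applies casing only; it does not check that a tag is valid.

```python
def canonical_case(tag):
    """Apply the recommended BCP 47 casing to a language tag.
    Formatting only: invalid tags are not detected."""
    result = []
    after_singleton = False
    for index, subtag in enumerate(tag.split("-")):
        lowered = subtag.lower()
        if index == 0 or after_singleton:
            # First subtag, and anything after a singleton, stays lowercase.
            result.append(lowered)
        elif len(lowered) == 2:
            # Two-letter subtags here are region codes: uppercase.
            result.append(lowered.upper())
        elif len(lowered) == 4:
            # Four-letter subtags here are script codes: title case.
            result.append(lowered.capitalize())
        else:
            result.append(lowered)
        if len(lowered) == 1:
            after_singleton = True
    return "-".join(result)
```

Under these rules nds-de becomes nds-DE and zh-hans becomes zh-Hans, while everything after an x is left in lowercase.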

As an open and accessible database project we should try to conform to standards as a matter of course. We can’t predict how our information will be used, or how non-conformance will break things like re-purposing our data, or its accessibility by the disabled.

As a website, on the web, we should leave 1996 behind and start using HTML. The HTML5 standard says that a lang attribute’s value “must be a valid BCP 47 language tag, or the empty string”[12]. That must is an absolute requirement for conformance.[13][14]

We can define our private subtags, following -x-, any way we like. But a language variant subtag beginning with a letter must be at least five characters long.[15] So gem-proto or gem-x-proto looks more like a real language tag, making it easier to understand, in addition to the abbreviation itself being clearer than -pro. Michael Z. 2013-02-21 19:26 z

I've just re-read this whole thread. IMO nobody's addressed -sche's question of what would be the benefit for human users. I'd go a step further and ask: are there any benefits to this proposal whatsoever? From reading the whole thread, I can't see even one. Mglovesfun (talk) 20:11, 21 February 2013 (UTC)
What about Michael's post just above? I think that's a pretty good reason. —CodeCat 20:19, 21 February 2013 (UTC)
Mglovesfun, as browsers continue to improve support for multilingual text, do you look forward to continuing to dick around with 12 dozen script templates forever? As a “human user,” I do not. Michael Z. 2013-02-22 19:41 z

As a relatively newb user regarding the language codes I prefer having straightforward -proto as a suffix-infix-whatever instead of -pro. When I first encountered -pro, it never crossed my mind what it exactly meant. Only when I comprehended its usage I understood what it stood for. Therefore I think -pro is probably too ambiguous to the newcomers. --biblbroksдискашн 09:56, 23 February 2013 (UTC)

I usually value newb suggestions of ways to make things more straightforward highly, but our language-code naming scheme is complex enough that anyone seeking to create an exceptional code should read WT:LANGCODE first anyway, and it could be made to explain -pro. - -sche (discuss) 21:34, 24 February 2013 (UTC)
To CodeCat, Ruakh and Mzajac, maybe it's just me but I don't see the link between adding an additional -x and browsers dealing with multilingual text. How will browsers interpret gem-x-pro differently from gem-pro (or any other code you care to take as an example)? Mglovesfun (talk) 19:05, 24 February 2013 (UTC)
gem-pro triggers "huh? unrecognised code", gem-x-pro triggers "oh, this is a made-up code (consequently, I don't know what language it's for)". Why the second is better than the first, and why browsers can't be instructed to always treat the first as the second, are separate questions. - -sche (discuss) 21:34, 24 February 2013 (UTC)
@Mglovesfun: It'll depend on the browser. IE 14 will treat "gem-pro" as an error that crashes the browser. Firefox 33.0.1 will discard the entire invalid language code, will therefore treat the included text as English (like the surrounding context), and will automatically update the user's English-language spell-check dictionary to add those words. Opera 16, which will have augmented the standard language tag syntax with various sets of private-use codes, will be user-configurable so that it will either understand "gem-pro" as meaning "Proto-Germanic" (using the "English Wiktionary" codeset) or as meaning "Prussogermanian" (using the "sci-fi and alt-hist languages" codeset). Titane 7 will discard the "gem" part in favor of the "pro" part, interpreting it as "Old Provençal".
All of these browsers, of course, will correctly understand "gem-x-proto" as meaning "a Germanic language".
O.K., so, more seriously — why bet against open standards? Why bet against ingenuity, against people's ability to make use of our content if we'll only let them? Why rely on the assumption that the future will be no better than the present? Right now, as we're having this discussion about whether there's any benefit to expanding our standards support (given that browsers' support is incomplete), there is a team of browser developers somewhere in the world that is having a discussion about whether there's any benefit to expanding their standards support (given that Web-sites' support is incomplete).
RuakhTALK 00:53, 25 February 2013 (UTC)

We could also acceptably use x-gem-pro, and treat an entire language code as a private tag, if that is easier to implement.

Tagging non-standard codes would also have the benefit of helping us know when we’re using an ISO code and when we’re not. Transparency and clarity benefit human users. Michael Z. 2013-02-25 01:37 z

It's not easier or harder to implement x-gem-proto instead of gem-x-proto (I think more people so far have said that "proto" is better than "pro"). There is a difference for us as editors of course, and browsers may also interpret them differently: the former is considered entirely opaque and browsers should not attempt to understand it, while the latter is interpreted as a Germanic language of an unknown kind (so the difference is whether a browser reads the "gem" part or not). As for when something is an ISO code... all ISO codes have two or three characters and no hyphens, so any that do will be private use codes that we invented. —CodeCat 01:48, 25 February 2013 (UTC)
A BCP 47 language tag can have almost any combination of the parts language-extlang-script-region-variant-extension-privateuse.[16] The language tag gem-pro looks exactly like a standard primary language subtag followed by an extended language subtag. I believe that pro can never be an extlang subtag because it is already used as a language tag, but its form gives no clue that it is made up. Adding an -x- would make this clear. Michael Z. 2013-02-25 03:04 z
That's true, but currently we don't use any other subtags (except nds-de and nds-nl which are "accidents"). For Wiktionary-internal purposes, any code with a hyphen can safely be considered "exceptional". —CodeCat 03:56, 25 February 2013 (UTC)
I’m having trouble keeping track. Wiktionary:LANG#List_of_languages_with_exceptional_codes says that we are using a number of other “exceptional” subtags. Although nds-DE and nds-NL appear to be perfectly valid language-region codes, I can’t tell which of the codes on that list are standards-compliant and which are made up. It would be much easier to just remember that we use the standard for all language tags.
And once we tag an HTML element with a lang="" attribute, it is no longer “Wiktionary-internal.” Michael Z. 2013-02-25 04:47 z
That is what I meant though. Out of all the exceptional codes used on Wiktionary, only nds-de and nds-nl are valid HTML lang attributes. All the others are invalid. —CodeCat 14:20, 25 February 2013 (UTC)
I'm still not a fan of the extra typing, but I'll get over it (and anyone who arrives after we make the switch will learn the "x"-way from the start), and I do see the benefit to having the typed code = the code displayed in the HTML = the typed code. However, I repeat the request/idea, made/had by others and by me above, to use "x-roa-jer" rather than "roa-x-jer", because (a) it sets exceptional codes apart from the start (and sorts them all together, etc), (b) I don't see how it benefits browsers to know "this is a romance language but I don't know what kind" and (c) I imagine that, at some point, the people behind the HTML standards—the ones who deprecate codes, naming schemes, etc—may themselves come to the realisation that "this is a romance language but I don't know what kind" is useless and deprecate the "thisisavalidcode-x-ohwaitnoitisnot" format in favour of just "x-thisisamadeupcode". I oppose changing "nds-de" and "nds-nl" to "nds-DE", "nds-NL"—we don't use capital letters in any other codes, and using them for the Low Germans would needlessly split the wiki-code from the typed/HTML code (you'd have to type "nds-NL" to reach "nds-nl.wikipedia.org"...). - -sche (discuss) 19:26, 28 February 2013 (UTC)
Language tags and hostnames are both case-insensitive, so you could type either. For display purposes, it might be advantageous to make our templates display the recommended[17] nds-DE. Michael Z. 2013-02-28 20:40 z
If we decide that it's not useful for user agents to understand "roa-jer" to be a Romance language (and so we go for "x-roa-jer") then we have to ask ourselves what use there is for us in retaining the "roa" part. We could also use "x-jer" just by itself. Although, I think someone mentioned that private use subtags have to be 5 characters at least? Could someone confirm that? Because if that's true, then we'd probably need to switch over to "x-jerri" or something similar. And it would also imply that "gem-x-proto" is valid but "x-gem-proto" is not. —CodeCat 21:13, 28 February 2013 (UTC)
Re: "Although, I think someone mentioned that private use subtags have to be 5 characters at least?": You seem to have corrected yourself below, but just to be explicit about it: this is not true. They just have to be between one and eight case-insensitive ASCII letters and digits. —RuakhTALK 05:03, 1 March 2013 (UTC)
I have read -sche's link and found the following:
  • Sequences of private use and extension subtags MUST occur at the end of the sequence of subtags and MUST NOT be interspersed with subtags defined elsewhere in this document. — I interpret that to mean that if "pro" is a valid tag, then "x-gem-pro" is not because it contains "pro" after the "x". Presumably, the easiest way for us to avoid name clashes is to use at least five characters for private use subtags, since 2 or 3 letters may clash with a language code, and 4 letters may clash with a script code. However, it would also confirm that "x-gem-proto" is not valid because "gem" is also a valid ISO-639-5 family code. However, "gem-x-proto" is valid.
  • The subtags in the range 'qaa' through 'qtz' are reserved for private use in language tags. These subtags correspond to codes reserved by ISO 639-2 for private use. These codes MAY be used for non-registered primary language subtags (instead of using private use subtags following 'x-'). — This means we don't have to use the "x-" notation, if we can live with exceptional codes that begin with "q". We already use a few of these for language families, but family codes are not normally placed in a lang= attribute by any of our templates.
  • The script subtags 'Qaaa' through 'Qabx' are reserved for private use in language tags. These subtags correspond to codes reserved by ISO 15924 for private use. These codes MAY be used for non-registered script values. — The same as above, but for script codes. I think it's a good idea to add scripts to the lang= attribute under some circumstances ("zh-Hans" or "sh-Cyrl" would definitely be good usage), so that implies that all of our script codes need to be valid too, in case they end up in a lang= attribute somewhere on Wiktionary. So we would have to forego codes like {{polytonic}}, {{Latinx}} and {{unicode}}. On the other hand, {{fa-Arab}} is already a valid lang= value, but simply concatenating the language with the script code would produce "fa-fa-Arab", which is not correct.
I also noticed that the document puts a strong emphasis on using the available standards as much as possible, and to use private use or custom tags only as a last resort. So that would mean that "roa-x-jerri" is preferred over "x-jerri". —CodeCat 21:33, 28 February 2013 (UTC)
Michael's link, you mean. - -sche (discuss) 09:55, 1 March 2013 (UTC)
  • -x- are extension subtags. I believe we can put anything we want after the x. In my opinion, it would be good to keep that consistent with the public tagging guidelines.
  • qxx are private-use subtags. I hadn’t considered these. Might be a good way to keep everything formatted consistently, and be able to make an orderly transition if and when a language is added to ISO 639.

 Michael Z. 2013-02-28 22:00 z

    • I think you are getting them mixed up. Private use subtags use "x", while extension subtags use any single-letter tag other than "x". However, I think I was mistaken about "x-gem-pro" not being valid, considering: The single-character subtag 'x' as the primary subtag indicates that the language tag consists solely of subtags whose meaning is defined by private agreement. For example, in the tag "x-fr-CH", the subtags 'fr' and 'CH' do not represent the French language or the country of Switzerland (or any other value in the IANA registry) unless there is a private agreement in place to do so. This really just confirms what we had already figured out though: in the code "x-gem-pro" the meaning of "gem" and "pro" is not defined, while in "gem-x-pro" the meaning of "gem" is "Germanic language" but "pro" remains undefined. As far as the standard is concerned, "x-gem-proto" and "qgp" mean exactly the same thing; that is, nothing. They are both reserved for private use. —CodeCat 23:09, 28 February 2013 (UTC)
    • Oh, and if it is useful, here is a full list of all valid subtags, including region, script and transliteration subtags. If we want to follow the HTML standard, then this should be our holy scripture considering anything related to language or script codes. Anything not in that or in BCP47 should not be on Wiktionary either. —CodeCat 23:22, 28 February 2013 (UTC)
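To make the "x-gem-pro" versus "gem-x-pro" distinction concrete, here is a rough Python sketch that splits a tag at the "x" singleton. The function name is hypothetical, and the code only separates the part before "x" from the private-use part after it; it does not validate subtags and ignores the other single-letter extension singletons.

```python
def split_private_use(tag: str):
    """Split a language tag into (subtags before 'x', private-use subtags after 'x')."""
    subtags = tag.split("-")
    lowered = [s.lower() for s in subtags]
    if "x" in lowered:
        i = lowered.index("x")
        # Everything after the first 'x' singleton is private use.
        return subtags[:i], subtags[i + 1:]
    return subtags, []


if __name__ == "__main__":
    for tag in ("x-gem-pro", "gem-x-pro", "roa-x-jerri", "zh-Hans"):
        print(tag, split_private_use(tag))
```

For "x-gem-pro" the first list is empty, so nothing in the tag has a standard meaning; for "gem-x-pro" only "gem" does, which matches the reading of the standard given above.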
Oops, I shouldn’t quote standards in a rush. We can put anything behind the x, but the recommendation is to use the ISO codes like qaa, if they can work, per Considerations for Private Use Subtags. Michael Z. 2013-03-01 01:30 z
Of course, we have to wonder whether they can work for us as well. Having only two letters to use freely will surely cause some codes to be hard to remember. It will make reconstructed languages unrecognisable as such unless we decide that they should all have "p" as the second letter, which would mean we could only have 26 reconstructed languages (I'm sure there are quite a few more than that). However, we could use the private use tags as extended language subtags, if this is allowed. That would lead to something like "roa-qje" for Jerriais, which would be acceptable. Reconstructed languages could adopt a scheme like "xxx-qpr", which would almost be the same as our current codes, except with the invalid/nonsensical "-pro" replaced with valid "qpr". I don't know if private use subtags can be used to extend valid existing primary tags (in the same way that in "zh-cmn", the subtag "cmn" extends "zh"), so we should check this first. —CodeCat 01:39, 1 March 2013 (UTC)
Actually, having read over it again, I think that using extended language subtags is redundant. The standard considers the extended language subtag to be synonymous with its use as a primary language subtag. In other words, "zh-cmn" is considered the same as "cmn". That means that using "gem-qpr" and "ine-qpr" side by side would technically break the standard even if the tags themselves are valid, because we'd be using "qpr" with different and contradictory meanings. —CodeCat 01:49, 1 March 2013 (UTC)
That's not quite right IMHO, but gem-qpr is invalid anyway: BCP 47 only allows qpr as a primary language subtag, not as an extended language subtag. —RuakhTALK 05:03, 1 March 2013 (UTC)
Aren’t the extlangs there only to grandfather a few special codes “for various historical and compatibility reasons”? My understanding is that zh was once the language code for Chinese, but now is considered a macrolanguage. E.g., the extlang code zh-cmn (Mandarin) is equivalent to the preferred language tag cmn. Michael Z. 2013-03-01 19:13 z

Also relevant: #mw.language.fetchLanguageName, which accepts ISO language codes. In the Lua debug console:

=mw.language.fetchLanguageName("ar", "en")

 Michael Z. 2013-03-21 16:56 z

That's not really terribly relevant. It may be a useful feature for some wikis, but we already have a collection of other data to associate with language codes, like script, family and so on. So if we're going to include those things anyway, we might as well add the language's name to that. Also, have you checked to see whether we have any languages where our own name differs from MediaWiki's name? —CodeCat 17:16, 21 March 2013 (UTC)
Well, I thought it might be useful for someone. I assume this function returns info from the CLDR extension, so maybe it can provide more than just the name. Perhaps its output can also be configured for en.wiktionary.org.language names Michael Z. 2013-03-21 19:43 z
But my point here is that if we use standard language codes, and not create a closed system, then we could be open to unanticipated opportunities. Michael Z. 2013-03-21 19:52 z

Can we please block mobile edits?[edit]

These edits keep showing up but so far not a single one of them that I've seen has been anything other than gibberish. I don't see any value in allowing them at all. And if we do allow them, then we should only allow them for logged-in users who also edit from a regular account. —CodeCat 17:39, 18 February 2013 (UTC)

Are you serious? -- Liliana 17:42, 18 February 2013 (UTC)
Why wouldn't I be? —CodeCat 17:45, 18 February 2013 (UTC)
Oppose. --Yair rand (talk) 17:43, 18 February 2013 (UTC)
Support. Semper, who patrols more anons than any of you, says that he has not seen a single good mobile edit. Nor have I. —Μετάknowledgediscuss/deeds 17:54, 18 February 2013 (UTC)
Erm, I've tried editing on my mobile and can't! I can't sign and there's no edit link, clearly I'm doing it wrong. Mglovesfun (talk) 18:06, 18 February 2013 (UTC)
You need to change to Classic view (scroll all the way to the bottom) and you can log in/edit/etc. just as you would from a pc. On a side note, my biggest problem trying to edit from my mobile is mashing the correct links (I always end up hitting a wrong one, e.g. block instead of edit) which causes me to do things I hadn't intended on doing. I've since given up trying :\ Leasnam (talk) 19:01, 27 March 2013 (UTC)
People have been editing via the regular interface from smartphones for years (remember Luciferwildcat on a Droid (talk • contribs)?) without the "(mobile edit)" tag, so this has to be a new feature that may not be available on every app. Chuck Entz (talk) 22:02, 18 February 2013 (UTC)
I've written whole entries from a normal web browser on a smart phone, and even if it's more work, I don't see that this would be any problem. I think Code Cat needs to explain exactly what a "mobile edit" is and why they are a problem. --LA2 (talk) 17:49, 20 February 2013 (UTC)
I wish we knew enough to clarify. We've been seeing edits for a couple of months that bear the tag "(mobile edit)" at the end of the edit summary. These are mostly new-page creations of non-words with boilerplate text from the new-entry creator being the only content, or bogus content consistent with someone playing around.
When I first brought this up at the Grease Pit, there was someone there providing information from WMF and/or the developers (I don't remember their exact role/position) on another technical issue, and they mentioned that editing capability had just been rolled out for the mobile interface. Although it's possible that the ease of editing via this interface has attracted vandals who are having fun at our site's expense, I'm beginning to suspect that most of these people aren't aware that they're actually adding this garbage to a real online dictionary. Perhaps they think it's a simulation or some kind of a sandbox. DCDuring (talkcontribs) was able to make one of these edits by using the mobile version of Wiktionary from his computer, so access via the mobile version is established as the only verified source of such edits- but some of us have been speculating that a game or other mobile app may also provide the same access as an extra feature. That would be more consistent with the nature of the edits. Chuck Entz (talk) 01:51, 25 February 2013 (UTC)
  • For that matter, I'd like some clarification -- we are talking about a proposal to block anonymous mobile editing, right? Or is this about blocking all mobile edits, regardless of login state? -- Eiríkr Útlendi │ Tala við mig 18:33, 20 February 2013 (UTC)
Virtually all of the bad edits have been IPs, and CodeCat has been discussing this as if it would apply only to IPs, so I'm pretty sure it's anons only. Chuck Entz (talk) 01:55, 25 February 2013 (UTC)
Here's all the mobile edits for the past 30 days that haven't been deleted: [18]. Reverted edits to existing entries show up here, but deletion of an entry removes all its edits from recent changes, so this method won't show the dozens of deleted bogus entries that are the main problem. As you can see, there are a few edits that aren't obvious garbage, so you can't categorically exclude the possibility, though the percentage is pretty small. By the way, DCDuring's "mobile" edit was done using a PC, so it probably doesn't count. Chuck Entz (talk) 21:47, 18 February 2013 (UTC)
From what I can see, there are only 3 legitimate edits: by Krun, DCDuring and an IP. Given those statistics, it doesn't seem like such a bad idea to block IPs from making mobile edits. Sorry for that single legitimate IP. —CodeCat 22:02, 18 February 2013 (UTC)
I wonder if there's a connection between these and the complaints on Feedback that seem to assume that we're connected with various word games. Is there some mobile app out there that's giving the impression that users are just tinkering with a feature of the game environment, when they're really editing Wiktionary? Chuck Entz (talk) 18:16, 18 February 2013 (UTC)
I've not seen any edit link on the 'official' wiktionary app. I have never had much luck using a browser to try to edit from my phone. DCDuring TALK 20:23, 18 February 2013 (UTC)
Judging by the ratio of bad to good, I'm in favour of disabling mobile edits from IPs. (Of course, keep them available for named accounts.) It's unfortunate. Equinox 21:46, 20 February 2013 (UTC)
If we do this — and maybe we should — IMO we must build temporariness into it so that overturning this decision, removing the block to editing, requires no consensus (perhaps even no majority): as more people go mobile, we'll have more good mobile edits. Moreover, we'll need to keep half an eye on the filter that blocks them, to know when to remove the block. Incidentally, we have precedent for this: we've blocked all anonIP edits from AOL — and that was an actual block, not a filter, so we couldn't even tell what edits were being attempted, to know when to remove the block due to good edits.​—msh210 (talk) 15:27, 25 February 2013 (UTC)

I've created a vote: Wiktionary:Votes/2013-03/Disallow mobile edits by unregistered users. Please comment. —CodeCat 17:41, 15 March 2013 (UTC)

[short explanation on how to access in mobile view and make a "mobile edit"]

Mobile view can be seen by accessing Wiktionary on the m subdomain: http://en.m.wiktionary.org. Editing in this view can be done by accessing articles through the index.php file with edit as action. For example to edit this page in mobile view visit http://en.m.wiktionary.org/w/index.php?title=Wiktionary:Votes/2013-03/Disallow_mobile_edits_by_unregistered_users&action=edit.
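As an illustration of the URL pattern described above, here is a small Python sketch that builds the mobile edit link for a given page title. The function name is made up; the base address and the title/action parameters are the ones from the example URL, and the code assumes that spaces become underscores while colons and slashes stay unescaped, as in that example.

```python
from urllib.parse import quote

# Base address taken from the example URL above.
MOBILE_BASE = "http://en.m.wiktionary.org/w/index.php"


def mobile_edit_url(title: str) -> str:
    """Build the mobile-view edit URL for a page title."""
    # Spaces become underscores; ':' and '/' are left as-is, other characters are percent-encoded.
    encoded = quote(title.replace(" ", "_"), safe=":/")
    return f"{MOBILE_BASE}?title={encoded}&action=edit"


if __name__ == "__main__":
    print(mobile_edit_url(
        "Wiktionary:Votes/2013-03/Disallow mobile edits by unregistered users"
    ))
```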

--biblbroksдискашн 15:36, 16 March 2013 (UTC)

I’m confused. Would you please follow up at Wiktionary_talk:Votes/2013-03/Disallow_mobile_edits_by_unregistered_users#Lack_of_good_edits? Thanks. Michael Z. 2013-03-16 17:17 z
Here's what you get when a search is unsuccessful in the mobile version: [19]. It doesn't explain things very well: in the full version there's a red-lettered copy of the search term, with an explanation about creating an entry by clicking on it. In the mobile version it just says:
"There were no results matching the query."
"These entry templates may help when adding words:"
It doesn't really explain that clicking on anything will create a stub entry for the search term, it just has a bunch of things to click on. One is tempted to just click something to see what happens- which is probably where all the bogus edits are coming from. Chuck Entz (talk) 07:18, 18 March 2013 (UTC)

Wiktionary and Wikidata[edit]

Jan Dudík has unofficially started a page on Wikidata for Wiktionary: Wikidata:Wiktionary. Might be worth taking a look at. --Njardarlogar (talk) 19:41, 18 February 2013 (UTC)

Beyond interwikis, I'm really not sure Wikidata can help us much, but thanks for the link. The pronunciation idea detailed on the page looks like it would be a serious mistake here. —Μετάknowledgediscuss/deeds 20:55, 18 February 2013 (UTC)
Which problems do you see with the pronunciation? With Lua coming, local adaptions are now easier than ever. --Njardarlogar (talk) 23:21, 18 February 2013 (UTC)
CodeCat summed it up well at wikidata:Wikidata talk:Wiktionary#IPA. And that's just one small example of hundreds of undocumented discrepancies. Each Wiktionary is autonomous to a greater degree than even Lua can solve. —Μετάknowledgediscuss/deeds 23:59, 18 February 2013 (UTC)
  • To be honest, I have no clear idea what that page is attempting to propose, but what little I do understand of it makes me shudder at the complexity and probable ugliness of the proposed solution -- which, as best I can tell, would be somehow collapsing all separate-language WT entries for individual words into one dataset for each word. Yerg. -- Eiríkr Útlendi │ Tala við mig 21:56, 18 February 2013 (UTC)
    • I can see having a master index, with just the name of the entry and the name of the wiki, but even that would be pretty big. The overhead for keeping it current wouldn't be trivial, but if all the interwiki bots were updating just the index and not all the wikis, it would be actually less than the status quo.
Ideally, you would want to have such a database updated automatically by the systems at the wikis themselves: every time an edit would be saved that created, deleted or moved an entry, as well as any adding or removing a redirect for an existing entry, the system would add or delete an entry in the database if necessary. That would seem to be a major change in the interaction between projects, though, and I expect would require implementation by the developers. Chuck Entz (talk) 22:19, 18 February 2013 (UTC)
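To picture the master index idea, here is a purely hypothetical Python sketch: a mapping from entry name to the set of wikis that have it, updated by create, delete and move events. None of these class or method names correspond to anything in MediaWiki or Wikidata; it is only meant to show how little data such an index would need per entry.

```python
from collections import defaultdict


class EntryIndex:
    """Toy master index: entry title -> set of wiki codes that have that entry."""

    def __init__(self):
        self._wikis_by_title = defaultdict(set)

    def on_create(self, wiki: str, title: str) -> None:
        self._wikis_by_title[title].add(wiki)

    def on_delete(self, wiki: str, title: str) -> None:
        wikis = self._wikis_by_title.get(title)
        if wikis is None:
            return
        wikis.discard(wiki)
        if not wikis:
            # Drop titles that no wiki has any more.
            del self._wikis_by_title[title]

    def on_move(self, wiki: str, old_title: str, new_title: str) -> None:
        # A move is treated as a delete of the old title plus a create of the new one.
        self.on_delete(wiki, old_title)
        self.on_create(wiki, new_title)

    def wikis_for(self, title: str) -> list:
        return sorted(self._wikis_by_title.get(title, ()))


if __name__ == "__main__":
    index = EntryIndex()
    index.on_create("en", "sakura")
    index.on_create("ja", "sakura")
    index.on_move("en", "sakura", "Sakura")
    print(index.wikis_for("sakura"), index.wikis_for("Sakura"))
```

With something like this, the interwiki list for a page would be a single lookup instead of a bot edit on every wiki.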

I should repeat that the page is unofficial (some of the most recent, vague statements by devs and/or official people can be found here), and that how Wikidata is supposed to work for projects other than Wikipedia is still under development. So the exact shape of the software and the things that devs want it to support may end up looking very different from the stuff on the page I linked to. I don't know how much influence we can expect to have on the development of Wikidata for Wiktionary, but it can't harm to voice opinions on the matter. --Njardarlogar (talk) 23:21, 18 February 2013 (UTC)


I've been managing WOTD for a while now. Unfortunately, it is no longer possible for me to do so due to time constraints. There are currently new words scheduled until March 1, which I hope is enough of a buffer to allow someone else to take the reins. Astral (talk) 20:40, 19 February 2013 (UTC)

What all is required? Leasnam (talk) 19:07, 27 March 2013 (UTC)
I wrote up an overview here. Basically, you plug words that people have nominated into the daily templates, add the was-wotd template to the entries, and add old words to monthly archives. As a failsafe, if you forget to set a day (or if no-one has time to set days), the system just uses the last year's word(s), like it's been doing. - -sche (discuss) 19:42, 27 March 2013 (UTC)

Stripping extra info from Japanese romaji[edit]

Input needed: This discussion needs further input in order to be successfully closed. Please take a look!

Basically asking if Category:Japanese romaji can be made like Category:Mandarin pinyin. Japanese rōmaji is perhaps the only foreign-language category that uses romanised words as full-fledged entries. Such entries are only meant to disambiguate and help find the proper Japanese entries written in actual Japanese script: kanji, hiragana or katakana. The few Japanese words that are written in Roman letters shouldn't be affected (perhaps some abbreviations like JR).

Compare this Mandarin entry - biǎoshì (no part of speech), there are occasional unwikified inline translations.


# {{pinyin reading of|表示}}

to Japanese akegata:



# {{ja-def|明け方}} [[daybreak]]

The problem with the Japanese rōmaji entry is that "akegata" is not a Japanese word or a Japanese noun; it's only a romanisation. 明け方 and あけがた, by contrast, are Japanese words, hiragana being the alternative spelling. --Anatoli (обсудить/вклад) 02:54, 21 February 2013 (UTC)

<a name="WyangSuggestion"/>I'd support stripping anything but the links to lemma forms from non-lemma forms for Japanese. (I'd also support changing the guidelines on lemma forms, use kanji as the lemma form for kango, hiragana as the lemma form for wago (hence moving kun-yomi sections to hiragana forms), katakana for gairaigo; treating juubakoyomi and yutouyomi as kango, gikun and jukujikun as wago, etc.) Wyang (talk) 03:16, 21 February 2013 (UTC)
  • For romaji entries, I believe what Anatoli describes is already the operating assumption for JA editors such as Takasugi-san, Haplology, and myself -- i.e., that rom entries should only be disambig entries pointing to the lemma forms, with a short gloss to help the user pick the correct entry.
Regarding Wyang's suggestion, I think the trend here at EN WT for JA entries has been to broadly follow JA dictionary trends when choosing what to use for the lemma form. That said, a lot of JA electronic dictionaries use different redirection schemes to allow for some fancier kinds of entry consolidation than what is technically possible here using the MediaWiki software.
For instance, entering 付く (tsuku) into the search bar of my electronic Shogakukan dictionary redirects me to an entry using all variations of 付く, 着く, 就く, 即く, and 憑く for the header. This does *not* include kanji spellings for semantically unrelated forms of tsuku, such as , or , or 突く衝く撞く搗く舂く築く吐く, or 尽く, or , or 漬く, or 点く.
For my part, I would somewhat support moving all wago entry details to use the hiragana as the lemma form, provided that: 1) all non-lemma entries (such as kanji headers for wago terms) use {{ja-def}} (or some similar mechanism) to refer the user to the lemma entries, 2) the full entry lists all traditional kanji spellings as alt forms (this probably should not include rare corner cases, except maybe in a collapsible table so as not to clutter things up too badly), and 3) the full entry clarifies which senses are spelled with which kanji.
My main reservations about doing so are due to the limitations of linking. If a user arrives at the 付く page and is redirected to つく, our current linking scheme does not allow us to reliably link to any specific sense on the つく page, leaving it up to the user to locate the correct senses. For large pages (and this hypothetical wago page for つく would be quite large, comparable to EN get), this becomes a real usability concern, as the relevant senses might not even be on-screen when the user arrives on the target page. We can link to つく#Noun or つく#Verb, but for any specific POS sense, we don't seem to have any robust mechanism for linking.
Does anyone have any reliable resolution to this issue? One radical thought would be to group semantically related senses on subpages, such as つく/月 and つく/付く・着く・就く・即く・憑く etc, and have those subpages (or relevant portions thereof) transcluded into the main つく page. Any other ideas? -- Eiríkr Útlendi │ Tala við mig 18:37, 21 February 2013 (UTC)
Eirikr, although I agree with you (we can discuss your issues at a different place), you misunderstood the purpose of the discussion. I'm proposing to strip romaji entries of the noun, verb etc. status. They unnecessarily boost the numbers and mislead users. Words written in romaji are not Japanese words. Although they are the disambig pages, I propose to go further. Something like this:


# {{ja-def|あけがた}} daybreak
# {{ja-def|明け方}} daybreak

That should only add to Category:Japanese romaji, no other category. The rationale was described well on the Pinyin vote but I can't find it at the moment. --Anatoli (обсудить/вклад) 22:25, 21 February 2013 (UTC)
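For bot writers, the definition lines in the proposed format could be generated along these lines. This is only a sketch in Python: the function name is made up, it emits nothing but the "# {{ja-def|...}} gloss" lines shown above, and headers or headline templates are deliberately left out since their final form is still under discussion.

```python
def romaji_stub_lines(targets):
    """Build the definition lines of a romaji stub.

    targets: iterable of (japanese_spelling, short_gloss) pairs.
    """
    lines = []
    for spelling, gloss in targets:
        # One soft-redirect line per Japanese spelling, with an unlinked gloss.
        lines.append("# {{ja-def|" + spelling + "}} " + gloss)
    return "\n".join(lines)


if __name__ == "__main__":
    print(romaji_stub_lines([("あけがた", "daybreak"), ("明け方", "daybreak")]))
```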

  • Ah, yes, I had not understood the extent of your suggestion.
Thinking it through, I'm generally okay with this, although I would be sad about the loss of POS information in the romaji pages themselves -- given the enormous number of homophones in Japanese, part-of-speech can be helpful in choosing which lemma entry to click on.
Another potential concern is whether users, who might not comment here on the boards, are making use of the POS-organized romaji indices for Japanese. If that's a useful feature, we might want to keep romaji entries in their current structure, even though, for instance, “kangaeru” technically isn't a Japanese verb as we define it. Though I have absolutely no idea how we'd go about gleaning such information -- do we have any stats on how various pages were accessed, and by whom? (By login ID or IP; at the bare minimum, number of unique accesses would be helpful even without any identifying info.)
Otherwise though, I have no strong objections, provided that this can be implemented consistently. It'd be helpful if you could find and link to that Pinyin vote, for reference sake. Cheers, -- Eiríkr Útlendi │ Tala við mig 23:07, 21 February 2013 (UTC)
Here's the vote Wiktionary:Votes/2011-07/Pinyin entries. The only opponent of the vote - Dan Polansky actually used the Japanese romaji as an argument against simplifying pinyin entries. Simplifying the template for romaji will even make it easier for bots to create entries or add new Japanese definitions and for people to work with. When a romaji entry is created it can be forgotten, maintained perhaps when a new Japanese spelling appears. (In Mandarin pinyin entries without ANY hanzi entries were disallowed but it doesn't have to be the same for Japanese - one may want to create akaramu before 赤らむ or あからむ). PoS and other info should be contained in the entries. I don't think romaji should be sorted the same way that kana/kanji entries are. Mandarin also has a huge number of homophones, pinyin entries help to find the right ones. Not splitting Romaji entries into parts of speech will make adding new definitions much easier. --Anatoli (обсудить/вклад) 00:51, 22 February 2013 (UTC)
I have just created a template "Template:ja-romaji" and "akaramu" using the template for your consideration. --Anatoli (обсудить/вклад) 00:58, 22 February 2013 (UTC)
I have made a lot of those romaji entries and every time I felt a little silly doing so because as Anatoli said, they are not really Japanese, and they don't add any new information. A romaji entry just sums up information that should be more fully entered elsewhere. This feature is nice but it should be done automatically by a database after Wikidata is extended to this project. The romanization (actually "a" romanization since there are different styles) should be a property from which lists are generated, the lists in this case replacing human-written romaji entries. It's not quite as clear-cut with hiragana and katakana, but basically those pages should be made by a database too.
When that point comes, and somebody has to decide how those pages should be formatted, I would do the same thing with Japanese romaji that is done with Mandarin pinyin, in the style of akaramu, because romaji are not real Japanese words. In the meantime, I suppose bots can suffice, and as also noted above, "Simplifying the template for romaji will even make it easier for bots to create entries or add new Japanese definitions and for people to work with. " --Haplology (talk) 02:43, 22 February 2013 (UTC)
I strongly agree with that, but with one caveat: I don't expect WMF will provide us with a reasonable DB structure, via Wikidata or otherwise, in the near future. We just can't expect the kind of technical support that would most help us. In the mean time, we don't even have bots creating romaji entries, although we should. I think that Anatoli's proposal is good and that we need to try to make these kinds of improvements without expecting a deus ex machina (2nd def). —Μετάknowledgediscuss/deeds 03:18, 22 February 2013 (UTC)
  • I second that agreement. As Haplology says, I always feel a bit silly creating the romaji stubs, but I also understand how useful they can be given my own memories of struggling to learn Japanese. Reworking romaji stub formatting into something consistent and simple, and most of all *automatable*, would be lovely.
Regarding data handling, I wonder if it would be possible to simply use the romaji info on lemma entries as the "attribute" data, as it were? Kana entries allow for very easy romaji entry creation using just a simple regex, even if the entry itself gives no romaji spelling, but the kanji entries would need more data, and the rom= param in the headline provides that in our current setup. -- Eiríkr Útlendi │ Tala við mig 06:21, 22 February 2013 (UTC)
  • I've just had a look at akaramu. I'd strongly suggest that the kana spelling be placed in the headline as it is at older JA romaji entries like sakura, and *not* be listed as an independent entry, as {{ja-def}} should ideally point the user to just the lemma form.
If 赤らむ and あからむ were independent lemmata, then fine, each should have a def listing. However, a kana entry such as あからむ would generally be just a soft redirect disambig pointing the user to whichever lemmata there are, which in this case and in our current setup would be only 赤らむ (pursuant to any decision vis-à-vis Wyang's suggestion above).
Hiragana (and/or katakana) renderings can be auto-generated from the romaji based on regex rules (which I'm assuming that Lua would be sufficient for?), so the user wouldn't even have to enter that as an argument. -- Eiríkr Útlendi │ Tala við mig 06:36, 22 February 2013 (UTC)
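A rough idea of what such auto-generation might look like, sketched in Python rather than Lua and using a longest-match table lookup instead of regex rules. The table is a tiny hand-picked subset, just enough for the examples in this thread; long vowels with macrons, the syllabic n, doubled consonants and katakana are all left out, and the names are hypothetical.

```python
# Deliberately tiny romaji -> hiragana table; a real one would cover the full syllabary.
KANA = {
    "a": "あ", "ka": "か", "ke": "け", "ku": "く", "ga": "が",
    "ta": "た", "ra": "ら", "mu": "む", "tsu": "つ",
}


def romaji_to_hiragana(romaji: str) -> str:
    """Convert romaji to hiragana by greedy longest-match lookup in KANA."""
    out = []
    i = 0
    while i < len(romaji):
        for size in (3, 2, 1):
            chunk = romaji[i:i + size]
            if chunk in KANA:
                out.append(KANA[chunk])
                i += len(chunk)
                break
        else:
            # No table entry matches at this position.
            raise ValueError(f"cannot convert {romaji!r} at position {i}")
    return "".join(out)


if __name__ == "__main__":
    for word in ("akaramu", "akegata", "tsuku"):
        print(word, romaji_to_hiragana(word))
```

Trying the longest chunk first is what keeps "tsu" from being misread as "t" plus "su" once the table grows.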
Eirikr, feel free to add "hira" parameter to Template:ja-romaji or should the hiragana appear in Template:ja-def? What if a romaji entry is for a katakana word? --Anatoli (обсудить/вклад) 06:53, 22 February 2013 (UTC)
For now, until we can really get Lua going, I suspect you're right and we should just have hira and kata params that users fill in, supplying either or both of these values as appropriate. tabu, for instance, would take both as there is the English-derived タブ (tab, as in “a pull tab”), and also native Japanese verbs 賜ぶ給ぶ食ぶ (a superior giving to an inferior, largely obsolete, root of verb taberu “to eat”) or nouns (Machilus thunbergii, a kind of laurel) or (topknot; bun). Meanwhile, tsuku would only take the hira param as there are no borrowed terms of this reading, and kōhī would only take the kata param as there are no native words of this reading.
Would that be acceptable? -- Eiríkr Útlendi │ Tala við mig 07:20, 22 February 2013 (UTC)
That's perfectly acceptable for me. --Anatoli (обсудить/вклад) 09:43, 22 February 2013 (UTC)
Is everyone happy to make it a policy? I'm happy with the current format of Template:ja-romaji (thanks Eirikr for adding the parameters!) and the sample entry - akaramu. We can then ask botwriters to help us. I suspect that existing entries will need to be converted to the new format over time. --Anatoli (обсудить/вклад) 03:26, 26 February 2013 (UTC)
I'm okay with making this policy, in the absence of any romaji POS index usage stats. What about additional cats, though, like at kōhī? -- Eiríkr Útlendi │ Tala við mig 03:45, 26 February 2013 (UTC)
I suggest not having any category other than Category:Japanese romaji. Bots and people will only need to know what Japanese writing romaji should link to. The simpler, the better. コーヒー is a beverage in Japanese, kōhī is only the romanisation of the Japanese word and a tool to help users who can't read or read well in Japanese. --Anatoli (обсудить/вклад) 12:41, 26 February 2013 (UTC)
  • Heya Anatoli, I thought you said that romaji entries would still have POS information, just not the cats? If so, why this change? In addition, keeping mostly the same structure as the kana entries, just with different headline templates, makes it easier to automate romaji entry creation. Somewhat confused, -- Eiríkr Útlendi │ Tala við mig 04:15, 27 February 2013 (UTC)
Sorry, I never said this. I suggested that the entries belong only to Category:Japanese romaji, like the example entry I created and you edited. It has neither PoS headings nor categories, exactly as in Category:Mandarin pinyin. The romaji entries leave all the work for the proper kana/kanji entries. You can still duplicate short definitions (one liners) if you copy the definition but in view mode, not edit mode, so that they are not wikified. --Anatoli (обсудить/вклад) 04:25, 27 February 2013 (UTC)
  • Oh, dear -- apparently you meant something different from my understanding when you wrote "PoS and other info should be contained in the entries." I took this to mean that the romaji entries should continue to have POS headers, just using the {{ja-romaji}} headline template instead to avoid POS categorization.
And I'm a bit puzzled why wikification vs. non-wikification matters? Having more links improves user access to information, while removing links reduces ease of use. I notice you've been stripping links from the romaji entries I created earlier, but I don't know why... -- Eiríkr Útlendi │ Tala við mig
So, do you agree with removing PoS categories and headers, or do you miss the old way?
Removing wikifications would make users use kana/kanji entries to look up information, not the romanisation entries. Romanization entries act as soft redirects or indexes, not as proper dictionary entries. The short definition only helps to choose the right kana/kanji entry. --Anatoli (обсудить/вклад) 05:02, 27 February 2013 (UTC)
Stripping POS cats I have no real problem with, in the absence of data showing that users were actually utilizing the romaji-based POS categories to click through to entries. Categories strike me as "extra info", relevant to the header of this thread.
Stripping POS information *entirely*, however, I think is a mistake -- part of speech does not at all seem to me to be "extra", and seems instead to be integral and vital information.
As I've experimented with different romaji-based entries, I've discovered that the sample entry at akaramu is unfortunately not very representative. For starters, there's only one Japanese term that could be spelled this way in the Latin alphabet. An entry such as tsuku or mori (incomplete as of yet) or even ryūto, i.e. entries with multiple homophones, would be more illustrative of one clear issue -- as the number of possible lemmata increases, POS information becomes more useful for determining which lemma is the one the user actually wants. Moreover, for casual users, a romaji page that provides a simple gloss and easy-to-see POS information might itself suffice.
Our goal at Wiktionary is ostensibly to be informative. If so, supplying only a header saying ===Romanization=== fails at this -- the user probably already knows this is a romanization, as the very entry they either clicked on or searched for is already a romanization. Such a header provides no useful information.
In the same spirit of informativeness, stripping out wikilinks from romaji entries can only be antithetical to our purpose -- of providing relevant information to users as conveniently as possible.
I agree that romaji entries should be stubs, providing glosses to point the users to the lemmata entries. However, if an editor has gone to the trouble of adding wikilinks, why remove them? If a user arrives at the sogi page, I would argue most strenuously that a wikilinked version such as this is more useful and more user-friendly than a bare-bones version such as this. Why force users to jump through hoops? I completely fail to understand why it is preferable to force users to click around and manually search instead of just providing the link right there where appropriate. Including wikilinks, or even just leaving them be when already in place, entails no harm. Meanwhile, removing them actively makes Wiktionary harder to use.
As a thought experiment, consider a casual user who had heard that sogi was some kind of performance style in Japan, and looked up the term here using the romaji spelling, perhaps because they don't know that much Japanese. They arrive at the now-current stripped-down version of the sogi entry, find enough information to be satisfied about what sogi is, but they are now curious about gidayū -- and they have no way of looking this up other than to copy-paste into the search box and hope that works. (NB: it goes straight to the entry when pasting in 義太夫, but not when pasting in gidayū.)
Alternately, suppose a casual user winds up on the sogi page, and wonders what a "wooden shake" is. I myself had no real idea what one was until a few years ago, and might have assumed it had to do with “shaking” as a verb, or that it was a typo for stake. Providing a link to [[shake#Noun]] gives users a quick-and-easy way of expanding their knowledge and of avoiding ambiguity or confusion.
As was probably amply illustrated in an earlier thread, I feel very strongly about usability, and I'm afraid that de-wikification of this sort makes me very concerned about Wiktionary's user-friendliness, and ultimately, its usefulness. I have no intention of demanding that everyone add links all over the place. However, if another editor supplies links, and these are pointing to the proper entries in a relevant way (i.e. not misleading users with links such as [[something_profane|happy happy joy joy]], or linking irrelevancies like "cauliflower: a vegetable of [[the]] Brassica family"), then I would request that those links not be removed without good reason. There is nothing useful to be gained by making Wiktionary more difficult to use. -- Eiríkr Útlendi │ Tala við mig 07:42, 27 February 2013 (UTC)
  • I oppose including unwikified defs. Defs should either be omitted completely (my preference), or else they should be full defs with as much information as we can muster. The problem with unwikified defs is that no user will ever guess that this is not the "real" def, and that they can get more information (or at least links) elsewhere. —RuakhTALK 15:33, 27 February 2013 (UTC)
It is my preference to omit definitions altogether as well. That was also the outcome of the vote - Wiktionary:Votes/2011-07/Pinyin entries. In practice though, some or many pinyin entries have unwikified definitions (to encourage users to use hanzi entries and discourage development of pinyin entries). Romaji entries duplicate info given in kana/kanji entries; I still don't see why Japanese romaji are treated as words. We don't have romanization entries for any other non-Roman language. I see the benefit in having romaji entries to provide look-ups and indices like in commercial Japanese dictionaries, but commercial dictionaries don't duplicate info in indices.
I don't want to make Wiktionary more difficult to use but I don't think it's correct to have romaji entries as if they are Japanese words.
If we provide PoS headers, then editors and users will wonder, why they don't belong to categories.
I might leave the discussion for a while. I haven't expressed the rationale for simplifying the romaji entries well and it hasn't generated much interest from other Japanese editors and judging by Eirikr's reaction, we might leave the entries as they are. --Anatoli (обсудить/вклад) 22:32, 27 February 2013 (UTC)
  • I'm happy to hold off on editing romaji entries for a while until we get more feedback from other editors. Heavens know I have plenty other things I should probably be doing with my time.  ;) -- Eiríkr Útlendi │ Tala við mig 22:49, 27 February 2013 (UTC)
This discussion has become very hard to follow but I want to support writing romaji entries using the header Romanization rather than parts of speech like Noun for a few main reasons:
  1. Practical: It's much more work to add a section for each part of speech. It adds a big burden on the editors which means it's less likely to be done well or done at all. There are quite a lot of entries which list nouns that can also be adjectival nouns or suru verbs, but for whatever reason they never got added. If these romaji entries are vastly incomplete it negates whatever advantages there would be for doing it in such a way that results in their being incomplete.
  2. Theoretical: ringo is not a Japanese noun. It's a romanization of a Japanese noun. Listing ringo under a heading that says Noun which is in turn under Japanese asserts that it is a Japanese noun, which is untrue.
  3. For Consistency: Japanese is the only language that does it this way on this site. Mandarin is a kind of a big brother to Japanese on this site as it has more entries so Japanese should follow their lead, and they even had a proper vote. --Haplology (talk) 18:14, 13 March 2013 (UTC)
Thanks for your feedback, Haplology. What do you think about example entries and the template: "Template:ja-romaji" and "akaramu"? What about inline definitions? How much information should be given for each kanji/kana term? The pinyin vote didn't allow for definitions, only hanzi but then short, unwikified (with no links) definitions were still added. I saw your edit on aidoru. In my opinion, the reference and "see also" could and should go to アイドル when it's created. My "mania" or passion about stripping Japanese romanisation of the noun, verb, etc. status, is because we are giving the wrong idea to users that rōmaji is an alternative script for Japanese. It's almost impossible to keep romaji, kana, kanji entries in sync and it's a maintenance problem and a waste of time.
If you open Category:Japanese_nouns or other PoS category, there are thousands of romanised entries. It's wrong. Although the topic is now hard to follow, my reasoning is about the same as Haplology. The details could be further discussed but we won't proceed without Eirikr's agreement. --Anatoli (обсудить/вклад) 22:39, 13 March 2013 (UTC)
Practicality, fine, but I can't say I agree with your theoretical reason. Both ringo and 林檎 are representations of Japanese nouns. You're trying to thread a difference I don't see at all.--Prosfilaes (talk) 00:38, 14 March 2013 (UTC)
林檎, りんご and リンゴ are all Japanese nouns (not representations) - all spellings are accepted by the Japanese - kanji, hiragana and katakana but "ringo" is not a Japanese noun, it's the Japanese romanisation. The Mandarin entry "píngguǒ" unambiguously says that it is a romanisation, not a noun but 蘋果 and 苹果 are Mandarin nouns in two acceptable forms. Further, we don't have entries like "tuffāḥa" (Arabic), "jábloko" (Russian), "tapuakh" (Hebrew) or "sagwa" (Korean). I'm not saying we shouldn't, they could help people not familiar with native script or not able to type in them but entries shouldn't mislead users into thinking they see alternative native scripts (in Latin alphabet). --Anatoli (обсудить/вклад) 00:54, 14 March 2013 (UTC)
I agree. The romanization entries for Gothic are even more sparse, they just include a link: wisan, wulfs, þuk. —CodeCat 01:10, 14 March 2013 (UTC)
Words aren't real. They're platonic ideals that exist in our head. I don't buy that 林檎 is a Japanese noun any more than the picture at right (Holand cockade.svg) is an orange. For one thing, 林檎 is also a representation of a Min Nan noun. When I write "A man bit a dog", "А ман бит а дог", or "N zna ovg n qbt", all three are ways of communicating the same sentence to an English speaker; one is more traditional than the others, but they're all representations of English sentences. I don't buy that ringo is not a Japanese noun and 林檎 is, any more than I believe that dog is an English noun and qbt isn't. Which is not to say anything about the practicality of labeling "ringo" as a Japanese romanization, if you believe that will be maximally useful to editors and users. I simply don't believe your theoretical justification.--Prosfilaes (talk) 08:05, 14 March 2013 (UTC)
  • I have a much more in-depth reply I was working on yesterday, but that needs to be reworked some in light of folks' comments above that appeared mid-draft. For now, I'd like to address one narrower issue: providing more than just a link to the lemma form.
  • CodeCat, others --
I think there's some confusion about the shape of the problem space and the kinds of issues afoot when it comes to romanized entries for Japanese and Chinese.
Comparisons to romanizations for Gothic, Russian, Greek, and any other language written primarily in an alphabetic / abugidic / abjadic / syllabic script are wholly irrelevant when discussing Japanese or Chinese.
This is because such scripts can be mapped more or less one-to-one to Latin letter spellings.
This is not possible for Japanese or Chinese.
By way of example, if I type tsuku into the search bar of my handy JED e-dictionary on my phone, I get 24 possible lemmata. My Shogakukan dictionary software on my PC gives me 21 possible lemmata; if I expand my search to include the EDICT kanji dictionary, I get seven more. These are:
Not every word in Japanese has this many homophones, but it isn't all that rare either for a given kana spelling or romanization to have several corresponding lemmata. Looking up saku in Shogakukan shows a list of 26 possible lemmata; mori gives me 7; ki gives me a whopping 50.
When there is only one possible lemma for a given romanization, as with Gothic, it is certainly enough to just direct the user to the corresponding lemma page; a gloss here would not really serve much purpose. However, as the number of possible lemmata listed as lemmata links rises to what we find under many kana or romaji entries (and those two should be nearly identical, as I will argue in better detail in a later post), giving the reader POS and gloss information is vital to usability. Granted, my local Shogakukan e-dictionary app only shows me the list -- but that's a local app, and I can get immediate results by clicking on each listed entry. The whole list of hits also remains visible the whole time, so clicking on the next entry is easy and immediate. By comparison, my JED dictionary app on my phone shows me the list with a small line under each lemma form giving me POS and gloss information. In contrast, Wiktionary is a web app, and as other threads have pointed out, our page-load times could hardly be described as instantaneous. Requiring users to click through to each individual entry to find the one they want is unacceptable just from that standpoint alone, leaving aside the issues of download data volumes, or that the list of lemmata disappears every time a user clicks through to any single lemma entry.
I must therefore strongly oppose any move to strip romaji and / or kana entries of POS and gloss information. When there is only one corresponding term, as with the akegata example given at the start of this thread, this is less of an issue, but unfortunately that example is not representative of very many Japanese entries. Without additional information given for each linked lemma form, these one-to-many Japanese entries would be less than useful. A possibly helpful comparison would be Wikipedia disambig pages. A Wikipedia disambig page that gives no description of the links pointed to would not serve its purpose very well. And disambig pages are essentially what one-to-many Japanese entries are. Users need more than just the link to the lemma forms to make sense of the page. -- Eiríkr Útlendi │ Tala við mig 16:10, 14 March 2013 (UTC)
Thanks for your post, Eirikr. All this disambiguation is very important but why do it in tsuku#Japanese not in つく? I have already mentioned that Mandarin has a huge number of homophones as well but pinyin entries don't give the PoS information, see "biān". I have added new Japanese definitions I could find to つく. I could find only verbs for the moment but other PoS can be added later. --Anatoli (обсудить/вклад) 22:07, 14 March 2013 (UTC)
A glimmer of a possible solution to what looked like an impasse -- are you suggesting then that romaji entries could be simple (soft) redirects to the kana disambig pages? I would be open to that. Your question "why do it in tsuku#Japanese not in つく?" is absolutely salient and right to the point, and I'm kicking myself for not seeing that option earlier. -- Eiríkr Útlendi │ Tala við mig 22:23, 14 March 2013 (UTC)
That's what I was suggesting right from the start, "つく" is a Japanese spelling, "tsuku" is not! It must have been buried under the long discussion. I'm SO glad that we are on some kind of agreement now. :) Take a look at "sasu" and "さす", this formatting and arrangement is what I think should suit myself, you and Haplology (judging from comments). --Anatoli (обсудить/вклад) 00:35, 15 March 2013 (UTC)
I'd actually prefer it if romaji entries just said something like:


  1. see さす
That's the super-simplest of all, requires the least reworking, and introduces the least ambiguity. The kana entry then presents the user with the full-on disambig page, complete with POS headers and glosses to aid in picking the correct JA term. -- Eiríkr Útlendi │ Tala við mig 01:11, 15 March 2013 (UTC)
That's radical but I'm OK with this (a soft redirect); if there's a katakana word, it should link to katakana as well. Let's see what Haplology says. Another template or just like above? --Anatoli (обсудить/вклад) 01:29, 15 March 2013 (UTC)
This is Haplology. Sorry to be delinquent in these discussions. I should say that I've been going through some stuff recently and I've put a lot of time and energy into these entries so besides being preoccupied I've been a little emotional and quick to overreact and I apologize. But to address the questions:
  • A soft redirect like


  1. see さす
would be my preference, because while I have few objections to an entry like Japanese/Romanization/ja-def/short gloss or no gloss, it still suffers from requiring a massive human effort, and besides taking time to write, a shortened gloss is often clumsy and incomplete as it has to combine or ignore multiple senses and, since everything is grouped under Romanization, multiple parts of speech. I tried it with ōwarai and it got nasty, and it even leaves out the verbal form as is. It should be "uproariously funny, or something that is such, or laughing hard." I think that's cramming too much in. The shortened glosses separated by parts of speech on pages like つく have not been a problem in practice. The only problem is keeping them consistent with the romaji entries, which would be solved by using the soft redirect above. The only downside for the user is that it forces them to click through, but it's just one click, it's obvious that they should do it with the soft redirect format, and they will land on a better page which gets more editorial attention. Freeing up editors' time is one of many advantages for them too.
This solution fortunately bypasses questions of whether a romanization of a Japanese word is also a real Japanese word, but as for that: Would anyone say that アップル is a real English word? It's not such a stretch. If history had unfolded differently, all nations might be learning to write with katakana even though they never use it in daily life. My name is Jesse. In Japanese, it's ジェシ. Or is it ジェシー? My bank calls me JESSE. Apparently the shouting does not survive in translation. (Incidentally, speaking of caps, the question of when to capitalize romaji, such as when it is a proper noun, comes up here too.) City hall knows me as ジェシ. If you asked them why, they would reply that this is Japan, perhaps with a degree of defensiveness. If I ever gained citizenship then I would most certainly no longer be Jesse, nor would I be Jeshi, Jesi, jeshi, Jyeshi, jesi, Jyesi, or jyeshi. Everyone in Japan knows romaji to some degree but not everyone writes it in the same way, and it is more difficult and time-consuming for them to read (as opposed to kana which they can read without concentrating.) Using it for real communication outside of a word here or there, usually in advertisements, is inconceivable. It's not "real" in the sense of having currency in the language community. When Roman letters are used, it's often precisely because it gives a foreign touch, and usually those are actual foreign words, not romanizations of Japanese words which have come from foreign languages. Romanizations of native Japanese words are even more rare, and again, tend to be done for the foreign-ness of it. --Haplology (talk) 06:05, 15 March 2013 (UTC)

@Haplology, @Eirikr. I have modified Template:ja-romaji. The template still adds to Category:Japanese romaji. I think we should keep at least that category. Please take a look at "sasu". Is that acceptable? --Anatoli (обсудить/вклад) 11:26, 15 March 2013 (UTC)

And "ryūto" (which has both hiragana and katakana). --Anatoli (обсудить/вклад) 11:31, 15 March 2013 (UTC)
They look good to me. Before I left out the Romanization header because Eirikr left it out, but not for any other particular reason. Is there any reason to include or omit it? --Haplology (talk) 11:54, 15 March 2013 (UTC)
If we include it, it will look better and similar to Mandarin pinyin (biēsān) and Gothic (wisan) romanization entries. --Anatoli (обсудить/вклад) 12:02, 15 March 2013 (UTC)
I think it would look better too, especially in entries with words in other languages, like pan, where it would stand out without a header. --Haplology (talk) 13:12, 15 March 2013 (UTC)
KassadBot adds a little # {{defn|Japanese}} to these entries as in saigū. It wasn't added to sasu for some reason though. --Haplology (talk) 01:35, 17 March 2013 (UTC)
Thanks, I have replaced "*" with "#" in the Template:ja-romaji. It will put a number instead of a bullet point. Let's see if it helps. --Anatoli (обсудить/вклад) 02:03, 17 March 2013 (UTC)
  • @Anatoli, the new {{ja-romaji}} looks great from an editor and user usability standpoint. Barring any concerns voiced by other editors, I think this thread has reached a successful conclusion. I will tweak the template in a moment though to get rid of the vestigial rom parameter -- I can think of no conceivable use for this, and leaving it in the template tempts editors to use it, which does nothing but produce an extraneous comma in certain situations.  :)
@Haplology, your point about script is taken, but as an alternate point of view, イズント・ジス・スティール・イングリッシュ、アフター・エー・ファッション? (izunto jisu sutīru ingurisshu, afutā ē fasshon?) Changing the orthography doesn't change the language -- it just changes the spelling. /aɪ kʊd traɪ ˈspelɪŋ ˈɪŋɡlɪʃ ɪn aɪ pʰiː eɪ̯/, or περχαπς ιν Γρεεκ, or ᛖᚢᛖᚾ ᛁᚾ ᚱᚢᚾᛖᛋ, but it would still be English. Granted, different writing systems have different degrees of expressiveness, and some scripts are ill-suited to expressing the sounds of some languages. But that aside, scripts are scripts and languages are languages -- 雲泥の差 (undei no sa, as different as clouds and mud).  :) -- Eiríkr Útlendi │ Tala við mig 22:51, 18 March 2013 (UTC)
Thanks, only that point of view about script wasn't from Haplology but from Prosfilaes. To me, it didn't sound serious, more like trolling, sorry, I didn't reply to that point of view. We don't write in English in Cyrillic letters or express in picture, etc. Haplology supports this idea, if you read a bit more carefully :). --Anatoli (обсудить/вклад) 23:03, 18 March 2013 (UTC)


Good evening,

Recently someone named X added a lot of reconstructions (log) in the French wiktionary. He used the English pages as references, but those don't have any references themselves. Category:Proto-Germanic nouns is one example. A reconstruction is a scientific work and our policy (in the French wiktionary) is to refuse a reconstruction without sources. Why are you accepting this - perhaps - original scientific work?

Then, we discovered your policy on this theme, Wiktionary:About Proto-Germanic, and specifically this sentence:

If a form is not attested in a language, but it can be reconstructed based on a form appearing in a descendant of that language (for example when Old English is unattested, but Middle English is attested), then its reconstructed form(s) may be provided with an asterisk * as is usual for unattested terms.

It appears to be an incitement to innovate and seems to be against the NOP policies. We think it’s a serious issue. For example: entries in Proto-Germanic about names seem to be pure creations, like Appendix:Proto-Germanic/Mērijawīgan. What do you think? Could you take a look at it? Eölen (talk) 20:28, 24 February 2013 (UTC)

No Original Research is a Wikipedia policy, but this is Wiktionary. We have kind of accepted original research as an occupational hazard, because it would be impossible to define words otherwise. This is also explained on WT:WFW. —CodeCat 21:19, 24 February 2013 (UTC)
I understand for the lexical collection but I am pointing out the case of reconstructions, i.e. theoretical forms induced by logical work. It's different and has to be sourced. I will try to summarize the French Wiktionary discussion: you need to do a large-scale analysis before proposing a list of proto-words. You can't create new ones as in a conlang. It's based on a lot of work, comparison and research. It's named historical linguistics but it's really different from History. It's more mathematical but...hum I'm not sure I'm being understandable. I will stop here and wait for someone to give a better explanation. Eölen (talk) 23:06, 24 February 2013 (UTC)
If it eases your heart, we do often add references to reconstructions, and if there is any doubt, we discuss them too. They are sometimes removed as well if they seem particularly doubtful. However, many terms in Proto-Germanic are not controversial because all of its descendants agree, so it seems rather wasteful to require references for them. Often, linguists don't bother to discuss "easy" reconstructions that everyone already agrees with in their work, unless it's to give an overview of the language and its vocabulary. So paradoxically, a commonly accepted and uncontroversial reconstruction may actually be less likely to be found in sources than the more contentious ones (which probably have more than one possible reconstruction). Such a situation is of course not really useful for Wiktionary, so we often do the easier reconstructions ourselves and try to look for sources on the more difficult ones. —CodeCat 00:05, 25 February 2013 (UTC)

Notice of language-merger discussion at RFM

People have been, for some time now, posting requests for language mergers to RFM and RFDO even though they're technically policy requests. This post is to notify BP-watchers of that fact and to make sure there's no objection. In order to consolidate the requests, they will henceforth be only at RFM. (This notice/feeler follows discussion at WT:RFDO#Template:tnv and …#Template:pld.)​—msh210 (talk) 03:53, 25 February 2013 (UTC)

I would like to see these in Beer Parlour, for the reason you've stated: they are policy change requests. --Dan Polansky (talk) 18:24, 25 February 2013 (UTC)
I would prefer them in either RFM or, maybe better, a new discussion room just for proposing changes in policy and common practice (as opposed to discussing in general). I imagine such a new room would be for informal votes and straw polls, where the emphasis is on getting consensus for smaller changes where a vote would be overkill. —CodeCat 18:26, 25 February 2013 (UTC)
Another one? -- Liliana 19:58, 25 February 2013 (UTC)
I agree with Liliana.​—msh210 (talk) 21:07, 25 February 2013 (UTC)
I think the BP is good for that. I'd want language-merger discussions here in the BP also, but for the history of their being elsewhere. But as long as they're in one place and the community knows about it (through this discussion, say, and preferably a link to be placed in the BP header), I don't see that it really matters where.​—msh210 (talk) 21:07, 25 February 2013 (UTC)
The biggest reason not to put them in the BP is that BP discussions never have to be resolved, just auto-archived. However, RFM discussions have to be closed one way or the other and manually archived on a talkpage. That makes them much better suited for a request page. —Μετάknowledgediscuss/deeds 03:16, 26 February 2013 (UTC)
This is why I proposed a separate discussion page. One that is specifically meant for issues that have a definite outcome or conclusion. —CodeCat 04:15, 26 February 2013 (UTC)
As long as that separate page is WT:RFM, then fine. Mglovesfun (talk) 09:12, 26 February 2013 (UTC)

Mandarin categories separated by script

Can someone do something to fix the weird category tree in Mandarin caused by the different scripts? I feel like if someone who knew how to navigate Wiktionary wanted to find Mandarin names for stars they would look up Category:cmn:Stars and only find two entries. However, there are subcategories in simplified and traditional script which bizarrely aren't members of the broader cmn:Stars category. However, its parent category Category:cmn:Astronomy contains all three categories, which is pretty silly, especially next to the categories for Constellations in traditional/simplified scripts while Category:cmn:Constellations doesn't exist. Can't Category:cmn:xxx in yyy script include only the category Category:cmn:xxx? Or we could, you know, just combine them like every other multi-script language. Ultimateria (talk) 22:27, 27 February 2013 (UTC)

Portal pages per language

This has come up before, and as far as I remember, it was fairly well received but it wasn't followed up on. Basically, the idea is to make Wiktionary more welcoming and easier to navigate by creating portal pages for each language. These pages would explain some about the language (without getting too encyclopedic), and provide links to all the information about the language that we have in a simple and concise way. In a way, a portal would be a kind of "main page" for that language specifically. We could also place notices there for other editors who work in that language. —CodeCat 02:06, 28 February 2013 (UTC)

With a little attention to the body text, the language category pages could serve. See Category:English language, for example. Michael Z. 2013-02-28 03:18 z
While that is true, I feel that all the subcategories clutter up the page... —CodeCat 03:22, 28 February 2013 (UTC)
What about the WT:About Foo pages? - -sche (discuss) 04:59, 28 February 2013 (UTC)
They're for editors, not readers. -- Liliana 05:26, 28 February 2013 (UTC)
Sounds like a good idea. Perhaps they could be linked to from the main page, instead of the indexes. Some things that might be useful for them to have, if possible:
  • A search box that goes specifically to the language's section in the searched entry.
  • The TOC box for the language's index.
  • A button to toggle targeting the language with targeted translations.
  • Quick links/forms for creating new entries, and guides and such. (Speaking of which, we might want to enable Guided Tours here...)
  • Links to appendices on grammar, etc.
--Yair rand (talk) 06:27, 28 February 2013 (UTC)
“Language of the Week” on the main page? Michael Z. 2013-02-28 07:00 z


This currently adds entries to Category:(language) impersonal verbs. But I don't think that is appropriate behaviour for a context label, because by definition they are used only for specific senses. In particular, Category:Dutch impersonal verbs currently contains several verbs like donderen, that are not impersonal in the general sense, but have one or two senses that are used impersonally. I would like to remove them from that category, but the only way is to change this template. So I want to be sure it's ok. —CodeCat 19:32, 28 February 2013 (UTC)

Disagree, I think this is fine and isn't any different from {{uncountable}}, which I also think is fine. Mglovesfun (talk) 19:35, 28 February 2013 (UTC)
Why do you think it is fine? It's simply wrong. donderen is not an impersonal verb and should not be listed as such. Impersonal is a usage context, not a property of a verb. The same applies to dated and archaic, but in those cases one could say that the mere mention of a word, irrespective of its usage, is archaic. Archaic is a connotation. With impersonal usage that doesn't even apply, since there is nothing lexically inherent about a verb that makes it impersonal. Being impersonal is not a connotation of a verb, it is just how it happens to be used. —CodeCat 20:29, 28 February 2013 (UTC)
Isn't the real question whether there is information content to the category? If there aren't too many (or too few), then the category should have some information value. This is the reason why it makes more sense to have a category for English nouns with uncountable senses, as not a large portion of English nouns have uncountable senses, whereas almost all English nouns can be used countably. Categories don't really work very well below the level of PoS and are ambiguous for L2 sections with multiple etymologies that contain the same PoSs. DCDuring TALK 20:44, 28 February 2013 (UTC)
That's what's been going on with categories such as Category:English rare terms and Category:English terms with rare senses. Impersonal verbs need not have only impersonal senses, they can have personal ones too (if that's the right term). By the same token, maybe Category:English verbs with impersonal senses and Category:English nouns with uncountable senses would be better titles. But to be honest I'm not for the change as I find it to be bureaucracy with no practical implications. Mglovesfun (talk) 21:08, 28 February 2013 (UTC)

Cantonese is not Mandarin

Given that linguists have come to agreement that Sinitic languages are languages, but not dialects of one language, for anyone to merge all articles from different Sinitic languages into one "Chinese" category is completely unscientific. Given that Wiktionary is supposed to be politically neutral and that merging Sinitic languages into one "language" is nothing but political play and blatantly anti-neutrality, I would like to urge everyone to stop this practise right now.

If not, I'll have no choice other than calling for a boycott of the so-called "Chinese" Wiktionary.

Our languages deserve treatment that's equal to the treatment of Mandarin. End of story.

Wg. Cdr. Cédric, ECPP Wing 31, Squadron 4 (talk) 20:46, 28 February 2013 (UTC)

If you think there should be a Cantonese Wiktionary, you should probably go start working on it. See incubator:Help:Manual. This is not the best place to discuss it. --Yair rand (talk) 21:12, 28 February 2013 (UTC)
I'm less than surprised to see, Cedric tsan cantonais, that this is your only ever edit on this wiki. I've been here since about 2007 and Cantonese and Mandarin have been separate the whole time I've been here; Chinese isn't an allowable header. We can't 'stop this practise[sic]' as you put it as we do not employ this practice. You might as well implore us to stop merging Portuguese into Spanish - we already don't do this. Now if you have nothing relevant to say, I suggest you shut the fuck up. Mglovesfun (talk) 21:15, 28 February 2013 (UTC)

[His caps-lock key must be broken.] Michael Z. 2013-02-28 21:18 z

I wonder if he/she read this private discussion on Wyang's talk page to make them think so but nothing is happening (yet)! A few points:
  1. Written formal Cantonese (e.g. as in Hong Kong) is almost identical to written Mandarin, only written in the traditional form; the difference is in style: older words are used and newly coined Beijing forms are missing. (The concept of "formal" and "vernacular" Cantonese may differ depending on the source; some view one version of written Cantonese as correct Cantonese, the other being just Mandarin, which can be read out either in Mandarin or Cantonese).
  2. The official language of Hong Kong is "Chinese", which causes confusion about whether it's Cantonese or Mandarin.
  3. Theoretically, most Mandarin, Cantonese, Min Nan and other hanzi entries can be merged here, providing topolectal pronunciation and rare usage and sense differences. On the single vocabulary level it is quite possible, as entries don't deal much with grammar.
  4. The vernacular Cantonese sounds very different from Mandarin, not only in pronunciation but also because of some words, which are completely different. Their number is very small but they happen to be the most common, everyday words. Add some ending particles, very few grammar differences. There are also Cantonese specific characters approved by Hong Kong government, in reality, they are often replaced by Mandarin cognates in writing. This issue is solvable. We are really talking about a small number of words.
I'm not suggesting to merge the languages for now. It's even more political than Serbo-Croatian and because of the big differences in pronunciation of identical words it makes Cantonese and Mandarin different languages, even if we're talking about the same piece of written text. --Anatoli (обсудить/вклад) 00:47, 1 March 2013 (UTC)