Wiktionary:Beer parlour/2014/September

English plural nouns: agreement and countability

We tell our users what is mostly obvious by inspection, that a noun is plural in form. The only significant information that is provided is that there is no singular form, which information is sometimes false.

I would expect that it would be more useful to a learner of English to know which form of verbs is used for agreement in number and whether the noun is countable, which indicates which set of determiners it would be used with.

  1. Apart from the work involved, is there any reason not to try to provide this information?
  2. Is there any reason for this information not to be on the inflection line?

If we can agree to the answers to these simple questions, then it should be possible to emend {{en-plural noun}} and/or {{en-noun}} appropriately. {{en-plural noun}} is transcluded 1454 times in the principal namespace and probably should be used in additional existing entries. This may overlap in some cases with the separate phenomenon of British/Commonwealth English requiring that a noun like team take a plural form of verbs for agreement. DCDuring TALK 15:05, 1 September 2014 (UTC)

Responding to your last sentence first, the British use of "team (etc) are" vs the American use of "team (etc) is" seems like a general grammatical phenomenon, perhaps not worth mentioning in individual words' entries. If one invents a new word, e.g. the country name "Triceleuden", I expect it will be adapted to the existing grammar and hence a Brit will report sporting news as "Triceleuden face Australia in their next match" while an American will say "Triceleuden faces Australia".
Some entries do mention whether they take singular or plural verbs, in varying ways: feces, data, dramatics, bobby socks, grits. I agree that indicating the "verb number preference" of plural-only nouns would be useful. Grits and data show why the information cannot always be on the inflection / headword line, and feces shows how the information may be too long to fit into a sense-line {{label}}. Perhaps we could use sense-line labels and just expand on them in the usage notes when necessary, though. Regalia is an example of a plural-only noun that seems to take singular verbs and plural verbs in even measure. - -sche (discuss) 16:28, 1 September 2014 (UTC)

Chinese Medicine Entries

There are hundreds of entries for Chinese medicinal preparations that nobody seems to be aware of. They go beyond encyclopedic into a level of detail that Wikipedia doesn't touch. To illustrate the magnitude of the phenomenon, here's the "definition" for 二十五味珊瑚丸:

  1. A reddish-brown pill used in traditional Tibetan medicine to "promote the restoration of consciousness, promote blood circulation and relieve pain, when there are symptoms that include unconsciousness, numbness of body, dizziness, headache, abnormal blood pressure, epilepsy and cranial neuralgia".

Ershiwuwei Shanhu pills have the following herbal ingredients:

Name Chinese (S) Grams
Qingjinshi 青金石 20
Os Corallii 珊瑚 75
Margarita 珍珠 15
Concha Margaritifera 珍珠母 50
Fructus Chebulae 诃子 100
Radix Aucklandiae 木香 60
Flos Carthami 红花 80
Flos Caryophylli 丁香 35
Lignum Aquilariae Resinatum 沉香 70
Cinnabaris 硃砂 30
Os Draconis 龙骨 40
Calamina 炉甘石 25
Naoshi 鱼脑石 25
Magnetitum 磁石 25
Limonitum 禹余粮 25
Semen Sesami 白芝麻 40
Fructus Lagenariae 壶芦 30
Flos Asteris 野冬菊 45
Herba Swertiae Bimaculatae 獐牙菜 80
Rhizoma Acori Calami 白菖 50
Radix Aconiti Preparata 制川乌 45
Herba Chrysanthemi Tatsiiensis 打箭薹草 75
Radix Glycyrrhizae 甘草 75
Stigma Croci 红花 25
Moschus 麝香 2

二十五 is Chinese for 25, and, not coincidentally, there are 25 ingredients listed. Though it's no doubt got more ingredients in the list than most of these entries, there are hundreds of them. DCDuring has added hyperlinks to some of the Latin, but other than that, a look at the edit history shows no one but the creator of the entry and some bots, and that seems to be the norm.

I may be wrong, but I think someone listing the names and amounts of the ingredients for various standardized foods or beverages would be reverted pretty quickly, but these have been there for half a decade. The question is: do we want to have different standards for entries that no one cares about? Chuck Entz (talk) 07:35, 2 September 2014 (UTC)

Most of those articles need to be deleted. Only the ingredients themselves warrant inclusion. Wyang (talk) 09:57, 2 September 2014 (UTC)
Other than the fact that the tables are not WT:ELE compliant, what exactly is wrong with the entries? The items on all the tables are meronyms of the headwords, with quantification. I know that EncycloPetey has expressed a linguistic interest in Chinese herbal medicine. I have become convinced that the meronyms are themselves not necessarily SoP as they are often short names for ingredients more specific than the names would suggest, either in the species involved, in the form, or the manner of preparation. The large literature on Chinese herbal medicine makes it highly likely that all of the terms involved, headwords and meronyms, are attestable. The subject matter is basically irrelevant to inclusion.
It seems to me that all of the entries with the tables would be worth some effort to bring into greater conformity with WT:ELE and would otherwise be subject to the same RfV and RfD as other entries.
I would also appreciate views about whether terms like "radix Aconiti preparata", as used in Chinese or English works on herbal medicine, are or might be entry-worthy or, if not, why. DCDuring TALK 13:41, 2 September 2014 (UTC)
Here is a list of them:
These are not considered words by Chinese dictionaries. Apart from SoPness, problems also include: 1) the creator's lack of knowledge of Chinese. Many of these entries have been fixed over the years, but numerous mistakes are still present. For example, this nonsense entry 发育迟缓 ("growth retardation"). 2) incorrect formatting. Most of these fall into Category:Chinese terms with uncreated forms, and have weird formatting errors (e.g. nonstandard use of the hanzi box, initial or trailing spaces in Pinyin, hidden characters in title). Wyang (talk) 23:50, 2 September 2014 (UTC)
Which ones should be RfDed? Which ones have something that can be saved (i.e., should be RfCed)? I would like to at least harvest the taxonomic species or genera referred to in the tables. The English or "Medical Latin" terms seem attestable and not necessarily SoP, as they are used as units. DCDuring TALK 02:28, 3 September 2014 (UTC)

Arabic transliteration module enabled, a minor change needed, verb header template needs work

(Notifying Benwing, Wikitiki89, ZxxZxxZ, Mahmudmasri): I have just added the Arabic transliteration module Module:ar-translit to Module:languages/data2 to allow automatic transliteration, which is now enabled in all templates that use automatic transliteration. It's not added to Module:links, so manual transliteration is not overridden. It's a good time to change the Arabic verb template {{ar-verb}}. Most verbs use full diacritics, so it would be much easier if the manual transliteration were removed and all missing diacritics checked when the template starts using the automatic transliteration. It works fine in most cases. We need to make "showI3raab" the default to show case and verb conjugation endings. As was previously agreed, we don't need word stresses for Arabic words, so a fully vocalised word نَزَفَ (nazafa) should be transliterated as "nazafa", not "nazaf" (missing iʿrāb ending) or "názafa" (stress mark). (I should mention, in case it's not obvious, that the module is supposed to be used on fully or partially vocalised forms, i.e. with diacritics, which are normally unwritten in running Arabic text.) You're doing a great job, Benwing, thanks! --Anatoli T. (обсудить/вклад) 03:28, 3 September 2014 (UTC)

You're welcome.
Can you give an example of templates that use automatic transliterations? In the case of {{ar-verb}}, can you point to a verb where this transliteration happens? Will it happen if you remove the manual transliteration?
I'm all for making "showI3raab" the default; if no one objects I'll go ahead and do this.
Also (Notifying Lo Ximiendo): I haven't done much with {{ar-verb}} so far. I think it should be rewritten entirely so that it basically takes the same params as {{ar-conj}} and makes use of the same code. That would obviate the need for explicitly writing out the perfect and imperfect verb forms and would automatically supply the right vowels and such. This should be retrofitted into the existing params, which I think is possible. What this means is that form I verbs need only the form and past and non-past vowels specified, and augmented (non-form-I) verbs need only the form specified (the radicals are inferred from the headword; the few cases where ambiguity exists all involve weak radicals i.e. و or ي, and as it happens the call to {{ar-verb}} already specifies the radicals in these cases, e.g. in تسلى where III=و is specified, see below). The form is already present in the call to {{ar-verb}} but the past/non-past vowels aren't; however, this isn't an issue if the verb forms are manually given (which they are, currently), and we can arrange things so that there's a category containing form-I verbs whose call to {{ar-verb}} is missing the past or non-past vowels, so they'll eventually be fixed. For augmented (non-form-I) verbs, I'm thinking we should actually ignore the parameters specifying verb forms, because of cases like تسلى, which has a call to {{ar-verb}} declared as {{ar-verb|III=و|form=5|tr=tasallā|impf=يتسلى|impftr=yatasallā}} with missing vowel diacritics. Module:ar-verb will correctly generate the verb forms on its own: it currently handles all "regular" verbs and almost all of the very few truly irregular verbs, and if any cases come up where it doesn't work properly, just fix the module. (A more conservative approach is to check to see whether the diacritics are present and use the manually specified verb form if so.) I don't have time to work on this now, so Anatoli if you're interested in working on it, go ahead.
Benwing (talk) 04:49, 3 September 2014 (UTC)
I have just updated نَزَفَ (nazafa) (and the verbal noun), which now uses automatic transliteration ("showI3raab" is currently off), so it now automatically shows "nazaf", instead of "nazafa". Note that imperfect forms only need one parameter, the form with diacritics; |impfhead=يَنْزِفُ is not necessary, and neither is |impftr= (inflected forms are not transliterated in the headword). The entry got into Category:Arabic terms lacking transliteration, which should probably be removed from {{ar-verb}}. --Anatoli T. (обсудить/вклад) 05:25, 3 September 2014 (UTC)
Re: templates using automatic transliteration: {{t}}, {{l}}, {{term}}, headword templates, etc. One problem with that is that if a term is missing both manual transliteration and diacritics, it will transliterate incorrectly, e.g. نزف, as you can see, is just "nzf". Various Arabic translations using {{t}} or {{t+}} will now have wrong transliterations. Someone may complain about this. --Anatoli T. (обсудить/вклад) 05:32, 3 September 2014 (UTC)
Actually, the intention is clearly that the imperfect is translated in the headword. The call to {{ar-verb}} passes in the |impftr= param as {{head}} param |f1tr=, but this is (no longer?) supported. I think this intention is correct. Benwing (talk) 05:49, 3 September 2014 (UTC)
(You must have meant transliterated in the headword). It's no longer supported. I posted on the GP discussion you started. --Anatoli T. (обсудить/вклад) 06:01, 3 September 2014 (UTC)
Yeah, sorry, I meant transliterated. Benwing (talk) 06:11, 3 September 2014 (UTC)

Swedish entries give glosses rather than translations

The entry for malm, for example, has, among other definitions:

  (archaic) an alloy consisting of copper, zinc, lead and some tin
  (archaic) the geological period of late Jurassic
  (archaic) a hill or ridge consisting of sand or gravel

Unless these are specifically terms for which there is no English equivalent, these should be translations rather than glosses - for example, the first looks as if it should simply be "bronze".

I've looked at several other Swedish entries and they seem to give translations, so this may be an isolated example after all.

Grants to improve your project

Greetings! The Individual Engagement Grants program is accepting proposals for funding new experiments from September 1st to 30th. Your idea could improve Wikimedia projects with a new tool or gadget, a better process to support community-building on your wiki, research on an important issue, or something else we haven't thought of yet. Whether you need $200 or $30,000 USD, Individual Engagement Grants can cover your own project development time in addition to hiring others to help you.

Four RfV topics to clear 2013.

If we can settle the four oldest RfV issues, we'll have 2013 cleared off of that board. Any takers? bd2412 T 20:01, 3 September 2014 (UTC)

returning nil in Module:ar-translit when vowel diacritics not available?

Now that we've turned on automatic transliteration for Arabic, one issue is that not all words have the vowel diacritics supplied, leading to incorrect transliterations. One possibility is to check for this, and return nil when encountering a word that isn't completely vocalized (with an exception made for vowels omitted on the last letter of a word). Some questions:

  1. Will this work? What happens in general when a transliteration module returns nil?
  2. Is this a good idea?
  3. Is this the right place to ask a question like this (apologies to Wikitiki89)?

Benwing (talk) 05:18, 5 September 2014 (UTC)

Maybe @Wyang: can help? You made the Korean transliterate hangeul 공부하다 (gongbuhada), while hanja is not transliterated: 工夫? (工夫하다 (工夫hada) should also be nil, IMO)
Nil shouldn't transliterate anything, as with languages for which automatic transliteration is not enabled, e.g. Hindi, Hebrew, if there is no manual transliteration. Most Arabic entries have manual transliterations; they shouldn't be removed before vowel diacritics are added, that's all. --Anatoli T. (обсудить/вклад) 05:39, 5 September 2014 (UTC)
This probably should be a WT:GP question. But to address the question itself, I was actually thinking that we should do the exact same thing. User:CodeCat would know if it does what we want it to do. I was thinking further that if any letter where a diacritic is expected does not have one, then we should return nil. For example بَني would return nil because a diacritic is expected on the ن to distinguish بَنِي from بَنَيْ, and دَم would return nil because the iʿrāb is not specified. But لَا would not return nil because no diacritic is expected on the ʾalif. --WikiTiki89 19:32, 5 September 2014 (UTC)
My thought was to allow final consonants without iʿrāb (which is frequently omitted in otherwise properly vocalized nouns), and to allow a few other cases where things are unambiguous even without diacritics -- specifically, alif or tā' marbuṭa with missing fatḥa before it. This is not intended to encourage people to write things like this but to handle existing usage like كاتِب, which is written as such in the كاتب entry and can be unambiguously transliterated as kātib. (Sorry if this is drifting into an Arabic-specific discussion again.) Benwing (talk) 20:52, 5 September 2014 (UTC)
But doing that would be a way to catch all existing cases and fix them. --WikiTiki89 21:31, 5 September 2014 (UTC)
This is true. I guess it comes down to a compromise between current usefulness and usefulness in fixing. Since I don't see any people currently offering to go fix all the (thousands of) existing cases needing fixing, I'd rather have the iʿrāb-less transliterations there. If someone wants to fix the iʿrāb, they can edit Module:ar-translit and temporarily comment out the lines that allow iʿrāb-less transliterations (there's a comment indicating where to do this). Ideally there would be a way of allowing iʿrāb-less transliterations while still marking them but I don't see how to do it.
BTW in case it's not clear I did implement returning nil on unvocalized text. Hopefully I did it correctly. So far all the places I can find that should have transliterations do. Benwing (talk) 08:07, 6 September 2014 (UTC)
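The check discussed above might be sketched roughly as follows. This is an illustrative Python sketch, not the Lua code in Module:ar-translit; the function names (`split_units`, `is_vocalized`, `transliterate_or_none`) are hypothetical, and the rules are a simplified reading of the thread: a final letter may lack its iʿrāb, an alif needs no diacritic, a letter directly before alif or tā' marbūṭa may omit its fatḥa, and (an added assumption) و after ḍamma and ي after kasra count as long vowels.

```python
# Arabic combining marks, by Unicode code point.
FATHA, DAMMA, KASRA, SUKUN, SHADDA = "\u064E", "\u064F", "\u0650", "\u0652", "\u0651"
TANWIN = "\u064B\u064C\u064D"
VOWEL_MARKS = set(FATHA + DAMMA + KASRA + SUKUN + TANWIN)
ALIF, WAW, YA, TA_MARBUTA = "\u0627", "\u0648", "\u064A", "\u0629"


def split_units(word):
    """Group a word into [letter, marks] pairs."""
    units = []
    for ch in word:
        if ch in VOWEL_MARKS or ch == SHADDA:
            if units:
                units[-1][1].append(ch)
        else:
            units.append([ch, []])
    return units


def is_vocalized(word):
    """Return True if every letter that needs a diacritic has one (simplified rules)."""
    units = split_units(word)
    last = len(units) - 1
    for i, (letter, marks) in enumerate(units):
        if any(m in VOWEL_MARKS for m in marks):
            continue  # explicitly vocalized
        if i == last:
            continue  # final i'rab may be omitted
        if letter == ALIF:
            continue  # no diacritic expected on alif
        prev_marks = units[i - 1][1] if i > 0 else []
        if (letter == WAW and DAMMA in prev_marks) or (letter == YA and KASRA in prev_marks):
            continue  # assumed long vowel
        if units[i + 1][0] in (ALIF, TA_MARBUTA):
            continue  # implied fatha before alif or ta' marbuta
        return False
    return True


def transliterate_or_none(word, transliterate):
    """Call the supplied transliteration function only on vocalized input."""
    return transliterate(word) if is_vocalized(word) else None
```

Under these assumptions كاتِب, لَا and دَم pass the check, while نزف and بَني do not. Dropping the `i == last` exemption would give the stricter behaviour WikiTiki89 suggests, where a missing iʿrāb also yields no transliteration.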
If a transliteration module returns nil, it's equivalent to having no transliteration module at all. —CodeCat 19:34, 5 September 2014 (UTC)
Can we somehow tag it with a category such as Category:Arabic terms lacking transliteration? --WikiTiki89 19:45, 5 September 2014 (UTC)
How would Module:headword know that a transliteration is needed? —CodeCat 20:37, 5 September 2014 (UTC)
Any time a call to the transliteration module returns nil, it should be inserted into a category indicating this, perhaps Category:Arabic terms lacking vocalization; presumably this would be language-specific (for Arabic, maybe Hebrew as well and other languages using Arabic script) or script-specific. Module:links should also do this; it in fact already inserts categories like Category:Terms with manual transliterations different from the automated ones. Benwing (talk) 20:52, 5 September 2014 (UTC)
I think Category:Arabic terms lacking transliteration makes more sense, since it would be difficult for Module:headword to infer the reason for the transliteration failure. This would only apply to Arabic, not other Arabic-script languages, since most of them do not have vocalization systems. --WikiTiki89 21:31, 5 September 2014 (UTC)
I think we should be able to make this more general. After all, we really want transliterations for any language not written in Latin script, right? Whether it's generated or manually supplied is not even relevant, as long as it's there. So maybe we could apply this general rule: if the term is written in a script that is not Latn, Latinx or varieties, then if there is no transliteration, add a category to request one. That way we don't have to make it specific to Arabic. —CodeCat 21:50, 5 September 2014 (UTC)
Did I ever say "Let's make it specific to Arabic so that other languages can't take advantage of this useful feature."? Anyway, jokes aside, I agree, if there is no transliteration then it should be placed in a category, whether it's a headword or just a link. But, I think that if {{{2}}} is specified, then the category should not be added, and we need some language-specific exceptions such as for Serbo-Croatian. --WikiTiki89 21:54, 5 September 2014 (UTC)
I also think this is a good idea. CodeCat, could you take a crack at this when you have a chance? I can't do it myself since Module:links is locked. Benwing (talk) 08:07, 6 September 2014 (UTC)
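The general rule proposed in this thread could look something like the sketch below. It is Python for illustration only and not code from Module:links or Module:headword; the function name, its parameters, the `"sh"` code used for the Serbo-Croatian exception, and the exact script set are all assumptions, and "varieties" of Latin script are not modelled.

```python
# Scripts for which no transliteration is requested (assumed set).
LATIN_SCRIPTS = {"Latn", "Latinx"}
# Languages exempted from the request category (Serbo-Croatian; code assumed).
EXEMPT_LANGS = {"sh"}


def translit_request_category(lang_code, lang_name, script, translit, param2_given=False):
    """Return a cleanup category name, or None if no request is needed.

    `translit` is whatever transliteration is available, manual or generated.
    `param2_given` stands for WikiTiki89's exception when {{{2}}} is specified;
    the thread does not spell out which template that parameter belongs to.
    """
    if script in LATIN_SCRIPTS:
        return None
    if translit:
        return None
    if param2_given or lang_code in EXEMPT_LANGS:
        return None
    return f"Category:{lang_name} terms lacking transliteration"
```

For example, an unvocalized Arabic term whose transliteration module returned nil and that has no manual transliteration would land in Category:Arabic terms lacking transliteration, while any term that has a transliteration from either source would not.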

Bracketed ellipses and widowing

A week or so ago I fell sick and faced a time of lassitude and confinement. To counter or alleviate this I decided to undertake a detailed but repetitive Wiktionary project that had suggested itself during my normal occupation of adding quotations to senses, a project that could be done with a dull mind, and sitting up in bed.
 I had noticed that Robert Burton's The Anatomy of Melancholy was frequently and justifiably quoted in various entries, but also that the quotations varied in details and only occasionally (and sometimes wrongly) linked to the two directly relevant Wikipedia entries. So I created a template (Template:RQ:RBrtn AntmyMlncly) to use in sense headings. I've recently finished (I think), and there were over 400 quotations.
 As is my habit, I occasionally made what I saw as minor changes, such as adding full stops (periods) at the end of sense definitions, and making leading letters in those definitions upper case if needed. After I had been doing the project for a few days I became aware that one of the most frequent problems I was fixing was potential widowing. Because of a background in the printing industry going back to the early '50s and the days of moveable type, this was not a minor problem to me. I'll explain.
 A widow, as I was taught it, was several possible things. A line widow, as it was told to me, was the first or last line of a paragraph placed on a separate page from the rest of the paragraph. And this was considered unutterably evil if a word was split by hyphenation across the pages. Of course, in the days of moveable type this could very easily be fixed before the flong was flung. But there are widows at the letter level if mistakes are made in the typesetting, particularly with punctuation because punctuation usually needs to be juxtaposed to a word. These were avoided automatically by typesetters and simple ones are nowadays usually avoided by formatting software. And line widows are irrelevant to the Wiktionary as running text is not split between separated pages; keys such as PgUp and PgDn move text on the screen.
 However, the widowing that I became aware of was particular to quotations related to the bracketed ellipses (hereinafter simply called ellipses). It looks as if the {{...}} template was introduced to make ellipses easy to key in. This is just fine if the ellipsis is simply a gap between words, but there is a problem: if the ellipsis is at the end of a sentence or quotation then it needs to be followed by a full stop (period). Unfortunately for the full stop, {{...}} puts a simple space in the display on either side of the ellipsis.
 That it looks bad is not the major problem, which is that the full stop will be split off if it won't fit onto the end of the line, and worse still if the split happens at the end of a quotation. Ordinary users of the Wiktionary will have different line widths, so some would see it even if most wouldn't. Not quite so bad, but bad enough, is the effect of the simple space put in front of the ellipsis which can split the ellipsis from the text it follows, which has a bad effect on the aesthetics and a bit of burden for the readability.

  I have been criticised for making my modifications to avoid these problems and have had some undone. What I am asking for here is to have agreement that something should be done about the problems like the ones I have outlined. If agreement is reached, I suggest that what should be done is to create three new templates
(1) {{..,}} to leave off the trailing blank of the ellipsis,
(2) {{,..}} to replace the leading blank of the ellipsis with a non-breaking space (&nbsp;),
(3) {{,.,}} to combine the above two actions.
Note that it would be a mistake to use (1) to add a full stop to the ellipsis as the omission will also be needed for other punctuation, such as the comma.

I've tried to look for what's in the {{...}} template, but have got lost, perhaps because of my medical state, which continues. If something can be done very quickly I would be extremely grateful, as I would like to use another similar project to lighten my oncoming days. — ReidAA (talk) 10:06, 5 September 2014 (UTC)

(First of all, thanks for your work on finding the quotations.) Personally I don't at all like seeing the &.amp; &.hellip; &.nbsp; stuff floating around in an entry (I see you've even used it to format your comments here) and I think it hinders editing. Almost the only one I ever use is &.mdash; and that's because I get tired of having to open Character Map to find the literal symbol. Anyway: IMO, if we need to fix this "widow" problem at all (has anyone else been upset by it, or even noticed it?), then Wiktionary's markup is not the correct place to fix it. It sounds more like something that should be addressed by the CSS (Cascading Style Sheets) standard, surely...? Equinox 10:16, 5 September 2014 (UTC)
I must plead guilty to sometimes using &.amp; in place of an ampersand and always use &.#0133; in place of an ellipsis - on the assumption (maybe mistaken) that these characters might not show up correctly on some machines. Should I desist? — Saltmarshαπάντηση 10:54, 5 September 2014 (UTC)
I would greatly appreciate any advice on how to achieve the effect of a non-breaking space using CSS. Specifically, I would like to replace many instances of &.nbsp;- with something less intimidating to potential contributors. In the application I have in mind all instances of a dash in a piece of text (a line with no line-breaks other than those imposed by the width of the frame) would warrant such treatment. DCDuring TALK 13:13, 5 September 2014 (UTC)
This discussion seems to be straying from the requests that I started this discussion with. Let me start again with a different tack.
  Someone wrote the (presumably very simple) {{...}} template that seems to have had wide acceptance, probably both by those like me who are concerned with presentation details, and those who are put off by plain HTML code, at least such as uses ampersands.  Furthermore I have been told that the Wikimedia software has trouble handling the directly coded […] ([…]) so that I should use the template, and I have been doing so when no widowing is threatened.
  If there are others like me concerned with presentation effects, and if it would be a simple task for someone with the requisite knowledge to create the three new templates I suggest, what objection is there to creating them so that concerned contributing editors like myself could improve the presentation as we go along making other contributions?  What effect would there be on editors who are not worried about widows?  My work here suggests to me that there is a hell of a lot of such tidying to be done. — ReidAA (talk) 23:07, 5 September 2014 (UTC)
I am not sure that I did this right, but it seems to work. It substitutes a nonbreaking space for what turned out to be an ordinary space. Try it: {{nb...}}. Frankly, if it works, it would seem to be preferred to the original. I can't see why it would ever be right to risk having the [...] widowed. DCDuring TALK 23:26, 5 September 2014 (UTC)
I tried it by simply putting {{nb..}} into an entry I was working on, but it gave an error. Is there something special I need to code to get to your brave attempt? And, yes, I agree about it being better than the original. I can't offhand think of any case where the change would cause problems. — ReidAA (talk) 00:05, 6 September 2014 (UTC)
@ReidAA: It is {{nb...}}, not {{nb..}}. If you typed it in with three dots and got an error, leave it there and let me know the entry. DCDuring TALK 00:46, 6 September 2014 (UTC)
@DCDuring: Not needed; I took it out when it didn't work.  But, I would prefer the shorter name for elegance; or how about {{nb.}}? — ReidAA (talk) 01:04, 6 September 2014 (UTC)
@ReidAA: I'd like to understand in which situations it goes wrong. Maybe I could fix it. DCDuring TALK 01:12, 6 September 2014 (UTC)
This addressed your second requested item. But why not eliminate the trailing space for all cases? What harm could come of having the leading space always be a non-breaking space? DCDuring TALK 23:31, 5 September 2014 (UTC)
Unfortunately there is a lot of existing code that relies on the trailing space being added. Oh, and I agree about the leading space. — ReidAA (talk) 00:05, 6 September 2014 (UTC)
@ReidAA: I removed the trailing space for {{nb...}}. The impression I get is that most uses of {{...}} have additional leading and trailing space inserted by users after the template. The wiki software renders the two spaces, one from the user the other from the template as one. I think the problematic entries that would exist if we were to substitute {{nb...}} for {{...}} would be of two kinds. 1., The kind with no space rendered after the template could be found by search for  [] followed by a letter. This wouldn't be so bad for the basic Latin script, but the search would also have to take place for other letters and symbols in other scripts. 2., The kind with an extra space before would need some kind of bot to eliminate the extra breaking space, but the extra space is not very annoying to me. Can you find the extra space on this paragraph? DCDuring TALK 00:24, 6 September 2014 (UTC)
@DCDuring: I put two responses in last time; I think you might have overlooked the first.
  Indeed a lot of editors include both leading and trailing spaces with their ellipsis template invocation, but there are also a helluva lot who don't.  Hullo, hullo, I've just noticed that your template has three dots; in my trial I only used two.  When I've finished here I'll go and try again.  Wouldn't the extra preceding breaking space you refer to potentially widow the ellipsis?  Or have I misunderstood you?  Incidentally, I've just imitated your ping at the start of this response.  Is this supplementary or alternative to the watch this page button? — ReidAA (talk) 00:57, 6 September 2014 (UTC)
@DCDuring: Well, it works, thanks very much indeed.  When will it be alright for me to start using it?  Do you agree with me about the shorter name(s) though? — ReidAA (talk) 01:14, 6 September 2014 (UTC)
{{ping}} is additional. It lights up the number next to your username at the top of the page. I don't remember if it works across projects.
If we insert the altered code into {{...}}, there would be a lot of cleanup to do.
I'd like to keep this five-keystroke name until I know we can't improve it. Your suggestions seem fine, but this might warrant more attention, perhaps at the grease pit.
{{nb...}} should be used without a user-added leading space to avoid the line-break. The user controls what happens after the {{nb...}}. It is by no means idiot-proof.
You could start using it now. It is really fairly simple, probably low risk. If something goes wrong, stop using it, undo it, and let me know the entry where it went wrong. DCDuring TALK 01:26, 6 September 2014 (UTC)
Then I shall start using it right away.  My tests suggest to me that, since I understand it well (I think), there will be no problems.  My very small experience with RQ templates included the (somewhat obscure) creation of documentation to go with them.  Your code for nb... looks very opaque to me, but would you like me to try to create documentation for its use?  Again, thanks very much for your cooperation and work. — ReidAA
It is not really my code. All I did, starting from a copy of {{...}}, was substitute a nonbreaking space for the ordinary space at the leading position and delete the trailing space. I really couldn't have started from scratch. DCDuring TALK 04:31, 6 September 2014 (UTC)
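The difference between the two templates, as described in this thread, can be illustrated with a small Python sketch. It only models the rendered text, under the assumption that {{...}} produces a bracketed ellipsis with an ordinary space on each side; the function names are hypothetical and any CSS in the real templates is ignored.

```python
NBSP = "\u00A0"          # non-breaking space
ELLIPSIS = "[\u2026]"    # bracketed ellipsis


def plain_ellipsis():
    """Model of {{...}}: ordinary spaces on both sides, so a line may break at either."""
    return " " + ELLIPSIS + " "


def nb_ellipsis():
    """Model of {{nb...}}: non-breaking space in front, no trailing space."""
    return NBSP + ELLIPSIS


# The editor decides what follows {{nb...}}: punctuation directly, or a space and more text.
end_of_sentence = "quoted words" + nb_ellipsis() + "."
mid_sentence = "quoted words" + nb_ellipsis() + " more words"
```

In `end_of_sentence` there is no breakable space before the full stop or before the ellipsis, so neither can be widowed onto the next line; in `mid_sentence` the only break opportunity is the ordinary space the editor typed after the template.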
Well done. I'll put some documentation together in a little while and let you know. — ReidAA (talk) 04:36, 6 September 2014 (UTC)
@DCDuring: Template documentation added at Template:nb.../documentation. You might like to look it over and improve it. — ReidAA (talk) 07:00, 6 September 2014 (UTC)
@ReidAA: I added Category:Text format templates to both {{nb...}} and {{...}} and referenced {{nb...}} in the documentation for {{...}}. There might be some now-unnecessary CSS stuff in {{nb...}}. DCDuring TALK 13:02, 6 September 2014 (UTC)
@DCDuring: Thanks very much.  Great work!  And I've found it useful several times already. — ReidAA (talk) 23:20, 6 September 2014 (UTC)

Proto-Hellenic and Proto-Greek

Although we tend to think of Greek as a single language, it's actually a family of languages, and those languages have a common ancestor. According to Wikipedia, what is reconstructed as "Proto-Greek" usually includes all Hellenic dialects, including Mycenaean, but not the rather divergent Ancient Macedonian. One noticeable point of early divergence in AM is that the aspirates have voiced reflexes in some cases in AM while they are always devoiced in the rest of Hellenic. Unless Mycenaean underwent a re-voicing, this would have to indicate that their common ancestor still had voiced aspirates as they were inherited from Proto-Indo-European.

So this would mean that a hypothetical Proto-Hellenic, including Macedonian, would have to have voiced aspirates, while Proto-Hellenic-minus-Macedonian would have voiceless aspirates. The problem is how to define the Hellenic language family, and how people reconstruct it. Wikipedia notes that people may or may not include Macedonian in the reconstruction, but generally do not. Of course, if we include such reconstructions, we can't really call them "Proto-Hellenic" because they're missing one branch. So what should we call them? If we call them "Proto-Greek", then we would need to invent a new language family for the non-Macedonian branch of the Hellenic languages, make up a name for it (if we use "Greek" it would cause confusion with the language) and also a code. —CodeCat 23:28, 6 September 2014 (UTC)

Is there really that much consensus that Macedonian shares a common ancestor with Greek later than PIE? I would say we should treat Proto-Hellenic as identical to Proto-Greek and leave Macedonian out of it altogether to be on the safe side. —Aɴɢʀ (talk) 13:01, 7 September 2014 (UTC)
I think there is a fair consensus, at least judging by Wikipedia sourcing. But after having looked at it, there are some common innovations that appear to show that if it wasn't Greek, it was closely allied with it. At the very least, it was still similar enough to Greek that it took part in common sound changes, such as the loss of -y- intervocalically and the change of final -m to -n. w:Ancient Macedonian language shows various classifications that have been made over time. To me, the first and last proposals are the most plausible, and note that some sources call the combination of Greek + Macedonian "Greco-Macedonian" while others call it "Hellenic". So we have to make sure we check how each source understands the terms "Greek" and "Hellenic" before using them as references. In any case, if we do treat Proto-Hellenic as not including AM, then we also need to change the family of AM because it's currently included in Category:Hellenic languages. —CodeCat 13:12, 7 September 2014 (UTC)
I've written a basic draft for Wiktionary:About Proto-Hellenic. Can other editors review it and comment on it? —CodeCat 12:30, 8 September 2014 (UTC)
Oh look - even more legitimate etymologies vandalized by CodeCat's original research. Will this nightmare ever end? --Ivan Štambuk (talk) 18:08, 8 September 2014 (UTC)
Oh look, even more personal attacks from Ivan. Will this nightmare ever end? —CodeCat 19:59, 8 September 2014 (UTC)
CodeCat, what are your sources for reconstructions like *éhər? Also, how is it useful? All its content can be presented at ἔαρ (éar) without duplication. --Vahag (talk) 20:18, 8 September 2014 (UTC)
I agree with Angr that we should not distinguish Proto-Hellenic from Proto-Greek. Wikipedia does not seem to support such a distinction, specifically noting in w:Hellenic languages that most researchers consider them identical. This could happen either if Ancient Macedonian is considered to be a descendant of Proto-Greek or to be outside Proto-Hellenic entirely. The Wikipedia page on w:Ancient Macedonian language describes it as a dialect of Northwest Greek (i.e. a Proto-Greek descendant) but also notes that it's not well-attested. The supposed need for a distinct Proto-Hellenic hangs from a pretty thin thread, IMO -- one single sound change (which Wikipedia identifies as happening "sometimes") in a barely-attested language along with a clear lack of scholarly consensus. As a result I really think that it needlessly complicates the picture to create a Proto-Hellenic separate from Proto-Greek, which is going to have identical reconstructions to Proto-Greek except for mechanically substituting voiceless aspirates with something else. If you want to include AM forms, you should probably just list them under Proto-Greek.
As for the sound changes themselves, there's nothing a priori impossible about a revoicing of aspirates to voiced stops, i.e. there's not really any strong evidence that AM wasn't a Greek dialect. And your examples of common sound changes between AM and Greek don't rule out AM being outside of Proto-Hellenic: The loss of intervocalic -y- and change of -m to -n could easily be areal phenomena, or simply independent changes, esp. -m to -n, which occurred in many different IE branches. Benwing (talk) 21:32, 8 September 2014 (UTC)
The w:Proto-Greek page lists several sound changes that must have occurred after Grassmann's law, and the devoicing of aspirates is one of them. AM didn't appear to have aspirates (or at least they weren't written for us to see?), so it seems that this important identifying feature of Greek did not affect AM. It also implies that any pre-stage of Proto-Greek that would include voiced aspirates would also lack Grassmann's law, as well as palatalization, so it would be much closer phonologically to PIE; basically PIE minus laryngeals. —CodeCat 21:44, 8 September 2014 (UTC)
The devoicing of aspirates occurred before Grassmann's Law. And the list you present isn't necessarily chronological (BTW I wrote most of that list). But I don't see what any of this has to do with whether there should be separate Proto-Hellenic reconstructions, which I don't think makes sense. And as I said before there's no evidence that AM's voiced sounds weren't secondary or due to substrate influence or whatever. Benwing (talk) 02:27, 9 September 2014 (UTC)

Italic and transliterations

Hi,

I noticed that in the translations section and on the headword line, transliterations are not in italics, while they are in the etymology section. It's not very important, but I'm curious: is there a reason for that? Thanks in advance. — Automatik (talk) 02:00, 7 September 2014 (UTC)

That's odd- I see the opposite (удар (udar) (no italic), удар (udar) (italic)). DTLHS (talk) 02:14, 7 September 2014 (UTC)
Sorry, my mistake. I fixed it. — Automatik (talk) 04:08, 7 September 2014 (UTC)

Layout of IPA in the editing tools

Currently there doesn't seem to be a very clear method to how the IPA characters are arranged in the edit tools (below the edit window). This makes many of the characters hard to find as you basically have to look through the whole list, sometimes several times. So I'd like to propose that the characters be arranged in the standard IPA table format, like on WT:IPA but perhaps more compact. —CodeCat 13:45, 7 September 2014 (UTC)

Certainly more compact; there's no need to include the IPA characters that are also basic ASCII characters. (I believe that set consists entirely of the 26 lowercase letters of the English alphabet.) —Aɴɢʀ (talk) 15:03, 7 September 2014 (UTC)
Then there would just be gaps in the table. We might as well include them if the space goes unused otherwise. —CodeCat 15:37, 7 September 2014 (UTC)
The most helpful layout for a non-linguist editor would be per language: letters in alphabetical order, the IPA symbol immediately after each letter. E.g. for Hungarian: Aa /ɒ/ Áá /aː/ Bb /b/ Cc /t͡s/ Cscs /t͡ʃ/, etc. Since there are more than 1500 languages in this wiki, this might be a very long list in the drop-down, but this could be solved if editors could select a small number of alphabets to appear in the drop-down; the ones they are actually using, let's say between 1 and 15. This could be added to the preferences. --Panda10 (talk) 16:35, 7 September 2014 (UTC)
Surely people who are editing in a language are at least able to name the script it uses? —CodeCat 17:09, 7 September 2014 (UTC)

Automatic pronunciation templates

We now have a fair number of templates that automatically generate pronunciation information in IPA based on the spelling of the headword and/or an orthographic representation given as a template parameter. The ones I'm aware of are:

There may be others; not all of the above are included in Category:Pronunciation templates, so maybe there are more that aren't there. (There's also {{fa-pron}}, which doesn't actually give pronunciation information, and {{liv-IPA}} which requires manual input of IPA rather than generating it automatically.) If you know of others that I haven't listed above, please add them.

The first problem is that these templates are not all gathered into a single category. Is Category:Pronunciation templates sufficient, or should there be a more specific category like Automatically generated pronunciation templates for them?

The second problem is that there is no uniformity of naming. Some are called "xy-pronunc", some are called "xy-IPA" (or "xy-ipa"), some are called "xy-pron" (particularly bad since other templates called "xy-pron" are headword line templates for pronouns), and some have "-auto" appended to the end. Ideally, there should be a uniform name for these; my preference is for "xy-pronunc", but what do others think? —Aɴɢʀ (talk) 14:55, 7 September 2014 (UTC)

I think we should name them {{xx-IPA}}. This resolves the pronoun problem and leaves room for the theoretical possibility of counterparts outputting something other than IPA. --WikiTiki89 13:44, 8 September 2014 (UTC)

Some IPA template names differ in usage, e.g. Korean {{ko-pron}} displays user-written IPA, while {{ko-pron/auto}} is automatic. (There are also a number of templates for Chinese (Mandarin, Cantonese, Min Nan, etc.) and a Japanese template, which weren't listed above.) --Anatoli T. (обсудить/вклад) 05:54, 9 September 2014 (UTC)

Also {{fa-pronunciation}} (takes transliteration as input; not Lua-ized) --Z 08:54, 9 September 2014 (UTC)
I have made the following moves so at least the Luacized templates are consistently named:
Two languages that automatic IPA templates ought to be easy to write modules for (but not by me since I don't know how to write modules) are Finnish and Hungarian. Anyone feel like writing modules and creating {{fi-IPA}} and {{hu-IPA}} to invoke them? —Aɴɢʀ (talk) 15:01, 1 October 2014 (UTC)
I've considered writing a module for Slovene, but for that language we would also want to display the tonal diacritic/respelled form. Naming it "IPA" would not be appropriate in that case. —CodeCat 15:32, 1 October 2014 (UTC)
Yeah; as Wikitiki says, using "-IPA" for the ones that do generate IPA allows the option of having some other name for the ones that don't (or the ones that generate other things in addition to IPA). —Aɴɢʀ (talk) 15:56, 1 October 2014 (UTC)
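
For anyone considering the {{fi-IPA}} / {{hu-IPA}} request above, the core of a spelling-to-IPA converter can be sketched as a longest-match lookup over graphemes. The sketch below is in Python rather than Lua and is not the code of any existing module; the function name `to_ipa` and the table `HU_SAMPLE` are made up for illustration, and the table holds only the five Hungarian pairs quoted in the "Layout of IPA in the editing tools" thread (a, á, b, c, cs), so it is far from a complete rule set.

```python
import unicodedata

# Illustrative sample only: the five Hungarian letter-to-IPA pairs quoted
# earlier on this page. A real module would need the full alphabet plus
# rules for stress, assimilation, and so on.
HU_SAMPLE = {"a": "ɒ", "á": "aː", "b": "b", "c": "t͡s", "cs": "t͡ʃ"}


def to_ipa(word, table):
    """Convert a spelling to IPA by always matching the longest grapheme first."""
    # Normalize so that precomposed and decomposed accents compare equal.
    table = {unicodedata.normalize("NFC", k): v for k, v in table.items()}
    word = unicodedata.normalize("NFC", word.lower())
    # Try digraphs such as "cs" before single letters such as "c".
    graphemes = sorted(table, key=len, reverse=True)
    out = []
    i = 0
    while i < len(word):
        for g in graphemes:
            if word.startswith(g, i):
                out.append(table[g])
                i += len(g)
                break
        else:
            # No rule covers this character; fail loudly rather than guess.
            raise ValueError(f"no rule for {word[i]!r} in {word!r}")
    return "".join(out)


if __name__ == "__main__":
    print(to_ipa("bab", HU_SAMPLE))
```

Sorting the graphemes by length is what keeps "cs" from being read as "c" followed by an unknown "s". Raising an error on unknown letters is a design choice for the sketch: a template would more likely fall back to manual input.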

Workshop: Greek and Latin in an Age of Open Data

This event, at the University of Leipzig in December, may be of interest: Workshop: Greek and Latin in an Age of Open Data. It would be good to have somebody there to represent and speak about Wiktionary and Wikidata. Pigsonthewing (talk) 16:00, 7 September 2014 (UTC)

ORCID and other identifiers

People who edit Wiktionary should be able to show w:ORCID (and other forms of w:Authority Control, such as w:VIAF) on their user pages, as explained at w:WP:ORCID.

The template d:Template:Authority Control allows this. Can someone with the relevant bit please use Special:Import to import it and all its sub-templates (or set the bit, temporarily, to allow me to do so)? I'll then be happy to do the documentation and set up some examples. Pigsonthewing (talk) 16:15, 7 September 2014 (UTC)

@Pigsonthewing: Assuming you meant w:Template:Authority control (d:Template:Authority Control doesn't seem to exist), I imported it to Template:authority control. However, I didn't want to import the complex modules it depended on — like (apparently) w:Module:Arguments, which is probably written according to different coding conventions than are used here, and possibly also redundant to things here — and the template was being waaay more complex than it needed to be by depending on modules, anyway, so I simplified it dramatically. If someone wants to prettify it a bit so that parameters which are not set simply don't display, rather than displaying "—", they can feel free to do that. - -sche (discuss) 02:34, 10 September 2014 (UTC)
Thank you, - -sche. I actually meant d:Template:Authority control (lower-case "c"). Your version has removed both the links to articles about the authority control type (like w:VIAF and w:ORCID) and the links to the authority control databases (like https://viaf.org/viaf/70042340 and http://orcid.org/0000-0001-5882-6823 ). For a comparison, look at my user page here, and on en.WP. Could you look into importing the Wikidata template instead? It has the latter links and a version of the former (which I will then update), and it hides empty parameters. Pigsonthewing (talk) 21:10, 14 September 2014 (UTC)
@Pigsonthewing: Ah, I see. Well, I didn't notice this before, but it seems that en.Wiktionary cannot import pages from Wikidata. (Compare how en.WP cannot import pages from en.Wiktionary.) The wikis which our Special:Import is configured to allow importation from are w, b, s, q, v, n, commons, and a long list of other-language Wiktionaries. I can think of a couple of ways around this. One is to get approval for en.Wikt to import Wikidata pages; I suppose the avenue for that would be bugzilla. Another is to copy whatever revision of the Wikidata template you want, using your edit summary to explain what you were doing and to direct users to see the Wikidata page for its history and contributors. - -sche (discuss) 22:54, 14 September 2014 (UTC)
@-sche: Thanks; in that case, I'll do the latter (I was just hoping to preserve the edit histories). Before I do, would you like to delete your imports, or shall I just overwrite them? Pigsonthewing (talk) 18:05, 15 September 2014 (UTC)
You can just overwrite them. - -sche (discuss) 18:17, 15 September 2014 (UTC)
@-sche: OK, that's done; see Template:authority control. I'm out of time now, but tomorrow I will work on the documentation, examples and styling. In the meantime, please see the instance on my user page. Pigsonthewing (talk) 21:03, 15 September 2014 (UTC)

Update

The template is now working. It can be styled horizontally (like on Wikipedia - see my user page on en.WP) or left as it is. To do the former, we'd need to copy the rules for the class hlist from en.WP's Common.css.

Next, we need to think about how to encourage contributors to display their authority control IDs on their user pages; and if appropriate to register for an ORCID identifier. How can I get a line added to Wiktionary:News for editors? Pigsonthewing (talk) 15:11, 16 September 2014 (UTC)

I've solicited others' input on whether to copy en.WP's hlist rules or not.
I can add a blurb to WT:NFE. What should it say? "Template:authority control exists and allows users to specify their authority control numbers." ?
- -sche (discuss) 20:10, 16 September 2014 (UTC)

Requests for verification of pronunciation

Occasionally I come across entries where I suspect the pronunciation listed is erroneous. Where can I request verification of suspect pronunciations? We have Wiktionary:Requests for verification but that seems to be for verifying meanings only, not for verifying pronunciations. —Psychonaut (talk) 14:18, 9 September 2014 (UTC)

All I can think of is to use the {{attention}} tag, or take a specific issue to the Tea room. —Aɴɢʀ (talk) 14:48, 9 September 2014 (UTC)
We have {{rfv-pronunciation}}. —CodeCat 16:58, 9 September 2014 (UTC)
Which categorizes entries into "Category:Xyz entries needing reference", but AFAICT all such categories are red. Category:English entries needing reference is at any rate, and adding {{poscatboiler|en|entries needing reference}} creates an error. —Aɴɢʀ (talk) 13:13, 13 September 2014 (UTC)
They haven't been created yet because there's a discussion at WT:RFM about what to name them. Please join in! —CodeCat 13:18, 13 September 2014 (UTC)
The trouble with joining in that conversation is that I don't care what they're named as long as a name gets picked soon. —Aɴɢʀ (talk) 14:33, 13 September 2014 (UTC)
I think I'll just create the categories for now, and delete them once a conclusion is reached. That way things aren't left hanging in the air while people decide. —CodeCat 19:25, 15 September 2014 (UTC)

Change in renaming process

-- User:Keegan (WMF) (talk) 16:22, 9 September 2014 (UTC)

Sant Bhasha

Apparently, a user that goes by the username Bhvintri (talk • contribs) says/claims that the word ਸੋਚੈ is of a language called Sant Bhasha. Should we make a code and category for such a language? --Lo Ximiendo (talk) 19:31, 10 September 2014 (UTC)

Can we wait for someone to comment before deleting the entries? DTLHS (talk) 00:56, 11 September 2014 (UTC)
I wasn't aware of this discussion, and the user kept creating more of them. I deleted them to try to limit the damage as every single one of them was badly formatted, and in any case we'd need to update them all if a code is assigned. —CodeCat 01:08, 11 September 2014 (UTC)
The Wikipedia article (w:Sant Bhasha) seems a bit confused about what this really is: is it a lingua franca, a conlang, or a grab bag of various similar lects used to communicate between members of a particular stratum of society who otherwise don't speak the same languages? Chuck Entz (talk) 02:51, 11 September 2014 (UTC)
Literally, it means "Saint language". I had not heard of it before, but I believe it is like the Slavic Esperanto that any group of people who come from each of the Slavic-speaking countries (Russia, Poland, Ukraine, Serbia, Czech Republic, Slovakia, Belarus, Bulgaria) naturally fall into in order that everyone can speak to and understand everyone else. In the time of Germany’s w:Martin Luther (early 1500s), the German theologians, philosophers, and other writers from various parts of Germany, all speaking different dialects of German, did much the same thing, which is where today’s Standard High German comes from. —Stephen (Talk) 03:18, 11 September 2014 (UTC)
I think calling it an "Esperanto-like language" is misleading. It doesn't seem to have been deliberately constructed by someone. I think it's much more like a lingua franca. The Wikipedia article on Sant Bhasha is very confusing, but the article on the Guru Granth Sahib says more clearly, "It is written in the Gurmukhī script, in various dialects – including Lehndi Punjabi, Braj Bhasha, Khariboli, Sanskrit and Persian – often coalesced under the generic title of Sant Bhasha." That leads me to believe that Sant Bhasha doesn't require a code of its own, because all of the words appear in some other language, i.e. ਸੋਚੈ is a word of Lehndi Punjabi, Braj Bhasha, Khariboli, Sanskrit and/or Persian, although perhaps only written in Gurmukhi when it's being used in Sant Bhasha. (Is the phonetic similarity between sant and saint/santo a coincidence?) —Aɴɢʀ (talk) 07:01, 11 September 2014 (UTC)
Hi there! I will stop making any new entries until this matter is resolved.
Now, why do we need a separate entry for Sant Bhasha? Here are my points in favour of it:
1.) There is a bulk of literature written in it by a number of writers. I am listing a few of these works.
a. Guru Granth Sahib - Note it is a combined work of more than 30 authors.
b. Dasam Granth
c. Varan Bhai Gurdas
d. Panth Parkash
e. Suraj Parkash
2.) Of the languages from which it draws its vocabulary (Punjabi, Hindi, Marathi, Sindhi, Apabhramshas, Sanskrit and Persian), none uses the Gurmukhi alphabet except Punjabi. So it is not logical to put Sanskrit, Persian or Hindi words in the Gurmukhi alphabet under those languages. By the same logic, why do we put the word 'algebra' under English in the Latin alphabet when we already have an entry for this word under Arabic in the Arabic alphabet? Why do we list words borrowed from Latin under different languages when they are already explained under Latin?
3.) The most important point is that we have a bulk of word forms where the root word comes from one language and its declension or conjugation is derived from another. E.g. in ਸੋਚੈ 'sochai', the root word 'soch' could have been borrowed from Punjabi, whereas the declension -ai is derived from the Sanskrit instrumental plural -aih. So where should we put this word: under Punjabi or Sanskrit?
Q-Is the phonetic similarity between sant and saint/santo a coincidence?
A-Maybe Latin sanctus and Sanskrit santa originated from the same Indo-European root. Bhvintri (talk) 00:32, 12 September 2014 (UTC)
  • Are there any grammars or dictionaries of it, or is it just a literary language of a fixed number of works? I think that it should be mandatory to have citations for any added words for obscure cases such as this one, so that it's easier to clean them up in the future (e.g. if it is decided to treat it as a form of some other Middle/Modern Indo-Aryan language). --Ivan Štambuk (talk) 10:02, 13 September 2014 (UTC)
This is what Wikipedia says under Sacred language:
Sant Bhasha, a mélange of archaic Punjabi and several other languages, is the language of the Sikh holy scripture Guru Granth Sahib.

http://books.google.com/books?id=Itp2twGR6tsC Indo-Aryan Languages by Colin Masica also attests it on page 57 as:
The Sant or Nirguna tradition of mystical poets, beginning with Kabir, preferred a fluid mixed dialect with a strong Khari Boli element.

Grammars:
'An Introduction to the Sacred Language of the Sikhs' by Christopher Shackle is a book on the grammar of this language. But unfortunately I can't find it online. http://www.amazon.com/An-introduction-sacred-language-Sikhs/dp/B0007BRI5W
In Indo-Aryan Languages (edited by George Cardona and Dhanesh Jain) http://books.google.com/books?id=OtCPAgAAQBAJ on page xxi it is listed as 'Language of Adi Granth' (note: Adi Granth is another name for the Sikh holy scripture Guru Granth Sahib), separate from Punjabi or Hindi. Further, from page 656 to 672, Christopher Shackle gives the declensions and conjugations of its nouns, pronouns, adjectives and verbs and compares them with Modern Punjabi.
Dictionaries:
The most famous dictionary of this language is Mahan Kosh published in 1930 and then republished in 1981.
Mahan Kosh and other dictionaries of this language can be found online at many links like this: http://www.srigranth.org/servlet/gurbani.dictionary
Bhvintri (talk) 19:39, 13 September 2014 (UTC)

Pronunciation needs to be at level 4 for Arabic

WT:ELE claims that pronunciation ought to be at level 3, above individual entries for nouns, verbs, adjectives, etc. Apparently, cases like duplicate where the pronunciation differs by part of speech are handled by listing the part of speech under the pronunciation section, above the corresponding pronunciation. But this fails entirely for Arabic, where a single page may have e.g. two nouns and three verbs on it, each with a different pronunciation (because the vowels are omitted in writing). In fact, it's rarely the case that two different part-of-speech entries will share the same pronunciation. As a result it seems clear to me that pronunciation for Arabic needs to go at level 4. But where exactly? I think the most obvious thing is to place it directly above the definition, possibly without any preceding header.

Note also that the current naming scheme for the .ogg pronunciation snippets is totally broken because it's named for the page title (without vowels), meaning that there's no way with this naming scheme to have separate pronunciations for two different subentries on the page, much less five or six. Benwing (talk) 06:28, 12 September 2014 (UTC)

This is a side-effect of the problem of etymologies in Arabic (both diachronic i.e. from Proto-Semitic, and synchronic i.e. from root x-y-z) - they usually refer to a one specific PoS using one specific derivational mechanism, but are instead usually grouped as if referring to all of them. After splitting by individual etymologies, pronunciations should come at level 4 naturally. These are left at level 3 for now until someone knowledgeable comes along. --Ivan Štambuk (talk) 15:22, 13 September 2014 (UTC)

Cleaning company spam

We're getting quite a lot of this lately (one every day or so?). It advertises various cleaning companies in the UK, in e.g. Brent and Enfield. I added a filter thing to prevent the original spam message, which can also be found on other sites with the same wording, but they seem to have changed to another one. Further filtering would be welcome, as this spammer is creating a lot of accounts — one useful thing might be that the word "clean" appears in many of the user names. Equinox 15:05, 12 September 2014 (UTC)

Chinese Character Composition

During my studies of Chinese I thought it would be useful to be able to look up characters by their components, not limited to the traditional radicals. I saw that Wiktionary already has some of this information and I used it as a starting point.

Altogether I decomposed more than 14,000 characters, traditional and simplified. My decomposition also provides locational information of the components in the characters.

See http://bioinfoc.ch:8081/languages/HanziComp for an application that uses the composition information. The Help link gives a short introduction.

The format of what I could provide is like this:

児 t:131/2s,b:r10 er2

兑 =11/3 =(t:r12a,b:11/2) s7 dui4

兒 =r10a =131/14 =(t:r134,b:r10) s8 er2 r5

兔 =164/1 =(a) s8 tu4

兕 t:69/118,b:r10 si4

兖 t:29/57,b:11/27 yan3

兗 t:29/57,b:11/2 yan3

兘 o:11/13,i:58/6 shi3

兙 o:152/1',i:r24 shi2 ke4

党 =63/32s =(t:200/105,b:11/2) s10 dang3

兛 o:152/1',i:10/3 qian1 ke4

The three fields are: Character, Composition, Pinyin.

The composition field can also name components that are used in other characters:
  • =rNNN: traditional radical
  • =NNN/NNN: name used in Chinese Characters: A Genealogy and Dictionary (English and Mandarin Chinese Edition) [Paperback], Rick Harbaugh (Author) (http://www.zhongwen.com/)

Some of the named components are atomic: =(a). Others are further decomposed, e.g. =(t:r134,b:r10). All of the named components have the number of strokes: sNNN.
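
To make the format above concrete, here is a rough Python sketch of how one line of it might be read into a structure. It is based only on the sample lines in this post, not on the poster's own software: the function names `parse_parts` and `parse_entry`, the dictionary keys, and the pinyin pattern are all assumptions made for illustration. Tokens the description does not explain (such as the trailing "r5" on the 兒 line) are kept in an "other" list instead of being interpreted.

```python
import re

# Assumed shapes, inferred from the sample lines only.
STROKES = re.compile(r"^s(\d+)$")                        # e.g. s8
PINYIN = re.compile(r"^[a-zü]*[aeiouü][a-zü]*[1-5]$")    # e.g. er2, dang3


def parse_parts(spec):
    """Split 't:r134,b:r10' into [('t', 'r134'), ('b', 'r10')]."""
    parts = []
    for item in spec.split(","):
        position, _, component = item.partition(":")
        parts.append((position, component))
    return parts


def parse_entry(line):
    """Parse one whitespace-separated line of the composition format."""
    tokens = line.split()
    entry = {"char": tokens[0], "names": [], "parts": [], "atomic": False,
             "strokes": None, "pinyin": [], "other": []}
    for tok in tokens[1:]:
        strokes = STROKES.match(tok)
        if tok == "=(a)":                       # named component is atomic
            entry["atomic"] = True
        elif tok.startswith("=(") and tok.endswith(")"):
            entry["parts"] = parse_parts(tok[2:-1])
        elif tok.startswith("="):               # component name, e.g. =r10a
            entry["names"].append(tok[1:])
        elif ":" in tok:                        # bare positional decomposition
            entry["parts"] = parse_parts(tok)
        elif strokes:
            entry["strokes"] = int(strokes.group(1))
        elif PINYIN.match(tok):
            entry["pinyin"].append(tok)
        else:                                   # unexplained token: keep as-is
            entry["other"].append(tok)
    return entry


if __name__ == "__main__":
    print(parse_entry("兒 =r10a =131/14 =(t:r134,b:r10) s8 er2 r5"))
```

The position codes seen in the samples are t, b, o and i; the sketch stores whatever code precedes the colon without validating it, so other codes would pass through unchanged.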

I'm sure that my analysis still contains errors (but I checked it in multiple ways for consistency), some questionable assignments or incomplete decompositions.

Anyway, is there interest from your side in integrating this information into Wiktionary in order to make it available to a broader audience? I would be able to put more work into this and give you the information in any format. —This unsigned comment was added by Brogerc (talk • contribs).

My Chinese knowledge is extremely limited, but no one else has replied and this information seems valuable. My main question would be how it could be presented to the user so that it could be useful and easily understood. And how does it differ from the composition data which is already present on many entries in Wiktionary? E.g. lists the composition as: ⿱. I assume this is similar to what "t:131/2s,b:r10" represents? Thanks. Pengo (talk) 22:43, 23 September 2014 (UTC)
From what little I know about Chinese characters, the components aren't necessarily just simple combinations. Often a character that is made of smaller components can be used as a component itself in an even larger character. So it's much like compounding, where a compound can be used as a base to form a larger compound. So the question is how we want to show this information. Do we want to show only the most basic components, or do we also want to show the intermediate combinations? —CodeCat 22:53, 23 September 2014 (UTC)
All I know is that this notation looks confusing and darn-near unreadable. I don't know how helpful it would be to add such an obtuse notation for character decomposition, especially given its dependence on using Latin letters. There's never going to be an "easy" method of breaking down these characters; many characters aren't based on regular radical forms but rather on however the Chinese could best modernize them from earlier Oracle Bone and Small/Large Seal Script variants. Bumm13 (talk) 22:29, 24 September 2014 (UTC)
The information I can contribute is exactly as already available for some characters such as the above mentioned , but I created it for >14,000 characters and, in addition, I can provide positional information (top, bottom, left, right, in, out) that is currently not available in Wiktionary.
Most of my decompositions are binary: left/right, top/bottom or in/out. Both of the components can possibly be further decomposed, but this can be looked up in the entries of both components (characters), as it is currently done in Wiktionary.
My (internal) IDs don't have to be shown, just the corresponding components (characters).
—This unsigned comment was added by Brogerc (talk • contribs).

Birds in English

Hello,

In the Finnish version we have many articles on birds. However, we have yet to decide on a naming policy, and I came here to ask your thoughts. I've noticed that here bird names are in lowercase, for example coal tit. In the Finnish version we have both fi:coal tit and fi:Coal Tit, and this is somewhat problematic. In ornithology, caps seem to be used: http://www.worldbirdnames.org/english-names/spelling-rules/capitalization/. So, how should we write English bird names in the Finnish version? Of course, we would ideally want the interwiki links to work, so a common naming policy could also be sought. Suggestions? --Hartz (talk) 16:18, 13 September 2014 (UTC)

It’s up to you guys, but my suggestion is to include all attested spellings. — Ungoliant (falai) 16:49, 13 September 2014 (UTC)
(edit conflict) Naming policies are for Wikipedia (where there's a good bit of debate on the subject). Wiktionary goes by usage: in theory, the spelling/capitalization that's used most should be the main entry, and any others attestably in use should be alternative forms. In practice, though, the one that's created first tends to be the main article, and I'm not sure if the "most used" criterion has ever been explicitly made a policy. Information about which capitalization is used in which contexts would be good information for usage notes. If we were consistent enough with our context labels, we might indicate it that way, but people tend to use "zoology" or "ornithology" for any word having to do with animals or birds rather than just for words used by ornithologists or zoologists. Chuck Entz (talk) 16:51, 13 September 2014 (UTC)
  • I find there is a conflict between building a good set of substantive entries of such kind and having entries that are true to the most common orthography, which may differ by the source of whatever list or source the contributor was working from, contributor personal preferences, or even actual frequency research. Perhaps frequency of use should govern in principle, but it is a counsel of perfection that may stand in the way of good substance. Almost whatever reasonable two-part spelling a user types in will cause the search engine to find any two-part spellings in the wiki (including hyphenated forms, eg coal-tit, Coal-tit) and place them at the top of the headword-not-found page. Even a single-word spelling would be found, eg coaltit. But for regular users and contributors consistency of orthography makes it easier to know whether there is a substantive entry for a given vernacular name. The tedium of determining which is the more common form seems to me to far outweigh the benefits-in-principle, which seem quite modest relative to the benefits in practice IMO.
A possible practical solution would be to have standardized spelling for all main entries, but indicate the most common orthography among the alternative forms/spellings at the top of the entry. DCDuring TALK 20:20, 13 September 2014 (UTC)
  • The English Wikipedia engaged in a lot of bickering for a long time regarding how to capitalize birds' names, e.g. "rusty blackbird" / "Rusty Blackbird". Finally, recently, a broad (site-wide?) RFC — which NB was judged by an admittedly pro-uppercase editor — determined that birds' names should be lowercase, because they are lowercase in most cases (e.g. in general books about subjects like home decor which happen to mention that a tapestry depicts a rusty blackbird; in works of fiction; in general reference works; etc — in other words, in general use) and it is only in some specialist works on the subject of ornithology that bird names are capitalized, and in those cases, the capitalization is equivalent or akin to honorific capitalization or to the old English practice of capitalizing Important Words. As far as I know, Wiktionary has long used lowercase for that same reason. (Wiktionary and Wikipedia have both always used lowercase for animals other than birds, e.g. rusty tinamou.) Compare how several armed forces uppercase their rank terms and other terms, e.g. "Private", "Sailor", "Ship", etc, but we have just "private", "sailor", "ship", etc. Whether you want to include redirects from other case-forms is up to you. Wiktionary does not use redirects for things like "Ship"; I don't know of any examples of Wiktionary using redirects for birds' names, but I can't rule out that some exist, and I express no opinion at this time on whether or not such redirects should exist. - -sche (discuss) 20:10, 13 September 2014 (UTC)

Official wordlists for specific fields

There are prescriptive lists of common names for some taxonomic groups issued by various scientific and other organizations, e.g. the w:International Ornithological Congress has a list at www.worldbirdnames.org. These obviously aren't a factor as far as CFI, but it might be worthwhile to have categories and/or reference templates to indicate that a name is designated as the preferred name in a given list. I don't think appendices are that good of an idea, since they would duplicate lists available elsewhere online. The IOC list would fit nicely in our current topical framework, since it covers multiple languages, but I believe there are some that only cover a given language in a given region.

I would appreciate any suggestions on how to represent this information, since these lists would be a good way to expand our coverage of names for living things, and some are available online in formats that could be used for mass importation of entries if anyone who knows how is so inclined. Chuck Entz (talk) 17:43, 13 September 2014 (UTC)

A good representation is to create an entry for each bird. I'm sure many already exist. This Excel spreadsheet has a lot of good information and this would be a lot of work. A bot could create the missing English entries, add the given translations, and create the FL entries. The English bird names are all capitalized in the spreadsheet, though. --Panda10 (talk) 18:46, 13 September 2014 (UTC)
We have more than 5200 Translingual bird name entries that use {{R:Gill2006}}, thereby indicating recourse to the IOC publication. These usually contain little more than the English vernacular name, not even that for many genus names. Remarkably that template does not appear in many (any?) English vernacular name entries.
Mass importing might be nice once we agree on a desirable format, which need not be very difficult. I have been working on a more demand-oriented approach for taxonomic names and corresponding vernacular names, but mass-import is good. Spreadsheets are easy to work with for reformatting, so capitalization need not be a problem if we agree on a simple policy of importing in a standard capitalization, leaving the more time-consuming business of determining more frequent capitalization, by date, usage context or whatever, for future generations of contributors with even more powerful tools and resources.
Isn't there also an international bird-watchers body that has different naming ideas? DCDuring TALK 21:39, 13 September 2014 (UTC)
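As a rough illustration of the "import in a standard capitalization" idea above, here is a minimal Python sketch. It assumes lowercase is the agreed standard and that an importer maintains a list of words that keep their capital; the function name and the tiny PROPER_WORDS set are hypothetical, not part of any existing bot.

```python
# Hypothetical list of words that stay capitalized; a real import would
# need a far fuller list (place names, personal names, and so on).
PROPER_WORDS = {"American", "Eurasian", "African"}


def standardize_name(name, proper_words=PROPER_WORDS):
    """Lowercase each word of a vernacular name unless it is a known proper word."""

    def fix_word(word):
        # Hyphenated parts are handled one by one, e.g. "Coal-Tit" -> "coal-tit".
        parts = word.split("-")
        return "-".join(p if p in proper_words else p.lower() for p in parts)

    return " ".join(fix_word(w) for w in name.split())


print(standardize_name("Rusty Blackbird"))
print(standardize_name("American Robin"))
```

Names containing proper nouns that are missing from the list (eponyms, for instance) would be lowercased wrongly, so the output of such a pass would still need review before any mass import.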
  • @Chuck Entz: There is also a similar list for viruses (~2-3K names), published by the International Committee for Taxonomy of Viruses. Checklists from the USDA Plants database are downloadable for each US state, certain territories and possessions, for Canadian provinces, and for the whole. It should be possible to get their USDA official "vernacular" name. The large NCBI (US) taxonomy database seems downloadable and has some vernacular names. There are certainly others for taxonomic families or other groupings. Some may be POV, in the sense of advocating a taxonomic scheme, not necessarily widely accepted. Some more definitive higher-level groupings seem to be restricted to further the sales of print publications or of machine-readable data (eg, mammals). There seem to be many more databases of scientific names than of vernacular names, even of the "recommended" vernacular names. Vernacular names deserve some priority, especially as long as contributors who vote on this page disfavor, for some reason I can't fathom, translation tables for taxonomic names.
All of that said, Wikispecies has many entries with tables of vernacular names. Simply adding the taxonomic names for such entries, followed by stub L2 sections for the vernacular names we do not have would be a significant contribution, possibly more to the taste of contributors here. We might be able to do them the favor of identifying possible or even actual errors using gender agreement. I have confirmed some using more definitive online databases such as IPNI. DCDuring TALK 18:18, 23 September 2014 (UTC)

Renaming rhyme pages

I have noticed rhyme pages have been renamed, such as from Rhymes:Czech:-alɪ to Rhymes:Czech/alɪ. I object and ask that they be renamed back. I cannot find the Beer parlour discussion for this. --Dan Polansky (talk) 07:25, 14 September 2014 (UTC)

Note that I am the creator of more than 1000 comprehensive Czech rhyme pages.

I object to subcategorization; I ask that all Czech rhyme pages be found in Category:Czech_rhymes, as they were not too long ago. --Dan Polansky (talk) 07:28, 14 September 2014 (UTC)

Wiktionary:Requests for moves, mergers and splits

I think Wiktionary:Requests for moves, mergers and splits does more harm than good. Proposals are being made there that affect more than 1000 pages. Such proposals should IMHO be made in Beer parlour. For one-off moves of single mainspace pages, WT:Tea room should suffice. I think whenever someone makes a proposal there affecting a volume of pages, the proposal should immediately be rejected as being made via a wrong venue. In the ideal hypothetical world, the page would probably be deleted. --Dan Polansky (talk) 08:56, 14 September 2014 (UTC)

Maybe we should merge all the forums into your talk page so you won't miss anything, because, obviously, there's nothing more important than keeping you informed. Never mind that it's been the designated forum for this kind of thing for years- you missed out on something because you weren't paying attention to it, so it has to go. NOW!!! Chuck Entz (talk) 15:01, 14 September 2014 (UTC)
Wrong. I found the forum annoying when it was created back in 2010. It is now doing tangible harm. --Dan Polansky (talk) 18:28, 14 September 2014 (UTC)

Rhyme pages and subcategories or subcategorization

Dutch rhyme pages have subcategories. You can browse them from Category:Dutch rhymes and see how useful or not these are.

  • I oppose creating such subcategories for Czech rhyme pages.
  • I oppose an editor who does not work on rhyme pages for a particular language creating rhyme subcategories for that language without express support for doing so from editors who do work on rhyme pages for that language.

Subcategories for rhymes are a fairly useless form of organizing rhyme pages, IMHO. My idea of a useful organization of rhyme pages can be seen at Rhymes:Czech, which uses tables AKA matrices rather than a hierarchical tree, which is what categories present. Worse yet, subcategories do not present the hierarchical tree at a glance; rather, you have to click through them one at a time to see their content; even by clicking them one at a time, you won't see the larger picture.

--Dan Polansky (talk) 09:10, 14 September 2014 (UTC)

Please make RFE & RFP mandatory

Requests for etymology and requests for pronunciation should be common practice. Contributors should have the luxury of being able to facilely go through catalogues of terms that are without etymology or pronunciation. It facilitates navigation and accelerates labour. Concerning etymology, exceptions can be made for some noncanonical entries and some alternative forms, but that is about it. --Æ&Œ (talk) 07:16, 15 September 2014 (UTC)

Every lemma entry should have under those headers either good content or a link to an entry that has such content.
But I don't think universal use of {{rfp}} and {{rfe}} will lead to more good content under the headings. Perhaps it would be nice to make sure that all lemma L2 sections have Etymology and Pronunciation headers to reduce tedious typing for those who would add the content. IMO it is more important to find out which entries actually have motivated a specific individual request. It might be nice even to have a mechanism for votes supporting such requests on specific entries.
I would strongly favor creating lists (even just counts) of lemma L2 sections that lack pronunciation headers, etymology headers, translation headers and, more importantly, those that have the headers but lack actual content under those headers. DCDuring TALK 15:12, 15 September 2014 (UTC)
  • Block this troll already. This is not a sane proposal, and the troll knows it very well. --Dan Polansky (talk) 17:17, 15 September 2014 (UTC)
The French Wiktionary always uses {{rfp}} (well the nearest equivalent). Admittedly the total number of entries that use it is in the hundreds of thousands, probably over a million. Renard Migrant (talk) 22:13, 15 September 2014 (UTC)
What exactly is the point of a request category with millions of members? DTLHS (talk) 22:17, 15 September 2014 (UTC)
Maybe we should start making a distinction between entries for which something is requested, and entries which are merely lacking something? —CodeCat 22:27, 15 September 2014 (UTC)
That's what I was trying to get at. I was thinking that lists of entries lacking pronunciation headers or lacking etymology headers would be good applications of dump-processing. I would think they would be most helpful for English one-word lemmas. DCDuring TALK 23:50, 15 September 2014 (UTC)
We have 369K members of the English lemma category, 89K mainspace entries that contain both "English" and "pronunciation" (so, probably an overestimate of entries with English Pronunciation headers) and 1K English rfps. It seems that the English lemma category includes many abbreviations, plurals, multi-word terms, and other items for which pronunciation and etymology are not necessarily worth any significant effort. DCDuring TALK 00:05, 16 September 2014 (UTC)
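The dump-processing idea above (listing lemma sections that lack Pronunciation or Etymology headers) could be sketched roughly as follows in Python. This is only an illustration that works on one page's wikitext at a time; the function names are hypothetical, reading the dump itself is left out, and the check covers the presence of headers only, not whether there is real content under them.

```python
import re


def english_section(wikitext):
    """Return the text of the ==English== L2 section, or None if there is none."""
    match = re.search(r"^==\s*English\s*==\s*$", wikitext, flags=re.MULTILINE)
    if match is None:
        return None
    rest = wikitext[match.end():]
    # The section ends at the next L2 header (exactly two equals signs).
    nxt = re.search(r"^==[^=].*?==\s*$", rest, flags=re.MULTILINE)
    return rest if nxt is None else rest[:nxt.start()]


def missing_headers(wikitext, wanted=("Pronunciation", "Etymology")):
    """List the wanted headers absent from the English section (None if no such section)."""
    section = english_section(wikitext)
    if section is None:
        return None
    # Capture L3+ header names, ignoring a trailing number as in "Etymology 1".
    found = set(
        re.findall(
            r"^={3,}\s*([^=\n]+?)(?:\s+\d+)?\s*={3,}\s*$",
            section,
            flags=re.MULTILINE,
        )
    )
    return [h for h in wanted if h not in found]


sample = (
    "==English==\n\n===Etymology===\nFrom X.\n\n===Noun===\n{{en-noun}}\n\n"
    "----\n\n==French==\n\n===Pronunciation===\n* {{IPA|/a/}}\n"
)
print(missing_headers(sample))
```

Run over every page in a dump, the non-empty results would give the kind of list (or count) discussed above; filtering out abbreviations, plurals and multi-word terms would be a separate step.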
I found nothing strange in the request. The requests would not be manageable but it's not a reason for blocking or ridiculing. As for the request itself, I oppose it. We have a huge number of such requests already. We're lucky to have an entry for a term - with an English translation.
Etymology: I actually find Korean entries a good example - the etymologies are split roughly by Sino-Korean (40-60%), native (about 35%), loanwords from European languages (5%). Having something like "native + language name" is already informative. For Slavic, Germanic, Romance, etc. the minimum info could be "Slavic", etc.
Pronunciation: For languages such as Czech, etc. pronunciation can be automated but someone has to create a module. Languages, such as English could use a phonetic respelling to get automatic IPA, look at Persian or Chinese which use transcription to get IPA. --Anatoli T. (обсудить/вклад) 00:19, 16 September 2014 (UTC)

Category:Terms needing transliteration by language

I've not spotted this one before, but some of these use [[:Category:<langname> needing transliteration]] and some use [[:Category:<langname> lacking transliteration]]. Even worse, some use both and split the entries over two categories. Purely because of the name of the parent category, could we align these into [[:Category:<langname> needing transliteration]]? For our purposes lacking and needing are synonymous. Renard Migrant (talk) 22:12, 15 September 2014 (UTC)

large page navigation

When I’m loading a large page, I have to wait approximately one minute for the content to load just so that I can see the languages that I’m interested in. The table of contents is irritating to navigate, particularly on pages with a huge number of bytes (e.g.: a). The table doesn’t even have nested tabs like on Wiktionnaire.

My idea would be to have some sort of option to ‘filter’ languages before the page loads, but I suspect that this would be very difficult to programme. I can’t think of a superior alternative, though. Do you lot have any better ideas, by any chance?

Am I the only one who finds it annoying to navigate high‐content pages? I’m not sure what we can do about it, though. --Æ&Œ (talk) 13:04, 16 September 2014 (UTC)

Buy a newer computer. I never have problems with large pages. --Vahag (talk) 13:25, 16 September 2014 (UTC)
I don’t have that kind of trouble at all. For me, every page loads as fast as every other page ... in about a second. First, I think you should go to PREFERENCES > Gadgets > User interface gadgets, and tick Enable Tabbed Languages. Each language will have its own page (and no more tables of contents). Second, I don’t know if the browser makes any difference, but I use the very latest Firefox browser, Firefox 32.0.1. Third, RAM memory might be an issue, and you should see if you can get another RAM memory card that will increase your computer’s memory. My computer is just a cheap old laptop that I bought second-hand several years ago. I just added some RAM and upgraded to Windows 7. I think if you do these things, your pages will load quickly. —Stephen (Talk) 13:57, 16 September 2014 (UTC)

Subpage editing weirdness

I went to cull some blue links from User:Brian0918/Hotlist/A2, and was unable to save the edit, instead receiving a message that said:

This action has been automatically identified as harmful, and therefore disallowed. If you believe your action was constructive, please inform an administrator of what you were trying to do. A brief description of the abuse rule which your action matched is: Users touching other users' user pages and subpages

I was, however, able to move the page to my userspace, delete some of those blue links, and then move it back. I also tried editing User:Robert Ullmann/Oldest redlinks, and got the same message. Since I am an administrator, it seems rather pointless to "inform an administrator" of what I was trying to do. I have edited subpages like these up until earlier today without this issue coming up. What changed? bd2412 T 19:02, 16 September 2014 (UTC)

See Special:AbuseFilter/24. Looking at today's change log, it says: "2014-09-16: Check user_rights for "autoconfirmed" instead of user_groups for "*confirmed". That way global groups providing "autoconfirmed" are also supported whilst still supporting the local "confirmed" user group, too. ––Krinkle". Equinox 19:09, 16 September 2014 (UTC)
Fixed. Next time, post this in the WT:GP. --WikiTiki89 19:11, 16 September 2014 (UTC)
... but as long as we're in the BP: there has been discussion of changing the filter to allow users to edit others' subpages; I now support such a change. Would anyone else like to express a view on (or implement) that change? - -sche (discuss) 19:14, 16 September 2014 (UTC)
I support such a change. It's often legitimate to edit subpages of other users, like to update bot feed lists or to remove entries from lists that another user has generated from a dump. Editing the main user page of another user should be restricted to people who know what they are doing (unlike User:WritersCramp who vandalised my user page in diff). —CodeCat 19:27, 16 September 2014 (UTC)
I've been doing that for years. Removing entries from lists that another user has generated from a dump, I mean, not vandalizing CodeCat's user page. bd2412 T 20:09, 16 September 2014 (UTC)
As admin I've been editing selected dump-populated subpages, too. But I have simple questions about this:
  1. Would this be a default setting that could be overridden?
  2. What level of user are we talking about? Registered? Whitelisted?
Prudence would require that the pages be added to one's watchlist, no matter what the default and no matter what level of user. DCDuring TALK 21:47, 16 September 2014 (UTC)

bookworm: movies

Here’s a kind of ngram for movies & TV: movies.benschmidt.org Michael Z. 2014-09-16 21:16 z

Korean lemmas and categories

Discussion: User_talk:Jusjih#Categories

(Notifying TAKASUGI Shinji, Wyang, Jusjih):

We have a bit of a disagreement with User:Jusjih regarding Korean hanja. In my opinion, hanja or Chinese character forms, not being a primary writing system for modern Korean, should not have any topical categories. Cf. Japanese kyūjitai (pre-reform spellings), kana (usually hiragana and sometimes katakana) terms (when not the most common spelling), let alone romanisation entries - rōmaji and pinyin.

E.g. 중국인 (junggugin) (the current standard spelling) belongs to Category:ko:Nationalities but IMO 中國人 (hanja entry) should not. I would say the same about Vietnamese Hán tự entries. Both Korean hanja and Vietnamese Hán tự definitions are, by convention, very short (one-liner) and only link to the current writing system, i.e. hangeul for Korean and Latin spelling for Vietnamese.

What do you think? Comments regarding kyūjitai, kana, Hán tự would also be appreciated. The formats for rōmaji and pinyin entries are set in stone by appropriate votes. --Anatoli T. (обсудить/вклад) 06:02, 17 September 2014 (UTC)

As South Korean debates of pure hangul vs. mixed script with hanja are still ongoing, your POV toward Korean hanja could extend the Korean "text war" (w:zh:朝鮮漢字#六十年文字戰争, w:ja:朝鮮における漢字#「文字戦争」). Wiktionary is not a battlefield. Even Korean Google News does get a few hanja without hangul in parentheses, like "사우디전 나서는 이광종號, 주목해야 될 점은?". When using a Korean hanja dictionary, getting homophone compounds is so easy, like 수도 (sudo) matching so many different hanja compounds with very different topics.--Jusjih (talk) 07:16, 17 September 2014 (UTC)
To clarify, I don't hold or promote any negative opinion about hanja, I create hanja entries, too. I'm just saying that hangeul is the primary writing system and the forms learners are more likely to use. Most hanja entries lack categories, pronunciation and other sections too. I don't think it's such a big deal but maintaining categorisation for both forms is also quite difficult. --Anatoli T. (обсудить/вклад) 07:47, 17 September 2014 (UTC)
The practice of centralising all information on the Hangul/Quoc Ngu pages is about minimising the amount of maintenance work needed, as words in various East Asian languages can easily be written in a variety of scripts. I also support merging the Simplified and Traditional Chinese entries, by making Traditional the lemma form where all trad-simp conversions are by default enabled, and making Simplified a pretty soft-redirect containing no information but the link and the Hanzi box. Wyang (talk) 00:30, 18 September 2014 (UTC)
If simplifying maintenance is the reason not to categorize topics in too many pages, here are some scenarios in which your proposed merger from simplified Chinese to traditional Chinese will work:
  1. If a Chinese compound has simplified and traditional characters while unused in Japanese kanji, Korean hanja, and Vietnamese Hán tự, then your proposed merger to redirect will work very well.
  2. If a Chinese compound has simplified and traditional characters while also used in Japanese kanji unchanged and Korean hanja unsimplified, then your proposed merger to redirect will work well, like the noun 结婚 redirected to 結婚 (marriage).
  3. Finally, thanks for promoting traditional Chinese. I have heard that red China has officials calling for reverting to traditional Chinese, to no avail so far, but I have not heard of Japan planning to return to kyujitai. Once Chinese soft redirects work well, would you like Japanese kyujitai (old characters) soft-redirected to shinjitai (new characters) as well? Some but not all Japanese shinjitai are the same as simplified Chinese.--Jusjih (talk) 04:36, 18 September 2014 (UTC)
@Jusjih:. I think User:Wyang means a soft redirect, meaning there will be still entries for simplified forms, traditional having all the term info. It has its merits (centralising all information), even though simplified Chinese beats traditional 10:1 or so (Internet content and publications). --Anatoli T. (обсудить/вклад) 01:06, 19 September 2014 (UTC)
Chinese Wiktionary is even more sensitive on traditional vs. simplified characters. English Wikisource already uses soft redirects for different purposes. Now, what is your concern about maintaining categories on different scripts, like Korean hangul vs. hanja? Because categories may change in the future in some ways?--Jusjih (talk) 06:11, 19 September 2014 (UTC)
My concern is synchronisation. It's difficult to keep information in sync in both traditional and simplified entries. E.g. currently there are many traditional entries without the audio link but simplified entries have them. If we add categorisations to hanja as well, then ALL hanja entries should have them but I don't see the need. If, e.g., the countries, fruits and animals categories contain hangeul only, that is sufficient; one-line hanja entries have the link to hangeul entries. As for the Chinese Wiktionary, it's not very consistent with keeping only simplified entries but they have only one entry per term, not two, like paper dictionaries - they provide both forms but in ONE entry. --Anatoli T. (обсудить/вклад) 06:31, 19 September 2014 (UTC)
Then soft redirects or the like should be considered for these?
  1. simplified to traditional Chinese (Hong Kong and Macau sometimes use different traditional characters from Taiwan.)
  2. Japanese kyujitai to shinjitai
  3. Korean hanja to hangul
  4. Vietnamese Hán tự to Latinized alphabet
As treating different scripts in CJKV as equally as possible would be ideal, I wonder how feasible it is to automatically synchronize topic categories with our proposed soft redirects. This will require some technical work to detect topic categories in target pages.--Jusjih (talk) 05:50, 20 September 2014 (UTC)
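To give a sense of what "detecting topic categories in target pages" might involve, here is a minimal Python sketch. It assumes topical categories appear as explicit links of the form [[Category:ko:Nationalities]] in the target page's wikitext; the function name is hypothetical, and categories added through templates would not be detected.

```python
import re

# Matches explicit links such as [[Category:ko:Nationalities]] or
# [[Category:ko:People|sortkey]]; language-wide categories such as
# [[Category:Korean nouns]] do not match because they lack the "code:" prefix.
TOPIC_CATEGORY = re.compile(
    r"\[\[\s*Category\s*:\s*([a-z-]+):([^\]|]+?)\s*(?:\|[^\]]*)?\]\]"
)


def topic_categories(wikitext, lang_code):
    """List topical category names (e.g. 'Nationalities') for one language code."""
    return [
        name
        for code, name in TOPIC_CATEGORY.findall(wikitext)
        if code == lang_code
    ]


page = (
    "[[Category:ko:Nationalities]]\n"
    "[[Category:Korean nouns]]\n"
    "[[Category:ko:People|중국인]]\n"
)
print(topic_categories(page, "ko"))
```

A synchronizing bot could compare this list for the hangul entry with the one for the hanja entry and report the difference, whichever direction the community decides the categories should flow.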
  1. Variants (with qualifiers, context labels) would be sufficient. Simplified characters are actually better standardised than traditional and have fewer dated, rare, obscure or regional characters. Well, that was also a reason for the simplification.
  2. Already the case, sort of. Kyūjitai has less info than shinjitai. User:Eirikr thinks they also need dated labels or similar, since kyūjitai is no longer used in Japan.
  3. Don't you like our current hanja structure?
  4. Same for Vietnamese Hán tự. Hanja and Hán tự entries are almost like soft redirects already. At least they are supposed to be one-liners, with pronunciation, etymology, synonyms, usage examples, etc. in hangeul entries.
From User:Wyang's post I understand he wants simplified entries to look something like this:
==Chinese==
{{zh-hanzi-box|[[天]][[气]]|[[天氣]]}}

----
The problem here is with the definition lines and missing PoS. Even if we make it like this (with a new template {{jiantizi}} or similar):
==Chinese==
{{zh-hanzi-box|[[天]][[气]]|[[天氣]]}}

# {{jiantizi|天氣}}

----
The community may not accept such formats. --Anatoli T. (обсудить/вклад) 07:02, 20 September 2014 (UTC)
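For anyone who wants to experiment with the second format above, here is a small Python sketch that generates it from a simplified/traditional pair. It is only an illustration: {{jiantizi}} is the not-yet-existing template name suggested above, the function name is hypothetical, and the trailing "----" separator is left to the caller.

```python
def simplified_soft_redirect(simplified, traditional):
    """Build the wikitext of a simplified-form soft redirect in the second format above."""
    # Each simplified character is linked on its own, as in the example.
    linked_chars = "".join(f"[[{ch}]]" for ch in simplified)
    return (
        "==Chinese==\n"
        f"{{{{zh-hanzi-box|{linked_chars}|[[{traditional}]]}}}}\n"
        "\n"
        f"# {{{{jiantizi|{traditional}}}}}\n"
    )


print(simplified_soft_redirect("天气", "天氣"))
```

If PoS headers turn out to be required, the header line could be added as one more string between the hanzi box and the definition line.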
I like Anatoli's second proposed format. Wyang (talk) 00:38, 22 September 2014 (UTC)
@Wyang: I was just materialising your view in an example. As I said, the problem is with the missing PoS header. It won't work without a vote because it violates the current WT:ELE. The format may need tweaking to match something like {{alternative form of}} examples or other templates. If PoS headers are included, it has a better chance of passing. Otherwise, let's see what others will say. If it passes in this format, then WT:ELE needs to be updated to reflect this exception. Other Chinese editors need to be polled as well. --Anatoli T. (обсудить/вклад) 00:51, 22 September 2014 (UTC)
(Notifying Kc kennylau, Atitarev, Tooironic, Jamesjiao, Bumm13, Meihouwang): Pinging other editors. Please express your views here too. Wyang (talk) 00:55, 22 September 2014 (UTC)
I am favourable to a centralisation of the information in the traditional page.
  • First, it will reduce the amount of work for words with both simplified and traditional forms: there will be no need to copy and convert the work done on one form to the other.
  • Secondly, the Wiktionary data will be easier to parse, as there would be no duplicate information. I parse the Wiktionary data for use in a dictionary I am programming. For characters with two forms, it's hard to programmatically merge the two entries, so I just take one arbitrarily.
A little more work will need to be done, as every word in the entry will need to be written in both simplified and traditional form (with {{zh-l}}, {{zh-ts}}, {{zh-tra}} or {{zh-sim}}), but in the end there will be less work to do than when synchronizing simplified and traditional entries. Meihouwang (talk) 11:17, 24 September 2014 (UTC)
I am more favourable to centralising trad./simpl. entries while preserving PoS headers and categorisations; that way simplified entries won't be "discriminated against". The definition lines would contain soft redirects to traditional entries. Simplified entries would still require maintenance, but less of it: pronunciation, etymology, synonyms, antonyms, usage examples, usage notes, etc. would not need to be kept there. I won't insist on this if Wyang's proposal passes and others agree with his plan. BTW, we should separate this topic from the original Korean hanja.
(While realising the need to centralise and simplify work, jiantizi is a standard in China, Singapore, Malaysia and most universities teaching Chinese, preferred by foreign learners and, as I mentioned before, has much more Internet penetration and amount of published texts. Japanese shinjitai also often coincides with Chinese jiantizi (about 30%?). Just a thought, which may be raised by opponents.)--01:05, 29 September 2014 (UTC)

Visual Editor

For the record, I oppose the introduction of Visual Editor (W:Wikipedia:VisualEditor) in any form or manner into English Wiktionary; I also oppose an opt-in introduction. Some reasons for this opposition were stated by Kephin in Wiktionary:Grease_pit/2014/September#Visual_Editor. --Dan Polansky (talk) 05:40, 18 September 2014 (UTC)

  • Hi Dan. Having this tool would allow me to more easily do some things that would be helpful to the project. Please reconsider. Perhaps we could condition use on the editor having some level of participation which makes misuse unlikely (admin-only or the like). Cheers! bd2412 T 14:09, 18 September 2014 (UTC)
  • I see no reason why this should not be allowed as an opt-in if it does not have any effect on anyone other than the contributor using it. I doubt that it would be a serious server-resource hog, that it would generate a lot of help requests at WT:GP or WT:ID, or that it would generate a higher ratio of bad content to good content, at least as an opt-in. DCDuring TALK 14:19, 18 September 2014 (UTC)
  • I also support the introduction of Visual Editor, at least as opt-in. —Mr. Granger (talkcontribs) 22:30, 18 September 2014 (UTC)
  • I would also say there is no reason for it not to be available for anyone who wants to try it. It should not, of course, be enabled by default, or even promoted in any way, but it should at least be possible for individual users to use it somehow. VE is at the stage where it is not causing page corruption or other undesirable effects that disrupt or pollute wikitext or diffs. This, that and the other (talk) 10:14, 23 September 2014 (UTC)

requests for synonyms

I believe that we should have a template that requests synonyms. This may be somewhat problematic since not all words are going to have synonyms, but perhaps we could compensate by simply inserting ‘This term does not have any synonyms.’ Is anybody totally opposed to this template idea? --Æ&Œ (talk) 13:23, 18 September 2014 (UTC)

From the user's point of view I think it's annoying and unhelpful to have editor-only content in an entry, saying "hey, something is missing!". I would prefer this kind of thing to be some kind of invisible markup that generates a category (for those wanting to help with that category) but doesn't add text to the entry. Also, is this a feature that people would use; do we get many requests for synonyms already, on talk pages? Equinox 13:28, 18 September 2014 (UTC)
You both make good points.
We do get occasional requests for synonyms at Info Desk and Tea Room. As we discourage the use of entry talk pages by not responding to them quickly, it is no surprise that they are not much use for this. Even if the only use of this would be to structure user requests to make them easier to fulfill it would be a help.
I agree that some of our request boxes aimed at requesting content, such as those created by {{rfi}}, {{rfp}}, {{rfe}}, are too prominent. The alternative of eliminating them and relying exclusively on a category misses a chance to teach ordinary users that such requests exist, which may help them make such requests, which may get them involved as content contributors. I also find that occasionally I will be motivated by a visible request in an entry to add the requested content. OTOH I have never been motivated to add content by a category.
Accordingly, we could make a {{rfsyn}} that displayed on the scale of {{rfv-sense}}. I would further favor it not displaying properly unless a sense were provided. Another approach would be to modify {{sense}} to allow a second parameter which could be "?", which could generate the (modest) display and categorize. DCDuring TALK 14:09, 18 September 2014 (UTC)
I have added and used {{rfelite}}, both because it is less intrusive and therefore desirable and as a demonstration of what a {{rfsyn}} could look like. DCDuring TALK 16:23, 23 September 2014 (UTC)
I’m not sure what you mean by ‘editor‐only content.’ Are you saying that your requests for quotations aren’t ‘editor‐only content?’ I’m confused.
I don’t remember many requests from others concerning synonyms, but I certainly do request many synonyms from Mister Brown. Not sure why you think that it must be requested to be merited. Nobody requests my silly entries, but I create them any way. I mean, if I end up helping people besides myself, great. But if it only helps me, whatever. Generally speaking, I create what I want because I want it, not because somebody else does. --Æ&Œ (talk) 14:21, 18 September 2014 (UTC)
  • I oppose there being a template for requests for synonyms. --Dan Polansky (talk) 07:33, 20 September 2014 (UTC)

Can we disable the #babel parser function?

This parser function is quite problematic because it does not use Wiktionary's own database of languages, but uses Wikimedia's. These two lists differ on some crucial points. In particular, the Wikimedia list is not always ISO compatible, using "als" for Alemannic German while this code represents Tosk Albanian in ISO and "gsw" is used in both ISO and Wiktionary for Alemannic. The function also includes codes like "zh-min-nan" for our "nan", "nds" for our "nds-de", and also codes not recognised at all by Wiktionary like the Serbo-Croatian standard varieties, and even "Simple English". And perhaps most crucially, it does not support any of the custom Wiktionary codes (basically all the ones in Module:languages/datax). It could be argued that users should be free to declare their language in the form they prefer, but at the same time the main point of Babel boxes is to allow other users to find people who know a particular language well enough to edit entries for it. In that light, since we have no Croatian or Simple English entries, we don't need categories for speakers of those languages either. —CodeCat 22:37, 19 September 2014 (UTC)
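The mismatches listed above could be captured in a small lookup table. The Python sketch below is only an illustration built from the codes named in this post; the table and function names are hypothetical, and "hr" stands in for the Serbo-Croatian standard varieties that Wiktionary does not recognise as separate languages.

```python
# Wikimedia (Babel) code -> Wiktionary code, for the cases named above.
BABEL_TO_WIKTIONARY = {
    "als": "gsw",         # Alemannic German ("als" is Tosk Albanian in ISO)
    "zh-min-nan": "nan",
    "nds": "nds-de",
}

# Babel codes with no corresponding Wiktionary language (illustrative subset).
NO_WIKTIONARY_EQUIVALENT = {"simple", "hr"}


def wiktionary_code(babel_code):
    """Map a Babel code to the Wiktionary code, or None if there is no equivalent."""
    if babel_code in NO_WIKTIONARY_EQUIVALENT:
        return None
    # Codes not listed in the table are assumed to be identical on both sides.
    return BABEL_TO_WIKTIONARY.get(babel_code, babel_code)


for code in ("als", "zh-min-nan", "nds", "simple", "fr"):
    print(code, "->", wiktionary_code(code))
```

A table like this only covers the direction from Wikimedia codes to Wiktionary ones; the custom Wiktionary codes mentioned above have no Wikimedia counterpart to map from.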

That's the whole point of it: so people can use the same babel template across all Wikimedia sites. I'd say we should discourage its use, especially by frequent contributors, but definitely not prohibit it, and definitely not disable it. Also, I don't see people listing languages Wiktionary doesn't recognize as a problem in any way. --WikiTiki89 22:47, 19 September 2014 (UTC)
What do you suggest we do with categories like Category:User simple, Category:User zh-min-nan or Category:User als then? —CodeCat 22:57, 19 September 2014 (UTC)
That's one minor flaw. Can we make it not categorize? --WikiTiki89 23:05, 19 September 2014 (UTC)
Not that I know of. And if we could, what would be the point in having it at all? —CodeCat 23:19, 19 September 2014 (UTC)
The babel boxes on the userpage... --WikiTiki89 23:34, 19 September 2014 (UTC)
I don't think the current format of the babel boxes is very useful anyway. Because the text is not in English, someone who doesn't understand the language will probably not know what language it is, unless they also know the language code. It would be clearer if the description was in English. —CodeCat 01:15, 20 September 2014 (UTC)
You can use non-predefined templates within the Babel boxes. The templates just have to start with the name "User " (or maybe "user ", not sure what happens on case-sensitive wikis like this one). Such templates could be created for languages not known to Wikimedia. This, that and the other (talk) 10:16, 23 September 2014 (UTC)

Category structure documentation, review and correction

I have a few questions about categories.

  1. Where is the topical category structure and its current implementation documented? I can find lots of bits of obsolete documentation, predating modules, but it would take a research project to figure out how things are now working.
  2. Where is the rationale for the particular topical hierarchical structure explained? It seems to be the product of at most two minds, which minds I cannot read.
  3. What is the process for reviewing the adequacy of the structure and then implementing changes? None of our existing review pages seem appropriate based on their names.

Is it supposed to be a secret? DCDuring TALK 16:42, 20 September 2014 (UTC)

Automation of German verb conjugation

In May, I raised the automation of German verb conjugation for discussion in both the BP and the GP, but nobody offered any ideas. If nobody objects, I shall start the automation soon. --kc_kennylau (talk) 08:12, 21 September 2014 (UTC)

I don't understand what you are proposing, since I thought the German conjugation templates were already automatic to a considerable extent. It won't hurt to ping a couple of people: User:-sche, User:Matthias Buchmeier, User:Liliana-60, User:Longtrend, User:91.61.118.8. --Dan Polansky (talk) 10:56, 21 September 2014 (UTC)
@Dan Polansky: Then treat it as a reform. --kc_kennylau (talk) 13:02, 22 September 2014 (UTC)
It looks to me like the code in Module:de-conj is half-implemented. It handles some strong verbs but there are tons more. Is there not a better way of handling strong verbs than essentially listing each one? I've seen analyses of German strong verbs in terms of the seven classical Germanic strong-verb classes, and if this is fairly regular in the modern language, it might make more sense to do it this way. In classical Germanic languages, which class you're in is predictable to a large degree from the stem vowel. This may not completely apply in the modern language, meaning you sometimes will have to specify the class explicitly.
Also, the documentation in Template:de-conj-auto is totally confusing and needs to be rewritten and expanded, if you expect other people to figure out how to use it. Although the docs say to divide the verb into "prefix", "stem" and "ending", in reality it's not at all obvious how to separate stem and ending in the expected way. Why for example does finden separate into stem f- and ending -inden, or even more strangely, how would someone possibly figure out that in erlöschen the "ending" is -löschen and the stem is empty, while in plain löschen the ending is -en and the stem is lösch-? Benwing (talk) 02:45, 23 September 2014 (UTC)
Maybe Module:nl-verb can be used as a base to work from? —CodeCat 12:12, 24 September 2014 (UTC)
  • If it still requires parameters, then it's not automation. At best, it's parameter reduction. Users don't care about the behind-the-scenes voodoo generating the inflection tables that they see. Perhaps a less time-consuming path would be compiling a table that leverages existing template infrastructure using some of the online German inflection databases, creating a huge correlation table (split into several parts to take care of the memory consumption limits) so that the usage of {{de-conj-auto}} would require no parameters at all? It would be a simple pattern matching exercise. Just a suggestion. --Ivan Štambuk (talk) 00:04, 29 September 2014 (UTC)
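
The lookup-table idea could be sketched roughly as follows, in Python rather than in the Lua that Wiktionary modules use. The shard layout, the field names and the function `lookup_conjugation` are assumptions for illustration. A real table would be generated from an inflection database and would carry full paradigms.

```python
# Hypothetical shards keyed by the lemma's first letter, so that each part
# of the table can be loaded separately to stay within memory limits.
CONJUGATION_SHARDS = {
    "f": {
        "finden": {"preterite": "fand", "past_participle": "gefunden"},
    },
    "l": {
        "löschen": {"preterite": "löschte", "past_participle": "gelöscht"},
    },
}


def lookup_conjugation(lemma):
    """Return the stored forms for a lemma, or None if it is not in the table."""
    shard = CONJUGATION_SHARDS.get(lemma[:1], {})
    return shard.get(lemma)
```

With such a table the template call would need only the page name: `lookup_conjugation("finden")` returns the stored forms, and a lemma missing from the table returns `None` so the template can fall back to manual parameters.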

Documenting in WT:CFI our treatment of certain typographic and code-point variants

Pursuant to Wiktionary:Votes/2011-06/Redirecting combining characters, whenever Unicode has included both a combining and a non-combining variant of a character, Wiktionary excludes the combining variant except as a redirect to the non-combining variant. I recently moved the documentation of this practice from the "conveying meaning" section of WT:CFI to the "spellings" section, and replaced the hand-wavy lament that the vote didn't explicitly specify text in WT:CFI to be changed with the text of the approved proposal. It got me wondering: should we also document our exclusion of other typographic and code-point variants — our exclusion of ligature variants like fisherwoman, and long-s spellings, and perhaps also our exclusion of The? And perhaps also our inclusion of variants like vp and dies Iouis? (We could add a section with a header like "typography and encoding" next to the "spelling" section, or wherever else is deemed most appropriate.) - -sche (discuss) 18:34, 24 September 2014 (UTC)

Names of letters of the English alphabet and their plurals

When searching to confirm the spelled-out names of the letters of the English alphabet, I found considerable inconsistencies in the entries. I did not know where to expect to find them as I moved through the alphabet.

Some letter entries offer the singular and plural for the noun (with a plural). Others have no noun entry or only the singular spelling is offered. That singular spelling might also be in the <letter> entry or the <number> entry or both.

In any case there are sometimes alternative names.

For these names to be presented consistently, it appears that the entries themselves need to be more consistent. GHibbs (talk) 05:20, 25 September 2014 (UTC)

Easier to find them by using the category: Category:en:Latin letter names. —Stephen (Talk) 10:14, 25 September 2014 (UTC)
Who the devil came up with these spellings, by the way? Why "aitch" and not "eitch"? Why "wye" and not "wai", for instance? Tharthan (talk) 18:43, 29 September 2014 (UTC)

you need to expand quotations!!

I was surprised by http://en.wiktionary.org/wiki/gunsel which stated "By misunderstanding of the 1929 Maltese Falcon quotation above". There is no quotation above. So I thought, well, clearly it got removed. I went through the history. I found the ACTUAL EDIT that added the quotation (http://en.wiktionary.org/w/index.php?title=gunsel&diff=7728039&oldid=7715648 ) and then I clicked on that version (after literally seeing it after a + in the diff, so I knew it was there). I got to this page http://en.wiktionary.org/w/index.php?title=gunsel&oldid=7728039 and STILL didn't see it. I clicked back, found what I was looking for, clicked forward, to read it on the page again, STILL couldn't see it. It took me 5 minutes to find that there is a hidden part of Wiktionary, that I NEVER knew about, hiding a lot of text, under an impossible to see superscript that has a similar styling to the IPA key! I never would have thought that has text. This makes me angry, as it means for HUNDREDS of wiktionary entries I've seen over the past year (or whatever length of time), I've been missing really valuable quotations people took the time to upload! For this reason I very strongly suggest changing this template or software so that if you do need to hide it, at least the first few characters (10, say) are shown. This would let you not have to include full quotations open, if you don't want them, yet make sure nobody misses this really valuable content that is added specifically for the entry.

As it is not at all standard template language (like the IPA key) I would actually advise including the full quotations without hiding them, regardless of length. Space is simply not at a premium in a wiktionary entry! The alternative is to change the software so that the full quotation is hidden, but the fact that it is custom content specific to that page is clear, from the first few words being visible. Thank you kindly!!! —This unsigned comment was added by 91.120.14.30 (talk • contribs) at 08:13, 25 September 2014 (UTC).
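
The "show the first few characters" suggestion amounts to a simple truncation rule. Here is a minimal Python sketch of it, assuming a hypothetical helper `quotation_preview`; the real collapsing is done by the site's templates and scripts, which are not shown here.

```python
def quotation_preview(text, visible_chars=10):
    """Return the start of a quotation as a teaser for collapsed content.

    Short quotations are returned whole; longer ones are cut after
    `visible_chars` characters and marked with an ellipsis.
    """
    text = text.strip()
    if len(text) <= visible_chars:
        return text
    return text[:visible_chars].rstrip() + "…"
```

A collapsed quotation could then display the preview next to the "quotations" toggle, so that the reader can see that page-specific content is hidden there.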

This is the first time that I have heard of this kind of problem.
We would like to understand your problem a bit better so that we can address it.
What kind of device were you using? What browser (including version) were you using? Does it have Javascript enabled?
Did you see the text "quotations" where the "superscript that has a similar styling to the IPA key" was?
Were you logged in as a registered user?
I will attempt to get the attention of someone who can address the problem technically. DCDuring TALK 18:11, 25 September 2014 (UTC)
Thank you. Yes of course, I mean that I didn't notice the word "Quotations" has valuable content under it! There is no other important content that is 'hidden' in this way. So this is a UI suggestion. You are, in my humble suggestion, taking away from some of the valuable work people contribute by folding it up in this way. To answer your questions, I was just viewing it from a standard browser, it looks like this to me. http://imgur.com/qVfdaKI I realize this is 'as intended', and I realize that you can click Quotations. I think this should be changed so that the text is not hidden. There is no other hidden content on the page, nor any reason to hide it, but there are irrelevant links (like the IPA key, and the Edit links) that I am used to ignoring. Thanks so much! 91.120.14.30 09:32, 26 September 2014 (UTC)
@91.120.14.30: What do you say to this sort of display instead? Is that more noticeable? The number would change and we could use a more distinctive bullet (like ‣, ❧, or ⦿) if that'd be better. — I.S.M.E.T.A. 14:59, 28 September 2014 (UTC)
That's pretty good, but IMO it would be better without the break. It is still true that we want to get as many definitions as possible on the screen at once within the limits of our entry structure. Giving up a display line for a problem that hasn't surfaced before seems unwarranted to me. DCDuring TALK 15:40, 28 September 2014 (UTC)
A small part of the quotation should be shown in a style that suggests it's a quotation, but the text should fade out along the bottom or towards the end so the user instantly perceives that there is a quote without having to explicitly tell the user "Hey, there's a quotation here just click on this arrow here! We hid it so we wouldn't have to fill this entry with as much stuff." I cannot think where to find an example of the "fading" style but it seems common enough online. I find the current collapsed quotations (and also hidden translation tables, and other collapsed content) quite awful from a user experience perspective, as they are very easy to miss (as demonstrated here), also the user has little idea what will happen when they click, and it requires a bunch of clicking on little arrows to open the whole page. This would at least be partly fixed with a style where part of the quotation (or table or whatever) acts as a suggestion that there is more, rather than a mystery triangle. Pengo (talk) 10:32, 29 September 2014 (UTC)

Royal Society of Chemistry - Wikimedian in Residence

Hi folks,

I've just started work as w:Wikimedian in Residence at the w:Royal Society of Chemistry. Over the coming year, I'll be working with RSC staff and members, to help them to improve the coverage of chemistry-related topics in Wikipedia and sister projects.

You can keep track of progress at w:en:Wikipedia:GLAM/Royal Society of Chemistry, or use my talk page here if you have any questions or suggestions, or requests for help with chemistry-related terms. Pigsonthewing (talk) 12:59, 25 September 2014 (UTC)

Disable automatic creation of redirects?

Because Wiktionary tends to avoid redirects, we generally don't want to leave redirects when we move pages. But only sysops can move pages without leaving a redirect behind, which is cumbersome, especially as ordinary users are not aware of this and will not mark the redirect for deletion afterwards. So I think that redirects should not be created when moving pages, or at least not by default. Ordinary users should also be given the option to disable the redirect. —CodeCat 16:57, 25 September 2014 (UTC)

Is our patrolling and filtering diligent enough to catch something being moved to a hard-to-find namespace or pagename? DCDuring TALK 18:29, 25 September 2014 (UTC)
Probably not. But would we find them even now? —CodeCat 19:33, 25 September 2014 (UTC)
At least a user would find it with a properly typed search. Also a link to the original name would. We could also periodically search the dump for redirects to implausible namespaces or implausible characters. Without the redirect the search might be harder and the remedy a bit more time-consuming. DCDuring TALK 19:40, 25 September 2014 (UTC)
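
The periodic dump search could look something like this Python sketch. It assumes the redirects have already been extracted from the dump as (source, target) title pairs, and it treats "implausible" as "crosses a namespace boundary", which is only one possible definition. The namespace list is partial and the function names are hypothetical.

```python
# Partial list of namespace prefixes; a real check would read the full list
# from the site configuration.
KNOWN_NAMESPACES = {
    "User", "Talk", "Template", "Module", "Appendix", "Wiktionary", "Category",
}


def namespace_of(title):
    """Return the namespace prefix of a title, or "" for the main namespace."""
    prefix, sep, _ = title.partition(":")
    return prefix if sep and prefix in KNOWN_NAMESPACES else ""


def suspicious_redirects(redirects):
    """Return the (source, target) pairs whose namespaces differ."""
    return [
        (source, target)
        for source, target in redirects
        if namespace_of(source) != namespace_of(target)
    ]
```

Each flagged pair would still need human review, since some cross-namespace redirects are intentional.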

Category:English coordinates

I'm a bit confused by this category. Many of the terms listed don't actually match the description. For example "et" is not a coordinating conjunction in English, and "etcetera" is not understood as a combination of a coordinator and a head, but is treated by English speakers as a single indivisible set phrase. But even then, is the term "coordinates" even the most common term for this? We already have Category:English non-constituents, should these really go there? —CodeCat 20:22, 25 September 2014 (UTC)

It might be useful to make it a subcategory of English non-constituents. The description could be improved to allow for abbreviations that are synonyms of a combination of coordinator and head. I'm open to other suggested names.
I've often wondered why all MWEs are categorized into phrases when many are not, why we categorize things as interjections when many are not, why we use context tags to make topical categories. I suppose many of our categories are compromises among many contributors' senses of the logic of things, the limited availability of generally accepted terms, and the willingness to police them. DCDuring TALK 21:47, 25 September 2014 (UTC)
I have to say that I don't fully understand the meanings of many linguistic terms like these, and it doesn't help that people might have different ideas of their meanings, not just here but also in scholarly circles. I've been trying to make things fit a bit more, but it's not easy. I would greatly appreciate it if other editors could review the current structure, in particular the categories that now use {{poscatboiler}}. I'm trying to add more and more categories to that module/template, but it's hard when I don't even really know what the categories' names mean or what they're meant to contain. That's why I came here; I have no idea what a "coordinate" is, nor if it's a linguistically standard term, nor whether all the entries in the category belong there. —CodeCat 21:56, 25 September 2014 (UTC)
I don't know that there is any official term to characterize these expressions, but no PoS does them justice. To call them "Adverb" in the wastebasket use of that term hardly does them justice. Calling them Phrase is contrary to the grammatical definition of the word and our general effort to use grammatical PoS terms, correctly applied, for headers. Other dictionaries variously assign "nice and" no PoS, Adjective with an adverbial definition, Adverb, and Idiom.
I have edited the text, removed the one item, [[honoris causa]], that clearly didn't belong there, added one, and made the category a subcategory of English non-constituents. The category could be made more homogeneous by eliminating items that could not terminate lists of multiple conjuncts, but why? DCDuring TALK 22:52, 25 September 2014 (UTC)
In a sense, these terms feel a bit like prepositions. They are "incomplete" and need something extra to make a whole "thing". —CodeCat 22:55, 25 September 2014 (UTC)
And also like other non-constituents, but also like determiners, conjunctions; adjuncts, like adjectives and manner adverbs; copulas and transitive verbs. Everything short of a complete canonical sentence can be considered grammatically incomplete in some way. DCDuring TALK 00:18, 26 September 2014 (UTC)

Description for Category:Predicatives by language?

I'm struggling a bit with coming up with a description for the terms in this category. Right now, the descriptions of the individual categories just say "?". Can anyone come up with something better? —CodeCat 20:43, 25 September 2014 (UTC)

Maybe it would be easier if we used these words' actual parts of speech rather than making stuff up. —Aɴɢʀ (talk) 21:11, 25 September 2014 (UTC)
If parts of speech actually provided a complete category structure for our entries and exhausted the useful, generally accepted knowledge about a language, they would indeed suffice. Would that they did.
I always thought that the category structure in our software and most other user-oriented software did not compel hierarchical structure for good reason: poor correspondence with most folks' needs and perception of reality.
Supplementing PoS categories is essential, not only to overcome the poor match between PoSes and the actual nature of many of our entries, eg, Proverbs which aren't even proverbs, Phrases which aren't phrases, Interjections which aren't interjections, but also to reflect kernels of knowledge that can break our categories into more manageable units, especially those that reflect some actual specific knowledge about some linguistic class of entries. DCDuring TALK 22:08, 25 September 2014 (UTC)
That's why we now have Category:English terms by semantic function. This category deals with ways of categorising terms that are not strictly a matter of part of speech, but have some semantic element in them too. —CodeCat 22:19, 25 September 2014 (UTC)

Vote on CFI Misspelling Cleanup

Wiktionary:Votes/pl-2014-08/CFI Misspelling Cleanup is nearing its end. Could you please post your vote, even if it is "abstain"? It is a trivial vote, from my standpoint, but it could do with a couple of abstains so that it can be cleanly closed. --Dan Polansky (talk) 17:17, 26 September 2014 (UTC)

@Dan Polansky: Done. — I.S.M.E.T.A. 14:35, 28 September 2014 (UTC)

Some confusion with suffixes and the absence of many prefixes and (is it all) infixes.

1. Some confusion with suffixes.

In Wiktionary, sometimes suffixes have special characters like the Latin <-ālis>, but sometimes they do not, as in the Latin <abdominalis> (Wiktionary entry). The suffix entry does not accommodate the <-alis> version.

Sometimes the suffixes in Wiktionary have two components <-at-> and <-ive>, or even three components <-at-> <-i-> and <-on>. The <-ate> or other components may or may not be acknowledged in the entry. How <-ate> may become <-at-> to combine to form other suffixes is not mentioned, though the archaic <-at> is acknowledged.

The entry <-ation> takes you to the Latin <-ātiō> but there is no mention of the English <-atio> as in <ratio>.

2. Absence of many prefixes and (it may be all) infixes.

Clearly to include them all would be a gigantic effort. Perhaps some temporary rudimentary pages could be made. Prefixes are often used as infixes and identifying just the prefixes would be a valuable move forward.

GHibbs (talk) 08:06, 27 September 2014 (UTC)

I've added the appropriate long marks to abdominālis. It would be easier for us to find the things you're talking about if you linked to them using double square brackets [[like this]] rather than with greater than/less than signs. I don't think the -atio in ratio is really a suffix in English. —Aɴɢʀ (talk) 13:57, 29 September 2014 (UTC)
I see no good reason to declare -at- to be an English infix when we have -ate as an English suffix. This must be the product of the synchronic morphomania in our Etymology sections and overuse or misuse of {{confix}}. DCDuring TALK 15:04, 29 September 2014 (UTC)

Checking for invalid phonemes in Template:IPA

The template already checked for invalid characters before, but I've now added some functionality that lets you check for the validity of the phonemes themselves, according to which language is being used. This is done by listing all valid phonemes for the language in Module:IPA/data. If an entry contains invalid phonemes, it's listed in both Category:IPA pronunciations with invalid phonemes and a language-specific subcategory. I've done it for Dutch, and it seems to work quite well, but I don't know if it would be useful for every language. At least it's there if anyone wants to use it, and I hope it helps. —CodeCat 22:31, 27 September 2014 (UTC)
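
For readers who want a feel for this kind of check, here is a rough Python sketch. It is not the code in Module:IPA, which is written in Lua and may work differently. The sketch assumes that only text between slashes is phonemic, that stress marks, syllable dots and spaces are ignored, and that phonemes are matched longest first. The names `phonemic_segments` and `invalid_phonemes` are hypothetical, and the inventory passed in is supplied by the caller.

```python
import re

# Characters skipped during matching: stress marks, syllable dot, space.
IGNORED = set("ˈˌ. ")


def phonemic_segments(transcription):
    """Return the parts of a transcription written between slashes."""
    return re.findall(r"/([^/]*)/", transcription)


def invalid_phonemes(transcription, inventory):
    """List the characters that cannot be matched against the inventory.

    Text in square brackets (phonetic transcription) is never inspected.
    """
    by_length = sorted(inventory, key=len, reverse=True)  # longest match first
    invalid = []
    for segment in phonemic_segments(transcription):
        i = 0
        while i < len(segment):
            if segment[i] in IGNORED:
                i += 1
                continue
            match = next((p for p in by_length if segment.startswith(p, i)), None)
            if match is None:
                invalid.append(segment[i])
                i += 1
            else:
                i += len(match)
    return invalid
```

With a toy inventory of `{"k", "æ", "n"}`, the string `/kæn/, [kʰæ̃n]` produces no errors because the bracketed part is skipped, while `/kʰæn/` reports `ʰ`.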

Hm, that could be useful for languages with small or closed inventories, like Latin, Old Norse and Esperanto. I notice it's currently highlighting the /œ/ in Duits, but the Dutch Wiktionary says that word does indeed have a /œ/ in both the north Netherlands (where nl.Wikt says it's /dœʏ̯ts/, /dʌʏ̯ts/) and Flanders, Brabant and Limburg (/dœːts/). - -sche (discuss) 02:02, 28 September 2014 (UTC)
It's because it got written with the nonsyllabic diacritic above the y, while most other entries write it below. —CodeCat 11:52, 28 September 2014 (UTC)
Can't this be made to work when transcribing phonetically with [ ]? --Vahag (talk) 09:33, 29 September 2014 (UTC)
  • The purpose of the IPA template is font support; it shouldn't decide whether the characters used in a phonemic transcription are valid IPA characters or not. You can phonemically transcribe using whatever set of symbols you like, even using e.g. Cyrillic characters, as is done for many Cyrillic-script based languages. Phonemic transcriptions are *not* pronunciations. (Which is why Wiktionary's usage of /ɹ/ instead of /r/, which every single other English dictionary uses, is so dumb.) Furthermore, the phonemic inventory of a language depends on the author making the analysis and varies for just about every language except artificial ones, or those with dictatorial institutions "governing" them. The proper step to reduce inconsistencies is to forbid manual transcriptions altogether and make pronunciation-generating modules in Lua, even if it requires phonetic respelling to properly generate regional variants. --Ivan Štambuk (talk) 23:48, 28 September 2014 (UTC)
    I mostly agree with Ivan, but I don't think it would hurt to have it do some behind the scenes cleanup category stuff, even if it is experimental with a lot of false positives. The average user would not see hidden categories and would not be affected by this in any way. --WikiTiki89 08:23, 29 September 2014 (UTC)
    I disagree that {{IPA}} can be used with whatever set of symbols you like. Pronunciation information can be added using other transcription systems, as we do with {{enPR}}, but non-IPA systems shouldn't use the {{IPA}} template. I asked for a way of categorizing invalid IPA characters because I was tired of finding things like g ' : instead of ɡ ˈ ː in IPA transcriptions and wanted an easy way to find all the instances. (I sometimes regret making that request, though, because the number of characters considered invalid is greater than I expected, and the number of pages in Category:IPA pronunciations with invalid IPA characters is far greater than anyone can work through.) But I am very skeptical of the attempt to find invalid language-specific phonemes, not least because we often give narrow phonetic transliteration in addition to broad phonemic transliteration. If the template knows that /kʰ/ and /æ̃/ are not phonemes of English, won't it incorrectly tag IPA(key): /kæn/, [kʰæ̃n] as containing invalid phonemes? Or is it smart enough to look only inside slashes and not inside square brackets? Then there's the problem of languages with dialects (/æː/ is a valid phoneme of Ulster Irish but not Munster Irish, /ɑː/ is the opposite) and the problem of people not wanting to stick to the symbols listed in our pronunciation appendices (I get a lot of grief from other editors for trying to make English pronunciations comply with Appendix:English pronunciation). —Aɴɢʀ (talk) 19:18, 29 September 2014 (UTC)
Why would we use /r/ for /ɹ/ when they're completely different phonemes? o_O Tharthan (talk) 11:39, 8 October 2014 (UTC)
@Tharthan: If you know what a phoneme is, then you should realize that in the context of English, that sentence is completely nonsensical. English only has one rhotic consonant phoneme, which is usually pronounced something close to [ɹ]. Whether you use /r/ or /ɹ/ to represent it makes no difference, but if you assume that you can only choose one of them, talking about them simultaneously as in "they're completely different phonemes" is completely nonsensical (or if you assume that you can choose both of them, then your sentence is plain wrong because they are the same phoneme). When choosing the representation of the phoneme, there are criteria to consider, such as how easy it is to input, how recognizable it is, how close to the actual phonetic realization it is, etc. How much weight we give each criterion is up to us. --WikiTiki89 11:58, 8 October 2014 (UTC)
@Wikitiki89: Some dialects of English (not in North America, but nevertheless they do exist) use /r/ or (more rarely) /ɾ/ where other dialects use /ɹ/. As such, we need to distinguish between /ɹ/ and /r/ so that those with an actual /r/ in their phonemic inventories don't get confused. Tharthan (talk) 16:46, 8 October 2014 (UTC)
I believe the dialects you refer to have [r] and [ɾ] as allophones of /ɹ/. --WikiTiki89 20:42, 8 October 2014 (UTC)
No one would be confused if we used the same symbol for the English r-sound that every single dictionary of the English language except Wiktionary uses, namely /r/. At worst we might have to distinguish between [ɹ] and [r] at the phonetic level (using the latter for, say, Scottish English), but never at the phonemic level. —Aɴɢʀ (talk) 17:21, 8 October 2014 (UTC)
The problem is that /r/ isn't the right IPA letter for the English "r" consonant in most dialects. As such, the idea of changing the correct and more-or-less unambiguous /ɹ/ to /r/ is ludicrous. What would be the purpose of making pronunciation transcription more ambiguous? Should we also write modern widespread British English dialectal glottal stops as if they were /t/s? Tharthan (talk) 00:37, 9 October 2014 (UTC)
The IPA is more flexible than you think it is. If Peter Ladefoged, Alfred C. Gimson, Kenyon and Knott, and John C. Wells are comfortable using /r/ to transcribe the English r-sound, we can be too. —Aɴɢʀ (talk) 05:51, 9 October 2014 (UTC)

Template:policy

Can someone please restore Template:policy to the revision from 7 May 2014? I think the color change is inappropriate. --Dan Polansky (talk) 18:45, 28 September 2014 (UTC)

Done, I agree with you. Additionally that template page itself shouldn't be modified without at least some discussion. --Neskaya sprecan? 23:01, 7 October 2014 (UTC)

Wiktionary URL shortcut

Hey guys, so it seems there's a redirect to EN Wikipedia at http://enwp.org/ (such that enwp.org/Foo redirects to en.wikipedia.org/wiki/Foo). I've used this many times and it's really useful, but I feel that it'd be great to extend it to EN Wiktionary.

The information at wikipedia:User talk:Tl-lomas/enwp.org indicates that I can use http://enwp.org/wikt:Foo to redirect to Wiktionary, but I quite feel that many people (myself included) surely must use Wiktionary enough to find a more direct URL useful. Furthermore, the user who created that script has not edited ENWP in four years and Wiktionary never - he isn't responding to any past talk page messages, and presumably won't be around to respond to feature requests. Thus I thought it'd be logical if someone set up an "enwt.org", but someone on IRC claimed that "enwikt.org" would be more logical based on current interwiki links.

Do you guys think it'd be worth it? 70.94.229.179 20:38, 28 September 2014 (UTC)
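
For reference, the redirect behaviour described above can be sketched in a few lines of Python. This is only a guess at the logic, based on the two examples given (enwp.org/Foo and enwp.org/wikt:Foo), and not the actual script behind enwp.org. The function name `expand_shortcut` is hypothetical.

```python
from urllib.parse import quote


def expand_shortcut(path):
    """Expand a shortcut path such as 'Foo' or 'wikt:Foo' into a full URL."""
    path = path.lstrip("/")
    if path.startswith("wikt:"):
        host, title = "en.wiktionary.org", path[len("wikt:"):]
    else:
        host, title = "en.wikipedia.org", path
    # MediaWiki page URLs use underscores in place of spaces.
    return f"https://{host}/wiki/{quote(title.replace(' ', '_'))}"
```

A dedicated Wiktionary shortcut domain would only need the Wiktionary branch of this function.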

I don't think so. I think most browsers permit custom address-bar searches now. For example, in Opera, I have it set up so that "d blah" finds blah on Wiktionary, and "k blah" finds blah on Wikipedia. Equinox 20:40, 28 September 2014 (UTC)
I don't think that's the point. I think the point is simply to have shortcut URLs, similar to URL shortening. I think this might be useful, but it certainly is not necessary. --WikiTiki89 20:44, 28 September 2014 (UTC)
TBH the main time I would personally find this useful is on my phone when I need to look up a word. Currently it's annoying enough that I had to resort to a - shudder - paper dictionary when reading Tale of Two Cities. I will see if my mobile browsers support what you mention. 70.94.229.179 20:52, 28 September 2014 (UTC)

<meridium>: is it <merīdīum> as in the <post meridium> page or <merīdium> as in the main entry?

The Latin spelling of <meridium>: should it be <merīdīum>, as in the cross-references on the English <post meridium> and <ant meridium> pages, or <merīdium>, as in the main Latin entry? GHibbs (talk) 07:49, 29 September 2014 (UTC)

It should be merīdiem. What the English entries had (until I just now corrected them) was merīdiēm. They never said "merīdīum". —Aɴɢʀ (talk) 09:55, 29 September 2014 (UTC)

The most common binomials in books

Below are the top 20 most common binomial names to be found in books, found via my original research using the Catalogue of Life and Google ngram data. I'm not sure what our policy is for scientific names, but these are the most commonly found ones, so it seems some care should be taken to give them complete entries with etymologies (which several already have, but also almost half are red links). Hope this is useful for editors.

  1. Homo sapiens
  2. Escherichia coli - E. coli
  3. Staphylococcus aureus (8 occurrences in wikt defs., 4 linked, 2 taxlinked) - Staphylococcus - staphylococcus - staph
  4. Candida albicans (4, 2 linked)
  5. Pseudomonas aeruginosa (7, 4 linked, 0) - Pseudomonas - pseudomonas
  6. Mycobacterium tuberculosis (5, 3 linked, 3) - Mycobacterium - mycobacterium
  7. Saccharomyces cerevisiae (8, 7 linked)
  8. Drosophila melanogaster (10, 7 linked)
  9. Zea mays (21, 16 linked)
  10. Bacillus subtilis (8, 6 linked)
  11. Haemophilus influenzae (2, 2 linked, 1) - Haemophilus - influenzae
  12. Pneumocystis carinii (2, 2 linked, 0) - Pneumocystis - pneumocystis
  13. Salmonella typhimurium (2, 1 linked, 0) - Salmonella - salmonella
  14. Treponema pallidum (4, 3 linked)
  15. Streptococcus pneumoniae (2, 2 linked, 0) - Streptococcus - streptococcus - strep
  16. Phaseolus vulgaris (20, 16 linked)
  17. Clostridium botulinum (5, 5 linked)
  18. Listeria monocytogenes (2, 2 linked, 0) - Listeria - listeria
  19. Klebsiella pneumoniae (6, 4 linked)
  20. Xenopus laevis - Xenopus (1, 1 linked)

Pengo (talk) 10:10, 29 September 2014 (UTC)

Thanks. We have only some of the generic names for the redlinks. Bacteria are definitely not well covered, partially because there are rarely vernacular names for them and they therefore aren't "requested" by use in an entry. DCDuring TALK 12:54, 29 September 2014 (UTC)
If you think about it, binomials would be more likely to show up on this list if they didn't have common names to compete with them in usage. Chuck Entz (talk) 13:29, 29 September 2014 (UTC)
Yes. And the strong interest in disease-causing organisms among researchers, clinicians, and the public accounts for 14 of the list members. I guess that a way of measuring the "demand" for these would be to add the organism name to the name of the disease caused in each language for which we have an entry for the disease and to request translations for English disease words. DCDuring TALK 14:32, 29 September 2014 (UTC)
And 5 on the list are model organisms that would appear in a vast number of scholarly publications, including the beer yeast. That leaves us with Homo sapiens. DCDuring TALK 14:43, 29 September 2014 (UTC)
A lot of the "generic names" for bacteria are just the species name written lower case. —Aɴɢʀ (talk) 14:49, 29 September 2014 (UTC)
Those are the really common ones.
w:Model organism and the pages linked at w:List of sequenced eukaryotic genomes#See also contain a good number of potential entries which would have similar usage. DCDuring TALK 14:56, 29 September 2014 (UTC)
When I get a chance, I might try making lists of vertebrates and plants found in the fiction corpus to try and get a less research-centric list. Pengo (talk) 23:04, 29 September 2014 (UTC)
I don't object to what you've done: I welcome it. I certainly don't consider the research bias a weakness, but it is a characteristic of the methodology. I don't hope for much from a fiction corpus.
In my discussion of your list I was trying to understand how your approach differed from the approaches I and others had been taking and specifically why our approaches missed a good number of the specific items in the top 20. (I haven't even looked at the longer list.)
The approaches that have been used, some only sporadically, are:
  1. Top-down filling in of the tree of life, adding hyponyms at each level. (This becomes quite unwieldy sometimes at the genus level, sometimes at the species or lower level. It also leads to a possible overemphasis on extinct taxa and on the proliferating population of clades not used outside of systematic taxonomy.)
  2. Bottom-up filling in of the tree of life, adding hypernyms. (The number of additions declines because so many lower taxa share hypernyms/ancestors.)
  3. Adding items of interest to the contributor, often by type of flora or fauna (eg, birds, spiders, types of mammals: felines, canines, murines, bovines, marine, etc) or based on national or local lists of flora and fauna (most notably Finland). (This is a good fit with our wikiness, but makes for very spotty coverage.)
  4. Adding lists of flora and fauna neglected in other sources (eg, liverworts) (Something of a dead end in this case.)
  5. Adding templates to any taxonomic names already in Wiktionary to determine the "demand" for taxonomic and lately English vernacular names and adding the most common ones. (Limited so far by a lack of automation of the template-adding process, which should perhaps be replaced by counting the number of occurrences in en.wikt's dump of taxa names occurring as headwords in en.wikt, WP, and Wikispecies.)
  6. Adding items from topical lists such as for endangered species, sequenced genomes. (Small numbers of items)
  7. Adding items from WP dab pages for English vernacular names. (Limited use so far, but could become systematic)
Other approaches not yet tried:
  1. Add L2 sections or definitions for all the vernacular names in all languages contained in Wikispecies
  2. Add entries or definitions from available downloadable databases, such as for viruses and birds.
  3. Follow the approach taken at Swedish WP: having webcrawlers gather material for a good stub for such articles.
I favor approach 5 for systematic additions at this stage, but practice 1 and 2 as part of that effort. I also indulge in 3 and 6. If we were to shift to automatic mass addition of entries, I would shift my efforts to making sure that we were linking to external databases as automatically as possible, reviewing such entries, and improving existing entry quality. DCDuring TALK 00:22, 30 September 2014 (UTC)
I'm not sure I understand your point #5. What kind of templates? Do you mean taxlink templates? And how do they help measure "demand"?
The main thing I've focused on with my approach is ranking the data. I figure if we started at Aa achalensis and worked our way down through 1.5 million species then we'd take a long time to create entries for any of even the most common searches (I know that's not actually an approach you've listed, but it's the alternative I have in my head). My goal is to have definitions and etymologies for the most popular taxa, especially around the species level (genera, epithets, binomials), so that nature enthusiasts and students of biology can understand their meanings better and be less discouraged by the Latin terminology they encounter, perhaps even referring to Wiktionary one day when it comes to naming a new species. I've tried a few other approaches to ranking, such as using Wikipedia's hit counting (although I can't find much of the list I created from it except the short list here), and of course simply counting the most popular epithets in a big list of species. I'd like to try using Google Trends. I don't think Google's API allows doing 1.5 million queries, but perhaps it would be possible just to re-rank the 52,000 species found in books. This might be the best way to discover the scientific names which the broader population are actually searching for.
The other audience for my lists is Wiktionary's editors. I still have little proficiency with Latin. (The first time I posted a list of common epithets, I was actually surprised to discover, after seeing new entries created, that most of them were not terms specific to modern biology but were simply ordinary Latin words.) So I largely rely on editors to do the heavy lifting of creating new entries. And that's something that has occurred to me again after reading your post: although I've been editing wikis for over a decade, I don't really have any idea what editors here would actually prefer to be editing, or what their process is, or what motivates editors to do what they do, or what kind of lists they'd like to see. I've basically just made lists and posted them, hoping that they'll list things worth including in Wiktionary and that editors might be interested in creating the entries, and fortunately it's generally worked out well. Although I have plenty of ideas for how to improve the lists or for how other lists could be made, I haven't gotten a lot of broad feedback on what editors would actually like to be editing or creating entries for, or how their process works, or what information would be most useful. Would editors prefer to focus on one type of entry at a time (e.g. words ending in -ceps), or a bunch of things with a general theme? How should information be grouped (would it be a big help if masculine, feminine, and neuter forms were listed together, or would it not make much difference)? What trips up editors or slows them down? What kind of decisions are editors making when looking through a list? —Pengo (talk) 11:19, 30 September 2014 (UTC)
@Pengo: re: 5 above. I have a little perl script that counts occurrences of {{taxlink}}. I will soon modify it to do the same with {{vern}}. I had originally thought that the categorization would be good for generating lists, but, as the list is of entries, not taxa, it isn't. As a result I run the script and add the most common items on the resulting list each time. A more ambitious approach would be to count the occurrences in Wiktionary of words in lists of specific epithets or of entire taxa. It is necessary to count unlinked terms because so many taxonomic names in entries are not linked, for reasons that can't reflect any user considerations. Probably the contributors disliked the redlinks and thought that taxon entries, especially for binomens and trinomens, would never be created. This thought reflected expressed opinions of senior contributors. Many such entries don't even have links for the taxa to WP (in any language) or to Wikispecies.
By 'demand' I only mean use on Wiktionary, which reflects some kind of blend of how many languages have one or more vernacular names for the taxon and whether wiktionarians have any interest in either the taxa or the vernacular names. Over time, taxa appearing as Hyponyms or, especially, Hypernyms in Translingual sections have come to be well-represented despite being uncommon except in the literature of systematics.
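For illustration only, the kind of count described above could be sketched roughly as follows in Python. This is not the perl script itself, which is not shown in this discussion. It assumes that the taxon name is the first positional parameter of {{taxlink}}, and the function name count_taxlinks and the sample wikitext are hypothetical.

```python
import re
from collections import Counter

# Matches {{taxlink|NAME|...}} and captures NAME.
# Assumption: the taxon name is the first positional parameter.
TAXLINK_RE = re.compile(r"\{\{\s*taxlink\s*\|\s*([^|{}]+?)\s*[|}]")


def count_taxlinks(pages):
    """Count how often each taxon name appears in a taxlink call.

    `pages` is any iterable of wikitext strings, for example the page
    texts read out of a database dump (the reading step is omitted here).
    """
    counts = Counter()
    for text in pages:
        counts.update(TAXLINK_RE.findall(text))
    return counts


# Hypothetical sample wikitext, standing in for real entries.
sample_pages = [
    "A fish, {{taxlink|Salmo salar|species}}.",
    "See {{taxlink|Salmo salar|species}} and {{taxlink|Xenopus laevis|species}}.",
]

# Most frequently requested taxa first.
for name, n in count_taxlinks(sample_pages).most_common():
    print(n, name)
```

The same pattern could presumably be adapted to {{vern}} by changing the template name, but unlinked taxonomic names would need a different approach, such as matching against a list of known taxa.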
The list of specific epithets used in the most species names, as useful as it is, does not correspond well to the list of those specific epithets actually used but missing on Wiktionary. Due to a lack of consensus about whether some specific epithets not occurring in Classical Latin were better treated as Latin or as Translingual, I use {{epinew}} to link and categorize epithets by the language we choose for them. This is supposed to link to the lemma entry and display the actual term. It sorts the item by the lemma so it is easy to use Category:Species entry using missing Latin specific epithet to find missing specific epithets with multiple occurrences. Adding {{epinew}} to the existing species entries that don't have it is tedious.
I have thought it a little embarrassing for us to have so little knowledge about what users seek. It should be even more embarrassing that we are unwilling to characterize what we like to work on. Speaking for myself, I have liked:
  1. cleanup lists, both one-time lists unlikely to need to be recreated and those that are constantly renewed, often by user error or ignorance. Such lists can be long if the effort required per entry is modest.
  2. relatively short lists of items that IMO need a lot of work, so that I have the satisfaction of emptying them.
  3. individual requests, because I know someone is interested.
  4. variety in my areas of interest and self-perceived responsibility. I have browser tabs open to several lists in those areas (whether categories, search results, or user-created).
I have always been motivated to correct what I see as problems, so lists with such focus are particularly motivating. I now try harder to avoid areas of controversy, unless in an area I feel especially responsible for.
I expect that my preferences are not unique, but also not universal. DCDuring TALK 14:05, 30 September 2014 (UTC)

There is no British English <-isation> though the US version <-ization> exists.

In Wiktionary there is no entry for the common British English <-isation>, though the US version <-ization> exists. Both the entry and cross-references are required. GHibbs (talk) 13:38, 29 September 2014 (UTC)

Huh? We have an entry -isation, and have had for quite some time. —Aɴɢʀ (talk) 13:48, 29 September 2014 (UTC)

requirements for getting rollback permission?

Special:ListGroupRights shows that the rollback permission can be granted to users without having to make them an admin. I searched the help files, but there is no mention of this practice. So I'm just wondering: what are the requirements of getting the rollback permission on Wiktionary? --Ixfd64 (talk) 19:35, 29 September 2014 (UTC)

I was going to apply for rollback and patrolling rights. You can be nominated by an admin on WT:WL but there's no procedure for nominating yourself (AFAIK). Renard Migrant (talk) 11:05, 3 October 2014 (UTC)
No, WT:WL is only for the autopatrolled flag. If you want rollback, you can just ask here. The only requirement is that you have to convince The Powers That Be that you can do the job well. For me personally, that means you already have the autopatrolled flag, you know WT:CFI and WT:ELE like the back of your hand, and there are no red flags of potential trouble (drama-queening, etc.). (Would be nice to get acquainted with the admins, too.)
User:Ixfd64, your contributions here have been rather sparse lately, and I have not really seen you around in the "(anti-)social" side of the project (so to speak), but otherwise I see no reason not to grant you these flags. Just ask. User:Renard Migrant, I think you know policies well, but I have mixed feelings about letting you deal with newbies given your, shall we say, brutal honesty. We already have a few too many arseholes in power here. Keφr 14:24, 7 October 2014 (UTC)
How is the rollback tool different from just clicking "undo" on an edit or series of edits and then saving the page in its reverted form? — I.S.M.E.T.A. 15:07, 7 October 2014 (UTC)
It only takes one click and automatically generates an edit summary. --WikiTiki89 16:17, 7 October 2014 (UTC)
WT:WL has been used for rollbacker nominations before. — Ungoliant (falai) 16:26, 7 October 2014 (UTC)
Yeah, I haven't been that active on Wiktionary in recent years. I used to regularly create wanted entries, but Wiktionary has since become pretty mature, and most of the wanted entries nowadays are foreign words that I'm not familiar with. So I spend much of the time doing RC patrols now. --Ixfd64 (talk) 17:21, 7 October 2014 (UTC)

Category tree

How does one create a new category these days? Do we have a page with instructions? Ƿidsiþ 07:31, 1 October 2014 (UTC)

The modules are not quite finished yet (at least not how I would like them to be) but I suppose I could write some documentation in the meantime. If you look on Module:category tree, there are various subpages for different parts of the tree. Some are modules containing code, while others are data modules where the categories themselves are specified. —CodeCat 12:35, 1 October 2014 (UTC)
So one hasn't been able to readily add a conforming category for how long now? DCDuring TALK 12:48, 1 October 2014 (UTC)
One can create a category the old-fashioned way and let others bring it into conformity later. DCDuring TALK 12:50, 1 October 2014 (UTC)