Wiktionary:Beer parlour/2014/October: difference between revisions

* {{reply to|Icarot}} I can help you generate stubs for Russian nouns, adjectives, verbs and adverbs (the rest are a closed category and mostly covered). Stubs would be entries like in [[:Category:Russian entries needing definition|this]] category - the only thing they are missing are definitions. I could help extract a list of missing lemmas from a particular work. We could also pregenerate a list of examples for every entry and format them using the {{temp|usex}} template, by taking them from ru Wiktionary, glosbe, parallel corpora databases, subtitles, google translate and so on, that editors could easily copy/paste into entries that are missing them. Don't worry about associations (derived terms, *nyms, morphological etymologies etc.) - those can be largely automated once entries with definitions are created. The primary focus should be on coverage. --[[User:Ivan Štambuk|Ivan Štambuk]] ([[User talk:Ivan Štambuk|talk]]) 07:32, 19 October 2014 (UTC)
** Not sure why you are not continuing with this crap in the Serbo-Croatian Wiktionary. It already has more than 100 000 Serbo-Croatian definitionless entries. If Wiktionary users are so hungry after such content as you posit, Serbo-Croatian Wiktionary could become one of the most visited Wiktionaries soon. Unless it gets shut down due to copyright violation, that is, such as because of automated lifting of data from Google translate as you seem to suggest above. --[[User:Dan Polansky|Dan Polansky]] ([[User talk:Dan Polansky|talk]]) 07:58, 19 October 2014 (UTC)
**: Inflections cannot be copyrighted, the databanks such as HJP are completely free. Besides, I fixed many errors in them, and used two others as well. Definitions on the other hand can be copyrighted, and are nevertheless abundantly stolen by many FL Wiktionaries without anyone so much as raising an eyebrow. Don't worry Polansky, soon I'll add many such stubs for Czech as well. --[[User:Ivan Štambuk|Ivan Štambuk]] ([[User talk:Ivan Štambuk|talk]]) 08:07, 19 October 2014 (UTC)


== IPA, language code and error message ==

Revision as of 08:07, 19 October 2014

What makes a single word idiomatic?

I think it would be nice if we took WT:CFI a bit more seriously. I mean, de facto there's no problem because nobody's forcing us to apply our own rules; there's no 'court of appeal' if there's a deletion decision that goes against WT:CFI. Anyway.

Under General rule:

"A term should be included if it's likely that someone would run across it and want to know what it means. This in turn leads to the somewhat more formal guideline of including a term if it is attested and idiomatic."

Under Idiomaticity:

"An expression is idiomatic if its full meaning cannot be easily derived from the meaning of its separate components."

So, all terms have to be idiomatic (well, it's a 'somewhat more formal guideline'; viewed from that perspective, it does make it sound like attested and idiomatic aren't in the rules, they're just in the guidelines!), but CFI only gives guidelines on what idiomatic means for an expression. Given that all terms have to be idiomatic, what's the test for, say, hat or reenter?

I know it's hard work, but I just think it would be nice if we could take ourselves a bit more seriously. Renard Migrant (talk) 11:14, 4 October 2014 (UTC)[reply]

For hat it's obvious because its meaning cannot be easily derived from its phonemes /h/, /æ/, /t/, since phonemes do not have any meaning to convey. For reenter it's less obvious because its meaning can be derived from the meaning of re- and the meaning of enter, but we seem to have an (unwritten?) agreement here that everything written together without a space is eligible for an entry. That convention breaks down, however, for languages that are not usually written with spaces; and it has been controversial for polysynthetic languages that may write whole phrases like "he had had in his possession a bunchberry plant" as one word without spaces. For English, the only real ambiguity is in expressions that are written with spaces, because there is no unambiguous criterion to distinguish idiomatic ones from unidiomatic ones. Probably everyone agrees that hot dog is idiomatic and hot lightbulb isn't, but between those two extremes there's a continuum, not a clearly defined split. —Aɴɢʀ (talk) 12:53, 4 October 2014 (UTC)[reply]

Make Categories Show Where They're Defined

I would like to propose that the category templates be modified to show the name of the data (sub) module where the information for the category resides. This would make it easier to make changes, and also make it easier to figure out where a new category analogous to existing ones could be added.

Adding documentation pages to modules is helpful, but it still takes a bit of wandering the maze of modules and sub-modules and data sub-sub-modules to figure out where category information resides. This shouldn't be too hard, since the modules have to have this information at some point- it's just a matter of developing protocols for passing it back to the templates.

It might also be nice to give instructions on where to go to get changes made, but that may not even be settled yet. This is all part of a larger problem with our newer Lua-based architecture, which is that things are centralized in data modules and impossible for non-admins to access, but I'll leave that for a separate topic. Chuck Entz (talk) 16:43, 4 October 2014 (UTC)[reply]

All the categories have an "edit" button already, and it's been there for a few years maybe. You never noticed? —CodeCat 16:47, 4 October 2014 (UTC)[reply]
Why are you so surprised? It's not what one would expect from how many other things work. Human attention works that way. Given that the question of category documentation and editing has been asked before without answer, Chuck probably assumed that it must be a policy matter. DCDuring TALK 18:47, 4 October 2014 (UTC)[reply]
And, once the edit button is clicked on, then what? DCDuring TALK 18:51, 4 October 2014 (UTC)[reply]
I wrote three paragraphs. You never read them?
If you click on Edit for Category:English colloquialisms, you get:
  1. {{poscatboiler|en|colloquialisms}}. poscatboiler contains:
  2. {{#invoke:category tree|show|template=poscatboiler|code={{{1|}}}|label={{{2|}}}|sc={{{sc|}}}}}, so we go to that module.
  3. Module:category tree refers us to:
  4. Special:PrefixIndex/Module:category tree. The logical next step is:
  5. Module:category tree/poscatboiler. This refers us to:
  6. Module:category tree/poscatboiler/data, which refers us to:
  7. Special:PrefixIndex/Module:category tree/poscatboiler/data, which contains dozens of submodules. Fortunately, I've been working with categories long enough to spot:
  8. Module:category tree/poscatboiler/data/terms by usage as the most likely choice.
And there it indeed is. What I'm proposing is a line at Category:English colloquialisms that refers you to Module:category tree/poscatboiler/data/terms by usage without your having to go through all the steps above. I've worked a lot with categories, and I know something about templates and modules, and there are times when I have to look at several data sub-sub-modules before I can find where the configuration is for a given category. Sure- it's simple! Chuck Entz (talk) 18:58, 4 October 2014 (UTC)[reply]
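The lookup walked through in the steps above could in principle be collapsed into a single table. The following is only a minimal Python sketch of that idea: `LABEL_TO_SUBMODULE` and `definition_notice` are hypothetical names, not part of Module:category tree, and the table holds just the one pairing mentioned in this thread.

```python
# Root of the data submodules, as named in the steps above.
DATA_ROOT = "Module:category tree/poscatboiler/data"

# Hypothetical index: category label -> data submodule that defines it.
# Only the pairing mentioned in this discussion is filled in.
LABEL_TO_SUBMODULE = {
    "colloquialisms": "terms by usage",
}

def definition_notice(label):
    """Return a one-line pointer to the defining module, or None if unknown."""
    submodule = LABEL_TO_SUBMODULE.get(label)
    if submodule is None:
        return None
    return f"This category is defined at {DATA_ROOT}/{submodule}"

print(definition_notice("colloquialisms"))
```

In a real implementation the module that generates the category text would already know which data submodule it read the label from, so it would pass that name back rather than consult a separate table.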
CodeCat was referring to the small edit button next to the text. You were referring to the edit tab at the top, which is the first place one would look to edit something other than a section. Someone introduced a non-standard positioning of the edit option and expected it to "of course" be noticed by anyone with half a brain. But that is simply not true: habits that are reinforced by thousands of successful repetitions are not easily overcome and cause attentional blindness to such things as small edit buttons in unexpected places. DCDuring TALK 19:11, 4 October 2014 (UTC)[reply]
Ah, that explains it! No, I never noticed it. I was wondering how she could have so completely missed my point. That feature does, indeed, make my proposal rather redundant- but it might still be useful for those who are trying to figure out how the categories work, but aren't going to be editing data modules. Perhaps a combination would be a good idea, such as "This category is defined at Module:category tree/poscatboiler/data/terms by usage" with the edit link at the end. Chuck Entz (talk)
That would be a bit too long to fit where the edit button currently is. Do you know where else it could be placed? —CodeCat 19:52, 4 October 2014 (UTC)[reply]
How is what happens after one clicks the edit link self-explanatory? Some kind of help (colored green?) to click on next to the edit button would both make the edit button more visible and afford an opportunity to explain further. DCDuring TALK 20:42, 4 October 2014 (UTC)[reply]
It's new to me too. Here are some ideas for making it more visible:
  1. Add a hidden category to the category pages, e.g. Category:Categories defined by Module:category tree/terms by usage. (And that category can then explain more in its description, and link more obviously to the module). Editors are more likely to have hidden categories showing, so may notice.
  2. Change the text to something more descriptive, such as "[Edit category definition]", and/or perhaps an even more wordy hover text, e.g. "Edit the module which defines this category's description, category parent, and category text."
  3. Add an item to the left nav under "Tools". (Though that would probably be even less noticed)
Also, pages like Module:category_tree/poscatboiler/data/terms_by_usage could really use some docs to say what is and isn't safe to edit, how to propose or add new categories, and how to test that your edits aren't going to break everything. (Especially as the [Edit] buttons encourage users to edit it). Even if you know Lua and something about Wiktionary, you still don't know what can be edited safely on that page.
Perhaps a whole other conversation, but the docs on each category page really should say (or link to) how a regular editor can add a page to that category, e.g. which template or group of templates are used in the article space to add the category tag and whether it needs additional parameters to cause it to be added, etc. Though that's a whole other conversation and perhaps a thankless task to document properly. Pengo (talk) 11:52, 5 October 2014 (UTC)[reply]
How about in an editnotice? --Yair rand (talk) 14:13, 5 October 2014 (UTC)[reply]

Time-waster

Considering that so much time has been wasted on rfv's/rfd's due to misspellings (especially in hyphenation) resulting from scannos, should we expand our criteria for inclusion page with notifications/warnings or something? Just a suggestion. Zeggazo (talk) 20:15, 4 October 2014 (UTC)[reply]

First of all, shouldn't these be "definite nouns", not "definitive nouns"? Second of all, four of the five entries in this category are simply the definite equivalents of Arabic lemma nouns (which are always in the indefinite). The definition itself specifies this. The definite equivalents are formed simply by appending "al-" (or rather, the Arabic equivalent) to the noun. I thought there was a policy not to include such forms unless they have an idiomatic definition? I'm going to add {{delete}} tags soon but I want to make sure others don't disagree.

BTW the fifth of five entries is the word العَرَبِيَّة (al-ʕarabiyya), which has a special meaning ("the Arabic language"), separate from the word عَرَبِيَّة ("carriage" or "female Arab"), so it should be kept. Benwing (talk) 08:38, 5 October 2014 (UTC)[reply]

Perhaps nouns and proper nouns where ال (al-) is always used should still be categorised as "definite nouns"? It's useful for readers to know that a term is formed by al- + the stem. Not sure if ALL such terms should be redirected to terms without the definite article. --Anatoli T. (обсудить/вклад) 23:36, 5 October 2014 (UTC)[reply]
OK, so no one answered my question. For the ones that are simply the definite equivalents of existing lemmas, with no special meaning, should I delete them, or keep them and use something like {{definite of}}? I think we should delete, since otherwise we're setting a precedent for creating definite equivalents of every single noun out there, which is crazy, since they're all formed trivially in exactly the same fashion by just adding "al-" (actually ال (al-), in the Arabic script) onto the beginning of the noun. It would be comparable to creating entries for the car and the boat and the kumquat, etc. etc. Any objections to me deleting them? Benwing (talk) 10:31, 8 October 2014 (UTC)[reply]
Normally terms should be RFD'ed for deletion but since they are definitely just "definite article + noun" entries, yes, delete all, except العربية and اللغة العربية. If you don't have the rights to delete, I'll delete them for you. العربية and اللغة العربية should probably be RF-ed or RFV-ed, not sure. --Anatoli T. (обсудить/вклад) 22:29, 8 October 2014 (UTC)[reply]
I would also keep الأمين as one of the names of Muhammad and also given name after that. --WikiTiki89 22:49, 8 October 2014 (UTC)[reply]
Also, {{ar-proper noun}} should automatically add to Category:Arabic definite nouns. --WikiTiki89 22:52, 8 October 2014 (UTC)[reply]
Yes, keep الأمين. Agree about proper nouns as well. --Anatoli T. (обсудить/вклад) 22:59, 8 October 2014 (UTC)[reply]
Definite forms in Arabic are not written with a separating space, as far as I know, so they closely parallel the definite forms of the Scandinavian languages. Since we have separate entries for those (dag, dagen, dagar, dagarna), we should probably also have separate entries for the definite forms of Arabic nouns. —CodeCat 23:20, 8 October 2014 (UTC)[reply]
Arabic grammar doesn't consider definite articles part of the word. Exceptions are proper nouns. Also, monosyllabic prepositions consisting of one consonant and a short (unwritten) vowel are spelled together with the following word but are separate words, unless they are adverbs (debatable), e.g. بِسُرْعَة (bisurʕa) - quickly (lit.: "with speed"), preposition بِ (bi-) + سُرْعَة (surʕa) (speed); likewise enclitic pronouns: بَيْتِي (baytī) "my house" = بَيْت (bayt) + "ي" (-ī) "my". Scandinavian, Bulgarian/Macedonian, Albanian definite forms are also debatable but they should be considered separately. Korean particles and copulas are also written without a space but they are considered separate words. 도서관에 (doseogwane) "to the library" = 도서관 + 에. --Anatoli T. (обсудить/вклад) 23:43, 8 October 2014 (UTC)[reply]
Arabic, Hebrew, and Aramaic have a lot of clitics and we have a consensus generally not to include words with clitics. The definite article is arguably one of these clitics, although in Aramaic the definite form is actually the lemma form. However, we do seem to have a status quo of generally not including the definite forms for Arabic and Hebrew. --WikiTiki89 02:52, 9 October 2014 (UTC)[reply]

The Latin word com has no entry.

The Latin word com, a component of commodus, does not have an entry. GHibbs (talk) 08:06, 6 October 2014 (UTC)[reply]

Is it ever a free-standing word? As a prefix we have com- (and con-, col-, cor-, and co-). DCDuring TALK 10:39, 6 October 2014 (UTC)[reply]
The free-standing word corresponding to com- is cum. —Aɴɢʀ (talk) 16:37, 6 October 2014 (UTC)[reply]

Transliterations for headword-line inflections

Previous discussion: Wiktionary:Beer parlour/2013/October#Transliterations for inflected forms in headwords?

This was discussed before a while ago, but didn't reach much of a conclusion. The question is how to deal with transliterations of inflected forms that are displayed in headwords. Module:headword, and by extension many of our current headword-line templates, do not support this at all. But for Arabic we've always displayed transliterations for inflected forms, and the templates therefore had to be custom-made to handle this.

I imagine it's best to have a single common behaviour for all languages. So the question is, should we include them for all languages, for none, or for some subset? And if only for some subset, then based on what criteria? —CodeCat 16:08, 7 October 2014 (UTC)[reply]

  • My 2p is on all. As the EN WT, our user base can be assumed to read English. If an entry is in a non-Latin script, we cannot assume that our users can read the headword, and as such, for the sake of usability (among other factors), we should provide transcriptions. ‑‑ Eiríkr Útlendi │ Tala við mig 17:26, 7 October 2014 (UTC)[reply]
I thought that our "ground rules" said that all non-Roman texts should (eventually) be transliterated - and that this could be by means of "pop-up" text if necessary or wanted. — Saltmarshαπάντηση 17:44, 7 October 2014 (UTC)[reply]
Transliterate all. --Vahag (talk) 18:42, 7 October 2014 (UTC)[reply]
Don't transliterate Russian inflected forms or some other languages having irregular pronunciations. It may also look quite messy if there are a lot of forms in the header. Arabic editors want to transliterate all, so be it. I don't object to Arabic transliterations. --Anatoli T. (обсудить/вклад) 22:36, 7 October 2014 (UTC)[reply]
I'm not sure I understand your reasoning. If I understand correctly that by "irregular pronunciation" you mean "pronunciation not fully predictable from spelling", then it seems to me that those cases are exactly the ones where a transliteration would be useful. Then again, we've already established that many editors here disagree with the practice of using pronunciation as a guide to transliteration in phonemic scripts such as Cyrillic. —CodeCat 22:49, 7 October 2014 (UTC)[reply]
I agree with Atitarev that we should transliterate inflected forms only for languages for which the transliteration is essential to understand the structure of the inflected form. For languages such as Arabic, for which transliterations could be considered superfluous when the words are fully vowelated, there is another consideration: It may be difficult for some readers to see the vowel diacritics, making the transliterations essential to these readers. For languages like Persian, for which we do not indicate vowels at all in the native script, transliterations are absolutely essential. --WikiTiki89 22:47, 7 October 2014 (UTC)[reply]
What about users who want to know what is written, but are not learned in reading it? Arabic looks like nonsensical squiggles to me, and without transliterations the forms might as well not be there at all. For Cyrillic or Greek the consideration is no different, except that I just happen to be able to read those scripts. But there will of course be many users that can't. —CodeCat 22:51, 7 October 2014 (UTC)[reply]
Someone who cannot read a language is unlikely to need to know how a word inflects. --WikiTiki89 00:04, 8 October 2014 (UTC)[reply]
@Wikitiki89 I guess I'm unlikely then? —CodeCat 00:33, 8 October 2014 (UTC)[reply]
Yes, you are one of the few. Keep in mind that our inflection tables usually do have transliterations. But if you are interested enough in Arabic, I suggest you learn the alphabet. Otherwise you would be comparable to someone wanting to learn chemistry without learning the chemical element symbols or someone wanting to learn calculus without learning mathematical notation. --WikiTiki89 11:30, 8 October 2014 (UTC)[reply]
  • Does adding a romanization to inflected forms harm the project in any way? It seems to me instead that it would add value. Perhaps I happened across the term რეჰანმა (rehanma) and simply wanted to know roughly how to read it, without any knowledge of the Mkhedruli script. Thankfully, this entry for an inflected form already includes a romanized spelling. Would you advocate for removing romanizations from inflected forms? If so, why? ‑‑ Eiríkr Útlendi │ Tala við mig 05:29, 8 October 2014 (UTC)[reply]
@CodeCat, Many editors doesn't mean there's a consensus. If you haven't noticed there are a lot of languages with irregular pronunciations and transliterations (exceptions). There's no practice in published dictionaries to transliterate Russian or Greek, hence an in-house (Wiktionary) transliteration method is used. "narodnovo" and "narodnogo" are equally attestable transliterations of the genitive form of наро́дный (naródnyj) - наро́дного (naródnovo). Japanese and Korean exceptions are partially handled by smart modules (some Korean exceptions still need to be transliterated manually, such as 십육) but Russian is not, こんにちは is "konnichi wa", not "konnichi ha". Do I need to bring up that argument again? Hindi, Thai, Lao, Greek also have irregularities, which are reflected in standard or Wiktionary transliterations. Automatic transliteration would cause, e.g. ру́сского to appear as "rússkogo", which should be "rússkovo" (gen. of русский) --Anatoli T. (обсудить/вклад) 23:03, 7 October 2014 (UTC)[reply]
Cyrillic, Greek, Armenian, Georgian vs Hangeul, Arabic, Hebrew, Thai, Devanagari, etc. The former are considered "easy" by dictionary publishers, although Devanagari is very phonetic. Since dictionaries usually don't use transliterations for the former, we have this argument that those should reflect the spelling, letter-by-letter whereas the difficult ones use phonetic transliterations or transcriptions, mixture of literal and phonetic. You can learn about transliterations for complex scripts and see that they are full of exceptions, most are documented ("standard" or "scientific"). --Anatoli T. (обсудить/вклад) 23:13, 7 October 2014 (UTC)[reply]
  • Reading the above, I think it would be useful for us to be clear about transcription -- changing one script for another, such as “ру́сского” → “rússkogo” -- versus transliteration -- which would include phonetic considerations, such as “ру́сского” → “rússkovo”.
Anatoli, do you (or any others) have any objection to transliteration? ‑‑ Eiríkr Útlendi │ Tala við mig 23:29, 7 October 2014 (UTC)[reply]
@Eirikr You seem to have gotten transcription and transliteration backwards. Transcriptions are phonetic while transliterations are (supposed to be) graphemic. --WikiTiki89 00:04, 8 October 2014 (UTC)[reply]
@Eirikr, have you read all of my posts above? Would you agree to transliterate こんにちは as "konnichi ha" and 십육 as "sibyuk"? Modern standard transliterations go far beyond just representing words simply letter-by-letter. They use a lot of phonetic considerations, call them transcriptions, if you wish but they are not. "rússkovo" is not 100% phonetic, only shows irregular pronunciation of "г", it's pronounced [ˈruskəvə] (the phonetic respelling is "ру́скава"). --Anatoli T. (обсудить/вклад) 23:37, 7 October 2014 (UTC)[reply]
BTW, fully automated Arabic transliteration will affect irregular Arabic words, such as إنْجِلِيزِيٌّ (ʔinjilīziyyun), which is pronounced the "Egyptian" way - "ʾingilīziyyun" and other loanwords and dialectal pronunciations. It's probably fine, just need to be aware of this. --Anatoli T. (обсудить/вклад) 23:46, 7 October 2014 (UTC)[reply]
Just to make sure, you realise that if we do have transliterations for inflections on headword lines, there will also be parameters on {{head}} to override any default ones? —CodeCat 23:49, 7 October 2014 (UTC)[reply]
I suspected there would and should be but the task is too big. All adjective-like nouns will be affected first (-ого, -его/-ёго genitive endings), all words where (Cyrillic) "е" is pronounced as "э" (the largest group of exceptions). --Anatoli T. (обсудить/вклад) 23:55, 7 October 2014 (UTC)[reply]
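The trade-off discussed above (a letter-by-letter default plus a manual override for irregular forms) can be sketched roughly as follows. This is an illustration only, written in Python rather than Lua: the letter table is a simplified assumption, and `auto_translit`, `headword_translit` and the `tr` argument are hypothetical names, not the actual interface of {{head}} or Module:headword.

```python
import unicodedata

# Simplified, illustrative letter table (an assumption, not real module data).
CYR_TO_LAT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e",
    "ж": "ž", "з": "z", "и": "i", "й": "j", "к": "k", "л": "l",
    "м": "m", "н": "n", "о": "o", "п": "p", "р": "r", "с": "s",
    "т": "t", "у": "u", "ф": "f", "х": "x", "ц": "c", "ч": "č",
    "ш": "š", "ы": "y",
}

def auto_translit(text):
    """Letter-by-letter romanization; unknown characters pass through."""
    out = "".join(CYR_TO_LAT.get(ch, ch) for ch in text.lower())
    # Combine a letter and a following stress mark into one character.
    return unicodedata.normalize("NFC", out)

def headword_translit(form, tr=None):
    """Use the manual override when given, else fall back to the default."""
    return tr if tr is not None else auto_translit(form)

form = "ру\u0301сского"  # stress written as a combining acute accent
print(headword_translit(form))                 # automatic, spelling-based
print(headword_translit(form, tr="rússkovo"))  # manual override wins
```

The sketch shows why the workload concern arises: every form whose pronunciation departs from its spelling needs its own `tr` value, because the default can only follow the letters.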
I oppose the addition of romanizations on inflected forms for two reasons (for Russian) - 1. The irregular words will need to be transliterated manually or might introduce errors. 2. The headwords get cluttered. (genitive sg., nom. plural, feminine form - are the possible inflected forms for Russian). It doesn't have to be for all languages like that. --Anatoli T. (обсудить/вклад) 05:34, 8 October 2014 (UTC)[reply]
  • Your mention of "clutter" led me to look into Russian entry format. Here's a sample headword line from the entry for русский:

ру́сский (rússkij) m anim, m inan (genitive русского, nominative plural ру́сские, feminine ру́сская)

This looks like a bit of a mess to me; all of the additional headword information for inflected forms is already given, as expected, in an Inflected forms table contained within the entry.
Redundancy aside, I think русский (russkij) is already fine -- there's a romanization of the headword, and the Inflected forms table provides romanizations of all other forms.
My current understanding of general policy, and this proposal, is that we want to make sure that all entries in non-Latin scripts include romanizations. So I'm really not worried so much about the lack of romanization for the link to русская (russkaja) in the headword line for the русский (russkij) entry. (For that matter, I think the headword line should be simplified to remove the redundant and visually cluttered inflected forms, but that might just be me.) I'm more concerned about whether there is any romanization given in the actual entries for inflected forms. Gladly, русская (russkaja) does provide a romanization.
Would you be amenable to ensuring that all entries have romanizations? ‑‑ Eiríkr Útlendi │ Tala við mig 07:11, 8 October 2014 (UTC)[reply]
I'm going to add my 2 cents to transliterating all inflections in all languages, but I think it's most important for languages like Persian and Arabic where vowels may not be written, and is important for Arabic even when vowels are written because of the difficulty that the average user will have in reading the script. So far it looks like Anatoli is opposed to transliterating inflections for Russian but not Arabic, Wikitiki might be similar, and everyone else is OK with transliterating inflections in all languages. Is this right?
I do think it's possible to make an argument that there's something qualitatively different and more "foreign" about Arabic or Devanagari or Thai vs. Greek or Cyrillic. Certainly this is the case for me. However, keep in mind, Anatoli, that you're a native Russian speaker whereas the majority of users of the English Wikipedia will not be, and might well be trying to learn a foreign language and so care about the inflections, but not be very comfortable with the script.
BTW as for the clutter issue, the same "issue" should theoretically appear in Arabic, but IMO the previous way of doing things (before CodeCat changed it), which did display transliteration of all Arabic inflections, didn't look especially cluttered. The trick here I think is to put the inflections outside of the parens, so that you don't end up with nested parens when you display the transliterations. Benwing (talk) 08:20, 8 October 2014 (UTC)[reply]
I agree that we put too much information on the inflection lines of Russian nouns. There is absolutely no need for the genitive or plural in the headword line, unless the form is irregular. The feminine form is useful, however. If the argument is about showing the stress pattern, then the genitive is needed only for nouns ending in a consonant (or ь). But I still don't see why the declension table isn't enough for this. --WikiTiki89 11:30, 8 October 2014 (UTC)[reply]
Just to clarify my position on Russian headwords. I don't oppose the information (it's helpful, can help quickly identify stress patterns and declension types and plural forms) but I don't think it's a good idea to transliterate inflected forms. --Anatoli T. (обсудить/вклад) 00:33, 9 October 2014 (UTC)[reply]
The genitive only helps identify the stress pattern for nouns that end in a consonant, and only the singular stress pattern at that. It is completely useless for nouns that end in consonants, as the singular stress pattern is apparent from the nominative, except for nouns ending in -а, which may need the accusative (but certainly not the genitive). The nominative plural is insufficient to identify the plural stress pattern. You additionally need one other plural form other than the plural genitive and also the plural genitive in some cases. At that point, there is too much information in the headword line and we already have declension tables with all of this information. --WikiTiki89 03:02, 9 October 2014 (UTC)[reply]
I disagree (please review your post, you have two contradicting statements - the first two sentences, so I don't know what you mean there). There are 6 stress patterns: Appendix:Russian stress patterns - nouns + some nouns that are irregular.
Consonantal endings:
  1. до́ктор - до́ктора - доктора́
  2. ди́ктор - ди́ктора - ди́кторы
Ь or "hissing" sounds:
  1. ле́карь - ле́каря - ле́кари (stress pattern 3 is also acceptable)
  2. сле́сарь - сле́саря - сле́сари/слесаря́ (то́карь is the same)
  3. глуха́рь - глухаря́ - глухари́
  4. врач - врача́ - врачи́
  5. това́рищ - това́рища - това́рищи
Do I need examples for vowel endings? For people mastering the basics of Russian, including native speakers, this info is usually sufficient without looking at the full declension table. --Anatoli T. (обсудить/вклад) 03:30, 9 October 2014 (UTC)[reply]
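To make the comparison in the examples above concrete, here is a small Python sketch that reads the stress position off each of the three headword forms. It assumes stress is written as a combining acute accent after the vowel, as in the forms quoted in this thread; `stressed_syllable` and `stress_profile` are hypothetical helper names, and the sketch does not try to assign the numbered patterns from Appendix:Russian stress patterns - nouns.

```python
ACUTE = "\u0301"                 # combining acute accent marking stress
VOWELS = set("аеёиоуыэюя")

def stressed_syllable(word):
    """Return the 1-based index of the stressed vowel, or None if unmarked."""
    count = 0
    yo = None
    for ch in word.lower():
        if ch in VOWELS:
            count += 1
            if ch == "ё" and yo is None:
                yo = count       # ё is stressed when no accent is written
        elif ch == ACUTE and count:
            return count
    if yo is not None:
        return yo
    # A monosyllable needs no accent mark; anything else is undetermined.
    return 1 if count == 1 else None

def stress_profile(nom_sg, gen_sg, nom_pl):
    """Stressed-syllable index for each of the three headword forms."""
    return tuple(stressed_syllable(f) for f in (nom_sg, gen_sg, nom_pl))

print(stress_profile("до\u0301ктор", "до\u0301ктора", "доктора\u0301"))
print(stress_profile("ди\u0301ктор", "ди\u0301ктора", "ди\u0301кторы"))
```

The two nouns share the same profile in the singular and differ only in the nominative plural, which is the kind of difference the headword line is meant to expose without opening the declension table.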
Maybe you misunderstood my post. For nouns that end in consonants (including ь), I agree that the genitive singular helps determine the stress pattern for the singular. For nouns that end in vowels, the genitive singular is of no help at all, since the stress is always in the same place as in the nominative singular. Furthermore, for nouns that end in -а, the accusative might have a different stress from the nominative, yet for some reason we do not include it. For the plural, the nominative plural is insufficient to determine the full plural stress pattern. More information is needed as I explained above, and that would completely overwhelm the headword line and defeat the purpose of having inflection tables. --WikiTiki89 03:43, 9 October 2014 (UTC)[reply]
-а nouns are only one portion of nouns, large but not huge. You still need to know that plural and gen. sg for ка́ша is ка́ши, not ка́шы (beginner level) and томоды́ is a form of томода́. Animacy helps determine the accusative. Well, yes, it's not comprehensive but sufficient in MOST cases. Apart from stress patterns, there are other things - колесо́ -колеса́ - колёса, огонёк - огонька́ - огоньки́, и́мя - и́мени - имена́. Knowing that "-а" nouns (NOT ALL VOWELS, just "а"!) are predictable is a blessing but there are too many other declension and stress patterns. I want to reiterate that gen. sg. and pl. nom. forms are sufficient to determine THE FULL STRESS PATTERN (usually). --Anatoli T. (обсудить/вклад) 04:38, 9 October 2014 (UTC)[reply]
Someone who does not know the rules for ы vs и will probably need the full declension table anyway to figure anything out. Can you give me an example of a noun that ends in a vowel (not including ь or й) whose stress pattern for the singular cannot be determined from the nominative? (I don't believe there are such nouns, but if you can prove me wrong, go ahead.) Note that I am all for including the genitive for nouns ending in consonants. As for the plural, the "usually" part is exactly my point. If there are exceptions, then you can't say that the full stress pattern can be "determined", but only "guessed". I noticed other Russian dictionaries tend to include the genitive and/or the dative for the plural in cases where there could be confusion. But the more we include, the more we get back to the question of why isn't the declension table enough? --WikiTiki89 05:26, 9 October 2014 (UTC)[reply]
Haven't I already with колесо, имя, голова, борода (unlike a simple one like женщина with stress pattern 1)? What about о́блако - о́блака - облака́? --Anatoli T. (обсудить/вклад) 05:37, 9 October 2014 (UTC)[reply]
Perhaps you should re-read which forms I am referring to: nominative singular (колесо́, голова́, борода́) and genitive singular (колеса́, головы́, бороды́). Although you did remind me that the n-stems such as имя are possible exceptions; we should definitely include the genitives for them. --WikiTiki89 05:49, 9 October 2014 (UTC)[reply]
And here's a good one for you: with "-а": голова́ - головы́ - го́ловы, борода́ - бороды́ - бо́роды. So it's not absolutely useless, even for this type of nouns. :) --Anatoli T. (обсудить/вклад) 05:01, 9 October 2014 (UTC)[reply]
Umm... Yes it is useless. Unless you're blind, you can see that the genitive singulars you just gave have the same stress as their corresponding nominative singulars. --WikiTiki89 05:26, 9 October 2014 (UTC)[reply]
Hmm, what?! Have you read it carefully? голова is not like most nouns ending in "-а" and stress patterns can be determined not just by genitive sg but gen. sg + nom. pl in combination! See the table again. It's pattern 6, not 1, example given: полоса́ (same pattern as голова and борода). --Anatoli T. (обсудить/вклад) 05:37, 9 October 2014 (UTC)[reply]
Perhaps you should re-read which forms I am referring to. My point is that in these cases, if you have the nominative singular and the nominative plural, then the genitive singular adds no new information (since the singular pattern is determined from the nominative singular and the plural pattern has nothing to do with the genitive singular). --WikiTiki89 05:49, 9 October 2014 (UTC)[reply]
Displaying genitive sg just shows that it's "as expected", treating vowel and consonant endings differently doesn't make much sense. --Anatoli T. (обсудить/вклад) 05:58, 9 October 2014 (UTC)[reply]
Then instead of treating the vowel and consonants differently, let's use this simple rule: if the stress in the genitive is in a different place from the nominative (or if the stem itself is different, such as for день/дня or имя/имени) then we include the genitive, otherwise it is "as expected" and we exclude it to avoid clutter. If the user is still unsure, then they can check the declension table. --WikiTiki89 06:07, 9 October 2014 (UTC)[reply]
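The rule proposed just above (show the genitive only when its stress or stem differs from the nominative) can be sketched roughly as follows. This is only an illustration, not how Module:headword or any Wiktionary module works: the function names are hypothetical, stress is assumed to be marked with a combining acute accent (U+0301) after the vowel, ё is assumed to be always stressed, and stem changes such as день/дня are passed in by the caller rather than detected.

```python
ACUTE = "\u0301"                 # combining acute accent marking stress
VOWELS = set("аеёиоуыэюя")       # Russian vowel letters (й is not a vowel)


def stressed_syllable(word):
    """Return the 1-based index of the stressed syllable, or None if unknown.

    Assumes stress is written as a combining acute after the vowel,
    that ё is always stressed, and that a one-vowel word is stressed
    on its only vowel.
    """
    lower = word.lower()
    count = 0
    for i, ch in enumerate(lower):
        if ch in VOWELS:
            count += 1
            if ch == "ё" or lower[i + 1:i + 2] == ACUTE:
                return count
    return 1 if count == 1 else None


def show_genitive(nominative, genitive, stem_changes=False):
    """Decide whether the genitive singular belongs on the headword line.

    stem_changes is supplied by the editor (e.g. день/дня, имя/имени);
    this sketch makes no attempt to detect it automatically.
    """
    if stem_changes:
        return True
    nom = stressed_syllable(nominative)
    gen = stressed_syllable(genitive)
    if nom is None or gen is None:
        return True              # stress unknown: err on the side of showing
    return nom != gen


if __name__ == "__main__":
    print(show_genitive("глуха\u0301рь", "глухаря\u0301"))    # stress moves
    print(show_genitive("голова\u0301", "головы\u0301"))      # stress stays
```

Counting syllables from the start of the word is only a proxy for "the stress is in the same place"; anything it cannot decide is shown rather than hidden, so the declension table remains the fallback.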
The modules are complicated as is. I don't see the need to change the Russian noun headword. The Russian headword style was discussed and agreed on a while ago. Even if genitive is hardly the crucial case, it's an example of a case and shows how nouns may change. --Anatoli T. (обсудить/вклад) 01:32, 10 October 2014 (UTC)[reply]
Who exactly "agreed" on this, just you and CodeCat? I don't think there is anything wrong with using the genitive as opposed to another case, I just don't think we need to include it for every word. --WikiTiki89 11:21, 10 October 2014 (UTC)[reply]
Right, I too favour not including inflected forms in Russian headword lines, but practices for Russian are usually determined by a minority here. Refer to the transliteration debate. --Vahag (talk) 12:46, 10 October 2014 (UTC)[reply]
If transliteration for the headwords is chosen I'd favour removing inflected forms from the Russian headword altogether. That way, there won't be any additional reasons for arguments or newly introduced discrepancies with the existing transliteration practice. @Wiki, having genitive in some terms and not the others will be confusing. Also, if you don't like something, don't do it. You're under no obligation to edit in Russian and genitive sg. and plural forms are optional. I've manually added genitives and plurals on many entries, CodeCat did it with a bot and did the headword changes, no conspiracy here. @Vahagn, you can direct your anger at all other languages where transliteration is not 100% graphemic. Transliterating English into Armenian or Russian graphically wouldn't be very useful, would it? --Anatoli T. (обсудить/вклад) 13:22, 10 October 2014 (UTC)[reply]
The question isn't about whether the transliteration is graphic, but whether it represents the written expression of the word rather than the spoken one. For example, a reasonable Cyrillization of English that aims to represent the written language would transliterate colonel as колонел rather than as кёрнел, but bite would still be байт rather than the silly бите. --WikiTiki89 14:10, 10 October 2014 (UTC)[reply]

Consensus on transliteration of headword inflections?

Irrespective of the question of how much info to include in Russian headwords, can I propose a consensus around the following?
  1. For Cyrillic (and maybe also Greek), don't include transliterations of inflections in headword lines.
  2. For other non-Latin scripts, do so. This info comes either from an explicitly given transliteration or, failing that, from auto-transliteration when it is available and is able to succeed.
My preference would be to transliterate all inflections, but I can accept this compromise for the purpose of consensus. The logic here might be something like this: Cyrillic and Greek are similar enough to Latin script, and easy enough to learn, that there's a reasonable likelihood that someone interested in the inflections of a foreign word has a decent command of these scripts, whereas other scripts are generally much harder to learn and especially to master fluently to the point where a transliteration isn't helpful. This is certainly my experience: I've learned Arabic script and tried to learn Thai script and Devanagari, and my experience with all of these is that it takes a lot more work to become comfortable reading these fluently than it does with Cyrillic or Greek, both of which I learned easily. Even after a lot of work with Arabic I still sometimes stumble over the letters, and find the transliteration very helpful. An additional consideration for Arabic script is that some of the vowels are typically omitted, making transliteration essential. Even when vowels are present, they're often hard to read properly because of font considerations (the vowels are displayed above or below the letters and frequently get drawn over letter descenders or other diacritics, or sometimes a vowel below the line can be confused with a vowel above the next line below). Benwing (talk) 04:19, 9 October 2014 (UTC)[reply]
I have already expressed my opinion. Yes, splitting "easy" and "complex" scripts sounds reasonable to me. I have to ask about Korean inflected forms (verbs and adjectives). @Wyang, what do you think, do we need to transliterate Korean inflected forms in the headword? Vahagn wants Armenian (and probably Georgian) to be fully transliterated. --Anatoli T. (обсудить/вклад) 04:38, 9 October 2014 (UTC)[reply]
I think the idea of compulsorily applying headwords to all languages is silly, and a lot of languages would be much better off without it, including the non-inflecting languages and some agglutinative languages. I think the headword is being overused in two aspects: 1) pronunciation; 2) inflection. For Korean, the inflection information in the headword more properly belongs in the conjugation section, and it can be moved to the top of the conjugation table as another table (identifying the key forms) alongside the stems table. The romanisation in the headword is redundant and should be removed. There is then no need for information or parameter duplication as in the cases of 십육 (rv=) and 아름답다 (irreg=y). In the division of "easy" and "complex" scripts, Korean would definitely be classified as an "easy" script, especially according to the Hangul supremacists. It's also called "morning script", as "a wise man can acquaint himself with them before the morning is over; a stupid man can learn them in the space of ten days". Wyang (talk) 22:35, 9 October 2014 (UTC)[reply]
This isn't a question of whether to have info in headwords but whether to transliterate them. I personally see Korean as a bunch of random squiggles, so for me it's not that easy. I have also heard that romanization of Korean involves various considerations beyond mere transliteration, i.e. the transcription shows various sorts of assimilations. I think one problem here is that people are thinking in terms of their own expert knowledge rather than the likely audience, which is someone who is a native English speaker and foreign language learner who may not have much experience with a foreign script. Benwing (talk) 00:19, 10 October 2014 (UTC)[reply]
I also used to look at Korean and Arabic as a bunch of squiggles, until I started learning these languages. Changes in the Korean transliteration make perfect sense when its phonology is understood. And learning a foreign script without learning a bit of a language using it doesn't make much sense. So, learning a script in a day or in a few days is applicable to people speaking that language. Arabic was somewhat easier for me (with good fonts only) and I still think Arabic script is easier and would be quite easy if vowel points were always written (I'm not suggesting it should). I think some info in the Korean headword is useful but for me the important bits are not those currently appearing there. --Anatoli T. (обсудить/вклад) 01:20, 10 October 2014 (UTC)[reply]
OK, consensus appears to be:
Sorry guys, wrench-thrower here --
What constitutes a "simple script"? Who decides what is "simple"?
Again, I must note that, as the English Wiktionary, the only safe assumption we can make when it comes to scripts is that our user base can read the Latin script. I reiterate my position that I believe we should provide romanizations for all headwords not written in the Latin script.
One argument against including romanizations for certain non-Latin scripts seems to be that the scripts are "simple". Sure, any script (or anything at all, really) can be viewed as simple, once you've already learned it. Many other scripts are also pretty straightforward, with charts providing straightforward phonetic conversions. Are we to no longer provide romanizations for Mkhedruli? Gothic? Amharic?
An undercurrent appears to be that we shouldn't include romanizations because doing so would be difficult. That said, this whole project of creating a multilingual dictionary is itself an enormous amount of work. Is such a relatively small amount of additional work really so much of a hurdle? Romanizations are a very simple way to greatly increase the usability of Wiktionary as a whole.
As with everything here, those who don't want to do the work don't have to. But as far as policy or goals are concerned, I feel very strongly that deciding to not include romanizations for non-Latin-script headwords does us, as a project, a grave disservice. ‑‑ Eiríkr Útlendi │ Tala við mig 04:55, 10 October 2014 (UTC)[reply]
@Eirikr, a few points.
  1. This issue concerns only the inflected forms in headwords. The headword itself is always transliterated, as are links.
  2. I agree with you. I would rather see transliterations (transcriptions or romanizations, more correctly) of inflections for all non-Latin scripts.
  3. I don't think it's an issue of how difficult it is but rather that some people seem to think it's "cluttering" the display.
  4. My main concern for the moment is to find some workable compromise so that CodeCat is willing to put back auto-transliteration of Arabic inflections in headwords; I'd do that myself but I don't have permission to edit Module:headword. (Can I request such permission on a page-by-page basis or do I have to become an admin?)
Here's another possible compromise:
  1. For scripts where there's no objection to transliterating inflections in headwords, we go ahead and put the transliteration there after the native-script inflected form, whether it's explicitly given or auto-transliterated. Let's say this will currently apply to all scripts except for Cyrillic and Korean, maybe Greek as well.
  2. For scripts where people think doing this will "clutter" the headword line, include the transliteration in a mouse-over -- I think this is feasible. (It could be said that we should use mouse-over for all scripts, but I'd rather have the transliteration directly visible whenever possible -- it is faster to read that way, and users might not realize that the transliteration is present on mouse-over.) Benwing (talk) 12:15, 10 October 2014 (UTC)[reply]
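As a rough illustration of the compromise above, the choice between an inline transliteration and a mouse-over could be driven by a per-script setting. The sketch below is hypothetical throughout: the function name, the script set and the output markup are assumptions made for illustration and are not taken from Module:headword. The script codes follow ISO 15924 (Cyrl, Hang, Arab).

```python
from html import escape

# Scripts whose inflection transliterations go into a tooltip rather than
# inline. Assumed from the discussion: Cyrillic and Korean (Greek undecided).
TOOLTIP_SCRIPTS = {"Cyrl", "Hang"}


def format_inflection(form, translit, script):
    """Render one headword-line inflection with its transliteration.

    translit may be explicitly given or auto-generated; None means no
    transliteration is available, so only the native-script form is shown.
    """
    if translit is None:
        return escape(form)
    if script in TOOLTIP_SCRIPTS:
        # Mouse-over: the transliteration is carried in the title attribute.
        return f'<span title="{escape(translit)}">{escape(form)}</span>'
    # Default: transliteration directly visible after the native-script form.
    return f"{escape(form)} ({escape(translit)})"


if __name__ == "__main__":
    print(format_inflection("врачи\u0301", "vračí", "Cyrl"))
    print(format_inflection("كتب", "kutub", "Arab"))
```

Keeping the policy in a single set means a script can be moved between the two displays without touching the rendering code, which may help if the list of "objected" scripts changes as the discussion continues.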
I've added a temporary exception to Module:headword so that Arabic inflections are always transliterated. This will hopefully alleviate your immediate concerns, but I do hope that you'll continue to participate in the wider discussion. —CodeCat 13:00, 10 October 2014 (UTC)[reply]
Thanks, and I will stay in the discussion. I wish more people would contribute; it's hard to form a consensus when only a small number of people speak up. Benwing (talk) 13:16, 10 October 2014 (UTC)[reply]
I realised I haven't stated my own opinion. I mostly follow Eirikr's reasoning, and think that transliterations should accompany all non-Latin-script terms in some form, wherever they are. Exceptions can be made in cases where terms generally appear paired with Latin-script alternatives, such as in Serbo-Croatian. —CodeCat 13:18, 10 October 2014 (UTC)[reply]
  • I support transliteration of all forms listed in the headword line in all scripts other than Latin, preferably automatically generated, even if this means certain Russian forms will appear to end in -ogo instead of -ovo. Some people might say that's easy for me to say, since the only non-Latin-script language I spend much time on is Burmese, and Burmese doesn't have inflections. Nevertheless, I think it's preferable to transliterate them all rather than to try to decide which scripts are "simple" enough that they don't need it. —Aɴɢʀ (talk) 13:53, 10 October 2014 (UTC)[reply]
    • It reminds me a bit of a debate we had some time ago, considering whether languages were "well known" and "major" enough to not be linked in translation tables and in {{etyl}}. Eventually we gave up on the debate and just made translations never link, and {{etyl}} always link. —CodeCat 14:05, 10 October 2014 (UTC)[reply]
One thing we seem to be forgetting here: why are the inflections included in the headword line in the first place? They're included for those who know the rules of the language to figure out the inflection without looking through the tables. In other words, they're a shorthand for people who mostly don't need transliterations. For someone who sees the letters as scribbles, an inflected form is most likely just decoration, anyway- whether it's transliterated or not. That means that this isn't a matter of substance (with a few exceptions such as Arabic), but of style. Chuck Entz (talk) 16:58, 10 October 2014 (UTC)[reply]
But many languages don't have tables, so we include the forms on the headword line. And even in cases where there are tables, the forms we include on the headword line are sometimes not in those tables. —CodeCat 17:00, 10 October 2014 (UTC)[reply]
Certainly for Arabic, this is exactly correct. The inflections list basic and very important things, like feminines and plurals for nouns and adjectives. For nouns and adjectives we don't currently have any inflection tables. There are other languages that are similar. I took a look at other non-Latin-script languages with inflections, and I can find only Russian and Georgian for nouns, and they also list basic things like the plural (and in the case of Russian, the genitive singular). I can easily imagine a situation where a learner has some concept of grammar -- doesn't take much to want to know how to form the plural -- but a shaky grasp on the native script. Benwing (talk) 23:29, 10 October 2014 (UTC)[reply]
For Russian, the genitive and plural forms are also in the tables. But for adjectives, there's the comparative forms, which are not in any table. For verbs, the imperfective and perfective counterparts are not in the table either. —CodeCat 23:47, 10 October 2014 (UTC)[reply]
The world of language learners is not neatly divided into those who can read the script and those who can't read the script. If push comes to shove, I can read Sanskrit in Devanagari, but I'd rather read it in transliteration because it's easier. I don't know if our Sanskrit headword lines currently include principal parts or not (our coverage of Sanskrit is not great), but if they did, I would want to have translits on each form listed. Devanagari is not just scribbles for me, but it does take me about 10 times longer to read than transliteration. —Aɴɢʀ (talk) 08:28, 11 October 2014 (UTC)[reply]

────────────────────────────────────────────────────────────────────────────────────────────────────

OK, a majority seems to want to see translit of inflections in all languages. This consists of (at least) me, CodeCat, Angr, Eirikr, Vahag, perhaps also Saltmarsh. A minority seems to either want translit of inflections in only some languages, or wants fewer headword inflections in certain languages, or both. This consists of Anatoli (doesn't want translit in Russian, is OK with the rest, is OK with headword inflections in general), Wyang (doesn't want translit in Korean, wants fewer headword inflections in Korean), and Wikitiki (seems to want fewer headword inflections in general, has expressed particular opinions about Russian, might also want less transliteration although I'm less sure about that).
So, we can do two things, it seems:
  1. Take a vote.
  2. Find some compromise that will satisfy both camps. I've proposed above the idea that we can transliterate the headword inflections of most non-Latin-script languages the traditional way (in parens or something similar, after the native-script word), and for the ones where people object (Korean, Russian), transliterate using a mouse-over popup.
I'd like each person who has expressed an opinion, and any others who want, to comment indicating whether they find #2 reasonable and whether they'd accept it, and if not, do they think #1 is the way to go, and if not, what do they think is the way to go? Benwing (talk) 09:02, 12 October 2014 (UTC)[reply]
I don't feel super strongly about this, so I'm open to finding a compromise. —Aɴɢʀ (talk) 15:22, 12 October 2014 (UTC)[reply]
I really like the idea of the mouse-over popup (or tooltip) transliteration; however, MediaWiki is imposing their own "preview" popup, which does not even work properly in any useful way on Wiktionary and I really wish we could get rid of it and make room for our own popups. --WikiTiki89 14:30, 14 October 2014 (UTC)[reply]
  • What I'd like to see for transliterations is 1) the most common scheme used by default, for all languages 2) ability to switch between all of the popular transliteration systems available by clicking on a link placed near the headword, opening a popup menu with options 3) selected choice remembered when browsing other entries in the same language. 4) Ability to hide/show all transliterations for languages that use them. No "one true transliteration scheme" and no "one true transliteration display option". I believe that all of the necessary data can be generated in Lua, and selectively displayed/hidden using JavaScript. We should give users options not cripple them. --Ivan Štambuk (talk) 00:27, 15 October 2014 (UTC)[reply]

Phrasal verbs whose lemma is not the infinitive

I noticed that there are some phrasal verb entries in English that are conjugated, but the infinitive is not used as the lemma. An example I noticed just now is all hell breaks loose. This verb certainly does have an infinitive, all hell break loose. This is clear when you add auxiliary verbs: I want all hell to break loose or may all hell break loose. So I think we should move these entries to the infinitive. —CodeCat 22:24, 11 October 2014 (UTC)
We usually don't bother with inflecting phrasal verbs, as it just clutters the entry for no real gain. This kind of a case probably warrants it, however. DCDuring TALK 22:38, 11 October 2014 (UTC)[reply]

The problem is it just sounds funny when the subject of the verb is included. I know we moved there is to there be a while back, but it has the same problem: with the subject (even just a dummy subject there) present, the bare infinitive just sounds really odd. —Aɴɢʀ (talk) 22:42, 11 October 2014 (UTC)[reply]
It does, but you can't deny that the infinitive exists. So either we should make a specific rule for these cases, or we should continue to use the infinitive, right? —CodeCat 23:13, 11 October 2014 (UTC)[reply]
Among OneLook dictionaries only Cambridge Idioms actually covers this and they do it at all hell breaks loose. DCDuring TALK 23:54, 11 October 2014 (UTC)[reply]
Whichever form we make the lemma, there should be redirects from the other forms. - -sche (discuss) 03:38, 12 October 2014 (UTC)[reply]
  • This isn't what I would call a phrasal verb, nor is it so categorized. It is a full sentence. As is the case with virtually all other full English sentences (See Category:English sentences.), the verb and sometimes the noun within can be inflected. (It is trivial to show it to be a full sentence and to show it or any other sentence to occur with an infinitive.) Sentences are usually shown in their canonical form (present indicative tense). DCDuring TALK 05:49, 12 October 2014 (UTC)[reply]

use–mention distinction in reference templates

As happened seven months ago, Dan Polansky and I are currently in disagreement about reference-template formatting; this time, we disagree about whether {{R:L&S}} should enclose the cited entry title in quotation marks. I believe that such quotation marks are necessary in order to mark the use–mention distinction, and that quotation marks create a more legible presentation than italicising the entry title would. I don't know why Dan Polansky disagrees, and nor do I know why he reverted the addition of {{documentation}} to the template in the same edit. — I.S.M.E.T.A. 01:37, 13 October 2014 (UTC)[reply]

To explain, I come here in the hope that I shall find or obtain consensus to use quotation marks in {{R:L&S}}. — I.S.M.E.T.A. 01:57, 13 October 2014 (UTC)[reply]

Just ignore him. Keφr 11:14, 13 October 2014 (UTC)[reply]
@Kephir: Forgive me; does "him" refer to Dan Polansky or to me? — I.S.M.E.T.A. 17:02, 13 October 2014 (UTC)[reply]
Polansky. He is going to be obstructionist just because he can. But for the sake of having anything said on-topic, I agree with you about the quotation marks. On the other hand, some consistency in formatting mentions would be nice, which would favour italics instead. But either way, bare external link formatting seems rather unfitting to me. Keφr 15:34, 14 October 2014 (UTC)[reply]
Thanks, Keφr; I thought you meant him, but I wanted to make sure. I've made the change again; hopefully it'll stick this time round. — I.S.M.E.T.A. 18:28, 14 October 2014 (UTC)[reply]
FWIW, I agree with quotation marks, since we are referring to a piece of a larger work: "qua" (for example) is more-or-less a section title. (This is not exactly the same as the use–mention distinction. We are neither using nor mentioning the word qua, we're just citing a source that mentions the word qua. Perhaps a subtle distinction, but IMHO a useful one to keep in mind in cases where the reference work uses a different citation form than we do, or when it assigns a few lemmata to a single entry for whatever reason.) —RuakhTALK 04:56, 15 October 2014 (UTC)[reply]

Empowering WingerBot

I filled out a vote request to empower my new bot WingerBot, here:

Wiktionary:Votes/2014-10/Request for bot status: WingerBot

This is my first bot.

It gives a 30-day vote period, which seems excessive. For example, JackBot had a 7-day window, which seems reasonable. If that can be applied here, can someone fix up the start and end times appropriately?

Thanks. Benwing (talk) 07:20, 13 October 2014 (UTC)[reply]

Compound lists for Japanese entries (and possibly CJK in general) -- are these really needed?

With the advent of User:Haplology's various categories for Japanese entries, which compile lists of terms using each kanji (such as Category:Japanese terms spelled with 赤 read as あか, or Category:Japanese terms spelled with 幸 read as こう), it occurs to me that the potentially *huge* lists of compounds that could be compiled and included within each kanji entry are actually redundant and obsolete. Rather than laboriously compile these lists by hand, I think it makes a lot more sense to leverage the categories to do the hard work for us.

Comparing the categories and the manually created lists, the only additional information that the manual lists provide is a possible reading, and a gloss. This leads me to two things:

  • As a proposal: I posit that this information, while potentially helping to improve usability slightly, also represents a sizable negative potential for mistakes and inconsistencies. I therefore propose that we no longer include such lists in Japanese entries, referring users instead to the categories. I also submit for consideration that Chinese and Korean editors might do the same for hanzi and hanja compound lists.
  • As a request: Does anyone familiar with the inner workings of categories know if there might be some technically feasible way to get readings to display automatically in category listings? For instance, 幸運#Japanese is added to category Category:Japanese terms spelled with 幸 read as こう, with the sort argument こううん (kōun). Looking at the list on the category page, we see that 幸運#Japanese is there, but its sort argument is lost -- other than the sorting itself, the sort argument doesn't appear on the page as any kind of useful information. Is there any way of capturing sort arguments and getting them to display somehow in category lists?

I look forward to hearing what others think. ‑‑ Eiríkr Útlendi │ Tala við mig 18:19, 13 October 2014 (UTC)[reply]

I find them useful. They are not hard to create. Ideally, a bot should make those categories.--Anatoli T. (обсудить/вклад) 10:09, 14 October 2014 (UTC)[reply]
  • Sorry, which them did you mean in I find them useful? Did you mean the categories that list compounds (which are already auto-generated once the appropriate templates are added to an entry), or the in-entry lists of compounds (which so far have to be created by hand)? ‑‑ Eiríkr Útlendi │ Tala við mig 19:11, 14 October 2014 (UTC)[reply]
I find categories useful, such as Category:Japanese terms spelled with 飢 read as う. Yes, the template auto-generates cats but they have to be created manually if they are missing. --Anatoli T. (обсудить/вклад) 21:42, 14 October 2014 (UTC)[reply]

Rethinking Babel boxes

I did some minor editing at WT:Babel recently, which made me wonder whether it would make sense to rewrite {{Babel}} in Lua. My initial motivation was to integrate it with our central list of languages (maybe even into the category boilerplate system which User:CodeCat developed) and get rid of inline styles on the way. While planning this out, some other ideas emerged in my head:

  • To have the blurbs ("This user speaks Elbonian at an advanced level") in English, and English only. On one hand, this is contrary to how Babel boxes look in other Wikimedia projects. On the other, not only will it massively simplify the code, it also makes the most sense: English is the one language in which English Wiktionary's (duh) definitions, boilerplate and meta-content are written and in which discussions are (usually) conducted, and the only language which can be assumed to be understood by all users. If I am looking at a Babel box of an advanced speaker of Cantonese, I can recognise it only because I remember yue to be the code for Cantonese, and that the number 3 means advanced level. The blurb tells me nothing; I do not know nearly enough Hanzi to recognise a single character.
  • To rename the user categories. "User si-3" is rather terse and again forces me to remember language codes. "Wiktionary:Advanced speakers of Sinhalese" would be more elegant and descriptive.
  • To deprecate {{#babel:}}, as was suggested in Wiktionary:Beer parlour/2014/September#Can we disable the #babel parser function? (I see the English blurb issue was brought up there too). I think some page in the MediaWiki namespace can be edited to point users to the template instead.
  • To suggest that users add themselves to interest groups (in Module:workgroup ping/data) when they speak a certain language at a high enough level.

Some considerations:

  • Integration with our central languages list would mean that, for example Template:User en-us-N would have to be folded into Template:User en (I see Template:User sr-4 already redirects to Template:User sh-4)
  • I think some users may expect Wikimedia language codes to work in our Babel boxes (they may simply copy the Babel template across projects). I think we should generally not break that expectation; however, I worry about some Wikimedia codes not mapping perfectly to local ones.

Thoughts?

Keφr 17:08, 14 October 2014 (UTC)[reply]

I support translating the Babel boxes into English. Their very purpose is defeated when they are incomprehensible. — Ungoliant (falai) 17:18, 14 October 2014 (UTC)[reply]
I agree with this too, and I definitely agree with converting to Lua to eliminate the unmaintainable mess of templates we currently have. —CodeCat 17:21, 14 October 2014 (UTC)[reply]
I support translating to English. I oppose converting to Lua because once we translate to English it will be very easy to turn it into a small maintainable template without Lua. I also oppose, as before, deprecating {{#babel:}}. --WikiTiki89 17:30, 14 October 2014 (UTC)[reply]
The Module:workgroup ping integration and (maybe) validation would be much harder to do from a bare template. And I think so would be Eirikr's suggestion to avoid nested tables (while maintaining all current functionality, at least). Keφr 08:35, 17 October 2014 (UTC)[reply]
I enjoy seeing the other languages and would be sad to see them go, but I understand and generally agree with the rationale for changing the Babel boxes to be all-English. If we're going to have them redone, my 2p request would be to not use nested tables, and to make sure that the columns actually line up properly. I'm one of those visually oriented people for whom the jagged inconsistencies of the current Babel infrastructure are so jarring that I deconstructed the tables and rebuilt them to line up properly on my own user page. ‑‑ Eiríkr Útlendi │ Tala við mig 19:08, 14 October 2014 (UTC)[reply]
Does the text really matter, other than the English, as well as native, language name? Wouldn't luacizing the templates mean that, as a practical matter, the text could only be in English? A new person with a new language could not be assumed capable of adding the text required in their language in a standard-conforming way, unless there were a particularly obvious way to add the text. DCDuring TALK 08:20, 15 October 2014 (UTC)[reply]
Well, maybe; translating into every language would be a bit of work (just create a huge data table… the only problem is that it would probably grow even larger than Module:languages, so we would have to split it, and it might become hard to navigate…), but could be done in principle. Though I think we could abuse the Scribunto i18n library to reuse messages provided by mw:Extension:Babel, and have every single Babel box in any language the reader desires (just add ?uselang= to the URL). Though that would put mw:Extension:Babel in a weird limbo of "deprecated but depended upon by its replacement"; and I have no idea how this interface could be exposed. Or we could just use that facility to maintain the status quo (pardon the Polanskyism) of having them in the target language. Keφr 08:35, 17 October 2014 (UTC)[reply]
I always thought that the purpose of having the blurbs in the target language was to help non-English speakers or English language learners to find users with whom they might be able to communicate if they needed help. I think it is beneficial to see the name of the language in English so that English speakers can easily recognize which language the box indicates. - [The]DaveRoss 20:35, 16 October 2014 (UTC)[reply]
I did not consider this. This is a good argument. Keφr 08:35, 17 October 2014 (UTC)[reply]

Spaces in alphabetization of language names

How do we treat spaces when we alphabetize language names? Specifically, does "Lower Sorbian" precede or follow "Low German"? If we ignore spaces, then "LowerSorbian" precedes "LowGerman", but if we treat spaces as preceding A in alphabetical order, then "Low_German" precedes "Lower_Sorbian". —Aɴɢʀ (talk) 19:01, 14 October 2014 (UTC)[reply]

There are pros and cons to both options. What do dictionaries that list multi-word phrases as separate entries do? --WikiTiki89 01:35, 15 October 2014 (UTC)[reply]
I just checked six print dictionaries (two British, four American) and they all ignore spaces (hotchpot before hot dog before hotel). —Aɴɢʀ (talk) 06:12, 15 October 2014 (UTC)[reply]
w:Alphabetical order#Treatment of multiword strings is relevant.​—msh210 (talk) 12:30, 15 October 2014 (UTC)[reply]
That page basically outlines the question, but does not provide an answer. --WikiTiki89 12:39, 15 October 2014 (UTC)[reply]
Both treatments are valid; the question is, which do we want to use? Dictionary headwords apparently usually follow the "ignore the space" rule, but other lists may follow the "treat the words separately" rule. —Aɴɢʀ (talk) 13:53, 15 October 2014 (UTC)[reply]
Internet-based sorting, including our own categories, generally treats a space as being ordered before any other character. So that would place Low German before Lower Sorbian. —CodeCat 18:27, 17 October 2014 (UTC)[reply]
Some paper dictionaries, too, use this ordering, e.g. the Routledge dictionary of historical slang: have a look at http://books.google.fr/books?id=JRuNMHNcu5cC&pg=PP12&lpg=PP12&dq=%22something+before+nothing%22+dictionaries&source=bl&ots=6iDNPNRHjr&sig=S8mC2Wqar5xb4FCC2zWaw4itGG8&hl=fr&sa=X&ei=yXJBVNebNMnDPPWkgIgG&ved=0CCMQ6AEwAA#v=onepage&q=%22something%20before%20nothing%22%20dictionaries&f=false This is the better ordering for our kind of dictionary. Lmaltier (talk) 19:52, 17 October 2014 (UTC)[reply]
This dictionary calls it something before nothing. Do you understand why? Lmaltier (talk) 20:12, 17 October 2014 (UTC)[reply]
On what basis do you say "This is the better ordering for our kind of dictionary."? I happen to be leaning the other way. --WikiTiki89 21:30, 17 October 2014 (UTC)[reply]
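
For anyone who wants to compare the two orderings discussed in this thread, here is a minimal Python sketch. The function names `key_ignore_spaces` and `key_space_first` are made up for illustration, and the sketch assumes simple lowercase code-point comparison; it is not the collation that MediaWiki categories or any print dictionary actually use.

```python
def key_ignore_spaces(name: str) -> str:
    """Letter-by-letter ordering: spaces are dropped before comparing."""
    return name.lower().replace(" ", "")


def key_space_first(name: str) -> str:
    """Word-by-word ordering: a space (U+0020) sorts before every letter."""
    return name.lower()


languages = ["Lower Sorbian", "Low German"]
headwords = ["hotel", "hot dog", "hotchpot"]

# Ignoring spaces compares "lowersorbian" with "lowgerman" ('e' < 'g').
by_letters = sorted(languages, key=key_ignore_spaces)

# Keeping spaces compares "low german" with "lower sorbian" (' ' < 'e').
by_words = sorted(languages, key=key_space_first)

print(by_letters)
print(by_words)
print(sorted(headwords, key=key_ignore_spaces))
print(sorted(headwords, key=key_space_first))
```

The only difference between the two keys is whether the space survives into the comparison, so whichever rule is chosen could in principle be applied consistently to every list of language names with a single key function.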

Extended etymologies

I came upon this website illustrating an idea that I have had in mind for a while (click on the blue links in the leftmost column). We could extend the < "derives from" operator used in etymologies to generate a drop-down table illustrating intermediate steps between pairs in the derivational chain, i.e. all of the sound changes involved. Short descriptions could link to appendices where more details are available. This would be applicable to both reconstructions and attested etymons, including borrowings (which often undergo some special rules that can nevertheless be described and cataloged). A chronologically inverted list would be used in the descendants sections of the corresponding source word/reconstruction. Support could be added for multiple sequences of derivation, and even multiple sources or different reconstructions reflecting different protolanguages. It would however require some non-trivial investment in the groundwork to make it work, so it should best be approved (or better: not disapproved) first before people waste time. I've seen some recent works that use this method but they use numbers instead of descriptions to explain what's going on, so one has to manually look up what each of the numbers used means, and the layout is horizontal not vertical. --Ivan Štambuk (talk) 00:13, 15 October 2014 (UTC)[reply]

Support, although I recognize that there should be a lot of discussion about the specifics of the layout. --WikiTiki89 01:36, 15 October 2014 (UTC)[reply]
How would it work, on a technical level? How would you share data between entries? DTLHS (talk) 04:46, 15 October 2014 (UTC)[reply]
Support. I had a vague idea about having such lists in appendices somewhere, but never developed it. Filling out the details would seem to go beyond the limits of published sources without resorting to the kind of extrapolation that you've been berating CodeCat for- are you ok with that? Chuck Entz (talk) 13:30, 15 October 2014 (UTC)[reply]
Support. Categorization based on sound change could also be added, such as Category:Old Armenian terms derived by Meillet's law. Or such terms could appear on the appendix dedicated to Meillet's law. --Vahag (talk) 10:29, 16 October 2014 (UTC)[reply]
  • I think this might overwhelm normal entries, especially if people do it for every morpheme in a polymorphemic word, but it would be nice to do this somehow on reconstructed-form appendix pages. —Aɴɢʀ (talk) 13:49, 15 October 2014 (UTC)[reply]
    • It wouldn't be too bad if we restricted it to the rules between a term and its nearest parent (i.e., an English etymology would only have the steps between it and Middle English or maybe Old English), and hid the list so that only those who choose to look at it would see it. Chuck Entz (talk) 13:57, 15 October 2014 (UTC)[reply]
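
On the question of how the data behind such a drop-down might be stored, here is a rough Python sketch of one possibility, not a proposal for the actual implementation (which would presumably be a Lua module). The class names, the `appendix` field and the appendix titles are all hypothetical, and the list of changes between Proto-Indo-European *ph₂tḗr and Proto-Germanic *fadēr is deliberately incomplete; it only names the two best-known laws.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SoundChange:
    """One step in a derivation; `appendix` is a hypothetical page title."""
    description: str
    appendix: str = ""


@dataclass
class Derivation:
    """A parent/child pair plus the ordered changes between them."""
    source: str
    target: str
    steps: List[SoundChange] = field(default_factory=list)

    def chain(self) -> List[str]:
        """Rows in chronological order: parent, each change, child."""
        return [self.source, *(s.description for s in self.steps), self.target]

    def reversed_chain(self) -> List[str]:
        """The same rows in the opposite order, for the other section."""
        return list(reversed(self.chain()))


# Illustrative data only; a real entry would list every change involved.
example = Derivation(
    source="*ph₂tḗr",
    target="*fadēr",
    steps=[
        SoundChange("Grimm's law: *p > *f, *t > *þ", "Appendix:Grimm's law"),
        SoundChange("Verner's law: *þ > *ð", "Appendix:Verner's law"),
    ],
)

for row in example.chain():
    print(row)
```

Storing only the steps between a term and its nearest parent, as suggested above, would keep each record short; longer chains could then be assembled by joining records whose `target` matches the next record's `source`.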

Categories for words that have pronunciations marked in the form of IPA

Should we create such categories? I believe that it is convenient to go to Special:WhatLinksHere/Appendix:Italian pronunciation for the above information. --kc_kennylau (talk) 09:53, 15 October 2014 (UTC)[reply]

What's the general consensus view on handling abusive editors?

I stumbled across the activities of a new editor and have been quite impressed at how abusive they can be -- foul language, name-calling, lawyering, basically the kind of trollish behavior that drove me from Wikipedia years ago. I analyzed their total contributions, only a short list so far, and found that more than a quarter have been on talk pages, where this editor has mostly argued about editing decisions, illustrated their profound ignorance of the consensus here, and berated other users. Another more-than-quarter has been in this user's own userspace. 40% has been actual constructive mainspace edits, mostly in January-March this year. Out of the total, more than a quarter has been confrontational and even outright abusive.

For what it's worth, this editor has not yet had any direct dealings with me.

How would other admins approach this? ‑‑ Eiríkr Útlendi │ Tala við mig 18:05, 17 October 2014 (UTC)[reply]

I would post a warning on his/her user page along the lines of "Start being nice to people, or I will block you." (but in a more polite way). --WikiTiki89 18:10, 17 October 2014 (UTC)[reply]

Proposal: use quotation marks to mark headwords cited in reference templates for Latin-script languages

Further to §: use–mention distinction in reference templates above, may I suggest that we use quotation marks in our R:-prefixed reference templates to mark the headwords cited by those templates? So, for example, the standard format (at least where the headword is concerned) would be:

  • “foo, n.” in Some Big Dictionary

(Because of potential problems with using quotation marks with other scripts, I make this proposal for Latin-script languages only.) Does that seem sensible to everyone? Is there consensus? Shall I prepare a vote? — I.S.M.E.T.A. 18:35, 17 October 2014 (UTC)[reply]

  • It's also worth noting that all three of these changes to remove the quotes were in 2009, now half a decade ago. Attitudes and ideas change over time. I suggest we check the opinions of the relevant people here. That said, Ullman is no longer with us, and Spangineer's last edit was in 2010. @DCDuring do you have any input on this quote issue? ‑‑ Eiríkr Útlendi │ Tala við mig 19:32, 17 October 2014 (UTC)[reply]
We use quotes for glosses, so any need for glosses in such templates — quite possible IMO — would require multiple quotes.
If we resort to further distinction, I would strongly oppose ever using italics as it makes it impossible to maintain the appropriate typographic contrast for the taxonomic names that are supposed to have it. DCDuring TALK 19:51, 17 October 2014 (UTC)[reply]
  • Re: links, are there any cases where a term might not be linked in such a template call? ‑‑ Eiríkr Útlendi │ Tala við mig 20:27, 17 October 2014 (UTC)[reply]
    It certainly might not always be the pagename. In some cases having a named link might be misleading, as it implies that it is possible to go to a page that is directly related to the term, rather than, say, a general search-form page. The more I deal with these, the more I appreciate such refinements. Also: optional italics for the taxonomic names that need them ("i=1") and an optional gloss ("gloss="). Not every template needs such options, but they are handy. DCDuring TALK 22:08, 17 October 2014 (UTC)[reply]

Redesign-Redefine of Russian Entries

I'm planning a large redo of many of the Russian pages by translating swathes of info from Russian Wiktionary, with a focus on layout consistency, definition accuracy/representation, and perhaps design/coding.

The information on en-Wiktionary is generally inadequate for translating literature and often confusing for some basic words. The information is all present here on Wiktionary but only on Wiktionary-ru, hence unavailable to those non-fluent speakers as many examples for definitions cited there derive from literature.

I started translating Dostoevsky, ( https://github.com/icarot/bk ) which was when inadequacies became more obvious.

Roadmap:

1) Work with the Grease Pit to try to normalize the layout of data as consistently as possible, for parsing by robots. A parser/morphological analyzer needs quality, open data. The more uniform the format, the easier the parse.

2) Improve word-count and definition count immensely. Probably on the order of at least several thousand for one of them.

3) Clean up messy pages, e.g. 'весь', which not only confound the novice with the unintuitive concept that Russian uses declensions to represent irregular meaning on an unusually multi-purpose [pronoun-adjective] word, but also do not represent even all of the critical meanings. And more, etc.

What are some desired improvements I've missed for Russian translations which can be directly bettered from using the conventions and wide-scope of information on Russian Wiktionary?


I want to get criticisms, guidance, etc. etc. before undertaking this all. I wouldn't just run rampant through all of the Russian articles without letting the community know what was going on, or asking for help.

Icarot (talk) 00:18, 18 October 2014 (UTC)[reply]

Hi.
We have seen you talking but we haven't seen you working :). You're welcome to demonstrate your ideas. Yes, we need more Russian entries and some entries may need fixes or improvements but you can't make major changes without a prior agreement. --Anatoli T. (обсудить/вклад) 05:20, 18 October 2014 (UTC)[reply]
  • Just a heads-up: Any automatic transmission of data from Russian Wiktionary into English Wiktionary has to clearly indicate the source of the data in the edit summary to prevent copyright violation. --Dan Polansky (talk) 05:34, 18 October 2014 (UTC)[reply]
  • @Icarot: I can help you generate stubs for Russian nouns, adjectives, verbs and adverbs (the rest are a closed category and mostly covered). Stubs would be entries like in this category - the only thing they are missing are definitions. I could help extract a list of missing lemmas from a particular work. We could also pregenerate a list of examples for every entry and format them using the {{usex}} template, by taking them from ru Wiktionary, glosbe, parallel corpora databases, subtitles, google translate and so on, that editors could easily copy/paste into entries that are missing them. Don't worry about associations (derived terms, *nyms, morphological etymologies etc.) - those can be largely automated once entries with definitions are created. The primary focus should be on coverage. --Ivan Štambuk (talk) 07:32, 19 October 2014 (UTC)[reply]
    • Not sure why you are not continuing with this crap in the Serbo-Croatian Wiktionary. It already has more than 100 000 Serbo-Croatian definitionless entries. If Wiktionary users are so hungry after such content as you posit, Serbo-Croatian Wiktionary could become one of the most visited Wiktionaries soon. Unless it gets shut down due to copyright violation, that is, such as because of automated lifting of data from Google translate as you seem to suggest above. --Dan Polansky (talk) 07:58, 19 October 2014 (UTC)[reply]
      Inflections cannot be copyrighted, the databanks such as HJP are completely free. Besides, I fixed many errors in them, and used two others as well. Definitions on the other hand can be copyrighted, and are nevertheless abundantly stolen by many FL Wiktionaries without anyone so much as raising an eyebrow. Don't worry Polansky, soon I'll add many such stubs for Czech as well. --Ivan Štambuk (talk) 08:07, 19 October 2014 (UTC)[reply]

IPA, language code and error message

Whatever changes were made to IPA modules to make older pages (2013) have a conspicuous red error message in the IPA section should be undone. Example: this revision. Old revisions should look as legible and sane as possible; this is not. In general, IPA templates should not require the language parameter; filling-all-the-fields concerns should be delegated to editors with a shovel who have no real interest in building the dictionary. --Dan Polansky (talk) 05:31, 18 October 2014 (UTC)[reply]

I agree that the lack of a lang parameter shouldn't result in an error message, but we don't have any editors who have no real interest in building the dictionary. People with no interest in building the dictionary don't become editors. —Aɴɢʀ (talk) 07:00, 18 October 2014 (UTC)[reply]
I completely agree that there shouldn't be an error message. A cleanup category would be sufficient. --WikiTiki89 14:27, 18 October 2014 (UTC)[reply]
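
The "cleanup category instead of an error" behaviour suggested above could look roughly like the following Python sketch. This is not the actual IPA module code (which is written in Lua); `format_ipa`, its output format and the category name are all assumptions made up for illustration.

```python
from typing import List, Optional, Tuple

# Hypothetical tracking category; the real name would be decided by the community.
CLEANUP_CATEGORY = "IPA pronunciations with no language specified"


def format_ipa(transcription: str, lang: Optional[str] = None) -> Tuple[str, List[str]]:
    """Return the display text and any tracking categories.

    A missing language code never raises an error; the page is only
    added to a cleanup category so that editors can find it later.
    """
    categories: List[str] = []
    if not lang:
        categories.append(CLEANUP_CATEGORY)
    return f"IPA: {transcription}", categories


# Old-style call without a language code: still renders, but is tracked.
print(format_ipa("/fuː/"))

# Call with a language code: renders identically, with no tracking category.
print(format_ipa("/fuː/", "en"))
```

With this approach old revisions stay legible, since the rendered text is the same whether or not the parameter is present, while the missing parameter can still be found and filled in through the category.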

There's a lot about this entry that makes me nervous: the word was apparently coined in a journal article published in mid August, with some or all of the authors working at Alabama State University in Montgomery, Alabama. The Wiktionary article was created at the beginning of September by an anonymous contributor whose IP is assigned to ASU. A variety of IPs from the same southern Alabama/northern Florida area as ASU, as well as an account that seems to bear the name of one of the authors, have been adding references, which are all articles/blurbs about either the research program at ASU or about the original article itself. It's tagged as a hot word, but it looks to me to be lukewarm at best: a Google search does show the word in a blog or news article here or there, but this isn't the kind of strong, widespread adoption we saw with olinguito.

I can't escape the impression that we're being used for promotional purposes, and I feel we need to do something- but I'm not sure whether to tag this for cleanup to prune out all the PR from the references, or to rfv it, or something else. It certainly doesn't meet the letter of the CFI, since it's only 2 months old, but how do we decide whether this is "hot" enough to keep it provisionally as a hot word? Chuck Entz (talk) 05:04, 19 October 2014 (UTC)[reply]