Wiktionary:Beer parlour/2016/July
Renaming all requests categories
There, I suggested renaming all kinds of requests categories to a consistent format.
I am mentioning this here because it's a large proposed change. Thanks. --Daniel Carrero (talk) 20:00, 1 July 2016 (UTC)
Closing OrphicBot vote
Concerning this vote:
The vote was scheduled to end on June 30; today is July 3. Current results: 4-1-1 (the abstention is mine).
I am hesitant to close the vote as "Passed". It seems some issues about Ancient Greek are under discussion in the "Oppose" section.
So I decided to extend the vote +7 days. New end date: July 10. Please check if all is OK with the bot proposal, before we can close the vote. Thanks.
--Daniel Carrero (talk) 11:03, 3 July 2016 (UTC)
- @Daniel Carrero: Thank you for extending the vote; I was unsure what the process was at this point. I believe the specific concerns of the user who has not responded in twelve days to have been addressed: 1) The robot is not intended to remove bad R:DGE links, but I have demonstrated it does not add them. There are only two which need to be removed by hand. 2) Edit summaries are now generated automatically based on actual changes. During testing, I wrote them by hand. The seven samples tested two functions, and my edit summary reflected the more experimental one. Further discussion of the robot has since taken place here: User_talk:I'm_so_meta_even_this_acronym#R:LSJ_and_the_Perseus_Resolve_Form. I am a little bit concerned the dissenting voter is requesting maximalist features (sorting the References section according to Classical-oriented preferences) without consulting other users. I like his requested features, and I am very pleased he asked for them, but I would not have foisted them on everyone else myself, given LSJ is not relevant to Byzantine studies, is not a first choice for readers of Homer, and is not as suited to the needs of language learners as Middle Liddell. I had assumed this proposal was so innocuous that for this reason it had likely not generated any attention in almost two weeks. Given the References modules are somewhat more complicated than a programme for appending strings to a list according to some heuristics and sorting the list, I am puzzled anyone would think this task would be significantly safer left to another user. However, should anyone else want to do it, my existing code is available at User:OrphicBot and I would not mind withdrawing this proposal. Isomorphyc (talk) 13:20, 3 July 2016 (UTC)
- I don't think you should withdraw the proposal. —JohnC5 14:59, 3 July 2016 (UTC)
- It is not about LSJ vs. Middle Liddell; it is about LSJ vs. a Spanish dictionary, as detailed in the discussion below my vote. I ultimately returned to opposition not because of template order but because the interaction, the changes in mainspace, and their edit summaries did not give me enough confidence. For that I am really sorry since this is a great initiative. --Dan Polansky (talk) 17:48, 6 July 2016 (UTC)
- Hi @Dan Polansky:; I regret this did not work out. If you do have feature requests or other issues regarding the classical languages references after the vote, however, please do let me know, as this is an ongoing process. The issue with your sorting preferences was my concern-- I agree with your preferences, but I was concerned you and I were the only participants in that conversation. I know more users now and am more comfortable with this sort order. You may have been less concerned about the possibility of dead links had you participated in the earlier conversations. My initial edits to R:LSJ (a module we have both worked on) removed more than a thousand dead links it produced, and all of my successive modules followed the same procedure of indexing the target dictionary. (I did not work on R:DGE, and only linked it at another user's request, as you can see in the discussions). I agree with you most about the edit summaries, but as you can see in the code, the production version writes these automatically, but I did not have this feature during testing, and it was necessary to test multiple features in the same run. Assuming the robot is approved, I hope the changes work out to your satisfaction, and I trust we may find something about which to agree at a later time. Isomorphyc (talk) 18:26, 9 July 2016 (UTC)
Shortening some 'exceptional' language codes
Currently, exceptional codes for non-proto-languages are created by adding three letters approximating the name of the language onto the end of, as WT:LANG puts it, "a relevant family code". Because WT:LANG goes on to specify that "this system is used even if the relevant family code is itself an exceptional code rather than an ISO-derived code", many exceptional language codes have three parts: ira-azr-klt, ira-azr-kls, nai-yuc-tip, nai-yuc-yav, qfa-ctc-cat (qfa-ctc should actually be sai-ctc for consistency with other family codes, but that's a separate matter), qfa-ctc-col, qfa-len-slv (qfa-len should be nai-len, but that is again a separate matter). However, others have only two parts: Kitanemuk is azc-ktn rather than azc-tak-ktn, Phuthi is bnt-phu rather than bnt-ngo-phu.
At RFM, Μετάknowledge and I were discussing whether or not to always only use the nearest ISO family code (except where there is none), to obtain shorter codes, like ira-klt instead of ira-azr-klt. Other benefits are that when the precise (sub)family membership is uncertain, using only the ISO's high-level family codes is often "safer", and it allows editors to add codes for subfamilies (such as "Upper Amazon Arawakan") without worrying about needing to recode any languages (such as Amarizana and Anauyá and Maypure) which had the newly-added subfamily as their most immediate family.
If you think switching to only two-part codes is a good idea, the following codes will be affected: ira-azr-klt (which would become →ira-klt), ira-azr-kls (→ira-kls), nai-yuc-tip (→nai-tip), nai-yuc-yav (→nai-yav), qfa-ctc-cat (→sai-cat), qfa-ctc-col (→sai-col), qfa-len-slv (→nai-sln, renaming the second element to incorporate 'Lenca' now that the family portion of the code no longer does). Alternatively, if you think we should stick to using the nearest family code, then languages like azc-ktn will need to be renamed (to azc-tak-ktn).
Proto-languages are not part of this proposal, and 'exceptional' languages which belong to families that do not have ISO codes would continue to be named using the existing system. (For example, if we wanted to add an exceptional code for the divergent Korean dialect Foobarese, it would be qfa-kor-foo because the Koreanic language family has no ISO code.) - -sche (discuss) 20:47, 4 July 2016 (UTC)
- I support shortening these codes. They're already long and confusing enough for editors to remember and get used to; the least we could do is make them shorter and more stable, which this proposal would accomplish. —Μετάknowledgediscuss/deeds 21:02, 4 July 2016 (UTC)
Done. - -sche (discuss) 07:21, 7 July 2016 (UTC)
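For anyone updating data modules or bot scripts after this change, the renames listed above can be written down as a simple lookup table. This is only a sketch in Python; the name `shorten` is made up for illustration and is not part of any Wiktionary module, and the table contains nothing beyond the seven codes named in the proposal.

```python
# Old three-part codes mapped to the two-part codes proposed above.
RENAMES = {
    "ira-azr-klt": "ira-klt",
    "ira-azr-kls": "ira-kls",
    "nai-yuc-tip": "nai-tip",
    "nai-yuc-yav": "nai-yav",
    "qfa-ctc-cat": "sai-cat",
    "qfa-ctc-col": "sai-col",
    "qfa-len-slv": "nai-sln",
}


def shorten(code):
    """Hypothetical helper: return the new code, or the input unchanged if it was not renamed."""
    return RENAMES.get(code, code)


# Sanity check: every old code has three parts and every new code has two.
for old, new in RENAMES.items():
    assert len(old.split("-")) == 3 and len(new.split("-")) == 2

print(shorten("ira-azr-klt"))
print(shorten("azc-ktn"))  # already two-part, so it is returned as-is
```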
Open call for Project Grants
Greetings! The Project Grants program is accepting proposals from July 1st to August 2nd to fund new tools, research, offline outreach (including editathon series, workshops, etc), online organizing (including contests), and other experiments that enhance the work of Wikimedia volunteers. Whether you need a small or large amount of funds, Project Grants can support you and your team’s project development time in addition to project expenses such as materials, travel, and rental space.
- Submit a grant request or draft your proposal in IdeaLab
- Get help with your proposal in an upcoming Hangout session
- Learn from examples of completed Individual Engagement Grants or Project and Event Grants
Also accepting candidates to join the Project Grants Committee through July 15.
With thanks, I JethroBT (WMF) 15:21, 5 July 2016 (UTC)
This is a new module I just created, announced at WT:NFE. I hope it's useful! If anyone needs additional encodings, let me know. It only supports ISO 8859-1 currently. —CodeCat 22:31, 5 July 2016 (UTC)
Small, ugly modification to improve Ancient Greek ASCII searchability -- justified?
Hello Ancient Greek editors with whom I have corresponded a few times (namely: @CodeCat, @JohnC5, @Metaknowledge, @I'm so meta even this acronym ): sorry to bother all of you, but I've made a trial change to the Template:grc-noun using a new module Module:grc-ascii-searchability which solves some searchability problems I have had in Greek. However, the change is very unaesthetic. I was going to revert it if it seems particularly disliked, or if there's a better solution available, but I was thinking of extending it to the other twelve or so templates if it seems to solve a problem others have. My problem is that I have trouble searching for Greek words on Wiktionary because I tend to type in the Roman alphabet even when I shouldn't. As a result, to find Greek words I usually look for cognates whose etymologies likely link, or else I find the word on Perseus with betacodes and then paste the link into Wiktionary. This process normally takes a few minutes. I thought there had to be a better way-- but possibly I found a worse way. I dangled some invisible text from the Greek headword template with unique variants from up to nine different Romanisation schemes (asciisation schemes, really), so that now a search for something like "qea greek" or "chalix greek" or "elios greek" or "hlios greek" in the search bar turns up the relevant Greek noun usually as the first or second result. (qea, as you know, is unaccented beta code for θεᾱ́; the rest are self-explanatory). In general, anything in this list should be searchable using most reasonable asciisation schemes I could think of: https://en.wiktionary.org/w/index.php?title=Special:WhatLinksHere/Module:grc-ascii-searchability&hideredirs=1&hidelinks=1. My hope is that these Romanisations will percolate into external search engines as well so that our Greek entries will be easily searchable that way. Does this seem like a change worth extending?
If it is not widely liked I will revert it, but if it is, I will either add it to all of the Greek headword templates, or else (more cleanly, if this method is preferred), look in to finding the appropriate place for it in Module:headword. I had wanted to get the Greek words into the search bar's autocomplete feature through typing the Romanisations, but I think that is completely impossible for me. It is worth pointing out that the macron-accented Romanisation already offered by the headword module is also searchable in plain ASCII; it is just a little bit more detail-oriented to find, and speaking personally, it is not something that was ever part of my process to look for, or indeed, not something I knew about till I saw it in my own search results a few minutes ago. Thanks for your time. Isomorphyc (talk) 04:12, 6 July 2016 (UTC)
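To illustrate the point about the macron-accented Romanisation being findable in plain ASCII: one way to reduce such a Romanisation to ASCII is to decompose it and drop the combining marks. The Python below is only a sketch of that idea, not the code of Module:grc-ascii-searchability, and `ascii_fold` is a name invented for this example.

```python
import unicodedata


def ascii_fold(romanisation):
    """Hypothetical helper: strip macrons and accents from a Romanisation."""
    # NFD splits each accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", romanisation)
    # Keep only the characters that are not combining marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# "the" + a-with-macron + combining acute, i.e. the Romanisation of the word for "goddess"
print(ascii_fold("the\u0101\u0301"))  # thea
```

This only covers one scheme; beta code and the other variants mentioned above would each need their own mapping.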
- Not sure I see the point. Searching "thea greek" already works (I just tried it), so there's no reason we need to have "qea greek" work as well. Beta code is far more nonobvious than our romanisation, IMO. —Μετάknowledgediscuss/deeds 04:59, 6 July 2016 (UTC)
- I think you are quite right. I will delete this if no-one else replies. Thank you! When I started I incorrectly assumed our primary Romanisation was not searchable through the macrons. Isomorphyc (talk) 09:37, 6 July 2016 (UTC)
- Edited: @Metaknowledge: I originally did not realise this, but the reason you can search "thea greek" I believe is because of the template. The reason might be that since the Romanisation appears twice in the page, it outranks other pages which cite it. Try something other than a noun, for example, αἰσθάνομαι via "aisthanomai greek." One has the problem that a great many etymologies in derived languages appear before the main entry. The reason I included beta code and other variants was not so much because I like any of them, but because I wanted to include every reasonable asciisation scheme to avoid controversy about which one to use. At the same time, this is a lot of ad-hoc mess for a little benefit, and this is a parochial solution to a general problem, in which popular pages with outgoing link Romanisations outrank their link targets in searches. Isomorphyc (talk) 10:50, 6 July 2016 (UTC)
- Further edit: The most relevant comparison is with verbs, which remain in their original state: https://en.wiktionary.org/w/index.php?title=Special%3AWhatLinksHere%2FTemplate%3Agrc-verb&hidelinks=1&hideredirs=1 Isomorphyc (talk) 12:55, 6 July 2016 (UTC)
- Oh, I see. Well, it could be useful. At the same time, I'm disinclined to do things that are ugly unless they will get a lot of use, which I doubt this will. I suppose you'd best wait for another classically inclined editor's opinion. —Μετάknowledgediscuss/deeds 18:15, 6 July 2016 (UTC)
- What would you or others think of repeating ascii versions of all headword Romanisations within non-visible html to make them outrank pages which link to them for ascii-Romanised searches? Clearly, this method works, and what is ugly (and unfair to Modern Greek) is that I am doing it for just one language. The real problem is that we can't (I think) hint to the search function any other way that a Romanisation has almost the force of a headword. The really ugly thing which would solve the problem, but which I would never advocate, is to give the Romanisations their own headwords, as is done in Chinese, which is pretty convenient, but still less so than Greek because it is harder to type in Greek. I know I am suggesting a significant change; I wouldn't want to go forward with any of these options unless at least a few people are ecstatic, but this is not what I am seeing here. Isomorphyc (talk) 19:42, 6 July 2016 (UTC)
- I think we should have entries for romanizations of all languages. We seem to go by the rule that lesser-known scripts get their own romanization entry while better-known and more widely used scripts don't. But this argumentation is completely pointless in the face of users who don't understand the script regardless. It doesn't help them in the slightest that the script is well known if they don't know it. —CodeCat 19:46, 6 July 2016 (UTC)
- Ultimately, this is probably what we should do. I've categorised that idea under things to deal with in the far future, but it could actually be a great way to increase readership in the short term. It would require a very active romanisation bot, of course, and we'd presumably have to phase in languages one by one (make sure that all the romanisations are good, then create the soft redirects). —Μετάknowledgediscuss/deeds 19:51, 6 July 2016 (UTC)
- I took a quick look at a random pageview statistics file amongst the five or ten terabytes available here: [1]. Of the top ten Chinese words visited in that particular hour, all ten were hanzi, not pinyin or another Romanisation. Obviously I could make a more systematic study of this, but the most cursory evidence suggests adding Romanisations will not increase readership much. (Incidentally, Latin words are one of the biggest attractions here). I think this question is not really about readership in general, but about how specific types of users use specific languages of largely academic interest. Isomorphyc (talk) 20:45, 6 July 2016 (UTC)
- Of course, the romanizations may not be used nearly as much. But can we get statistics of the romanizations alone? How much are they used? —CodeCat 20:51, 6 July 2016 (UTC)
- This will have sampling size issues, time of day issues, language choice issues, etc. but for the file I am looking at, about 12% of Mandarin visits are pinyin and 88% are hanzi. I have most of the dataset locally, so this could be done more systematically or for other languages, of course, with more planning. The numbers are 1492 hanzi entries visited 1922 times, compared to 174 entries visited 261 times in that hour interval for pinyin. Isomorphyc (talk) 21:18, 6 July 2016 (UTC)
- In general, Romanisations are 1.2% of pages viewed in this period: 1415 out of 114222. So Mandarin has about a 10x statistical lift for users preferring Romanisations compared to the average language. Isomorphyc (talk) 21:22, 6 July 2016 (UTC)
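The percentages quoted in the two comments above can be rechecked from the raw counts given there. This is just the arithmetic written out in Python; the variable names are mine, and the counts are the single-hour figures from the comments, so the same sampling caveats apply.

```python
# Counts quoted above for one hourly pageview file.
hanzi_visits = 1922       # visits to Mandarin hanzi entries
pinyin_visits = 261       # visits to Mandarin pinyin entries
romanised_views = 1415    # Romanisation page views, all languages
total_views = 114222      # all page views in the period

pinyin_share = pinyin_visits / (hanzi_visits + pinyin_visits)
overall_share = romanised_views / total_views
lift = pinyin_share / overall_share

print(f"pinyin share of Mandarin visits: {pinyin_share:.1%}")
print(f"Romanisation share overall:      {overall_share:.1%}")
print(f"lift for Mandarin:               {lift:.1f}x")
```

The lift comes out a little under ten, which matches the "about 10x" figure.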
@Isomorphyc: I am more in favour of including the invisible HTML Romanisations than I am in favour of creating entries for Greek Romanisations. Neither idea fills me with joy, however. For you personally, I recommend learning to type polytonic Greek with your keyboard. — I.S.M.E.T.A. 13:26, 7 July 2016 (UTC)
- Hi @I'm so meta even this acronym:: I use this method for composition, and it is not too bad, though I find it is not so good for dictionary type of tasks. Pinging also @CodeCat, @Metaknowledge, @JohnC5: following the earlier discussion, I have some statistics for Romanisation. They are quite interesting. Mandarin is the only language which is fully Romanised, and my old numbers were wildly incorrect (because, as I had warned, of time-of-day bias. It was daytime in China, and pinyin was unpopular). Pinyin has received 2m of Mandarin's 4.2m year-to-date page views, mostly during night-time in China. Please see the following chart. The data are summed over all 1H 2016 page views. I have put the table in a user page appendix: User:Isomorphyc/Romanisation_Page_View_Statistics
- Several observations: Adding Japanese Romanisations would perhaps yield a 50% increase in Japanese page views. The current convention in Cantonese is to offer Jyutping entries only for characters. Switching to a convention which Romanises all words (as in Mandarin) could perhaps nearly double Cantonese page views. I am not advocating either of these (I don't know much Cantonese or any Japanese), but only mention this; wide readership should be perhaps even only a secondary objective here, in some ways. Archaic-script Romanisations are very popular, but a caveat in Gothic is that a great many entries were created in 2016. The script and the Romanisations are surprisingly popular-- presumably the script users are clicking on intra-Wiktionary etymologies while the non-script users are searching non-script outside references. There is no data on the popularity of Romanisations for languages with widely-used scripts (Hebrew, Devanagari, Greek, Cyrillic, etc.) The only user-base I can imagine are people (such as myself) who are too lazy to change their keyboard configuration. I believe learners of modern languages only ever use Romanisations for character-based, never script-based, languages. We already gloss Romanisations widely in mentions and lists with templates, so I don't see a case that users are confused. If someone really wants to do this, my suggestion would be to pick one popular-script language, Romanise all of it, and see if the usage numbers justify keeping it and proceeding after a reasonable amount of time.
- More observations (not in the table): Wiktionary has about a billion page views a year, or thirty per second. Fully 8.9% of this traffic is people looking up non-lemma forms in Latin. (A further 4.5% is lemma forms). Latin comprises, I think as is widely known, 13.7% of entries and 13.8% of page views. Evidently nearly 10% of Wiktionary by weight and popularity is essentially an easy to use Latin stemmer. I suspect the best thing one could do for Greek does not relate to Romanisations, but instead is extensively to create non-lemma entries while ensuring a core vocabulary of the top 5000 Attic and Koine are available. I wouldn't mind working on this in the medium term, but for now, to me, the Romanisation searchability seems like very low-hanging fruit.
- While I am here, this table may also be of mild interest. User:Isomorphyc/Page_Views_and_Entry_Counts_by_Language_1H_2016
- Apologies for the bad formatting, long note and long delay. The raw 2016 data are 3 TB uncompressed, but I can offer a ~200 MB file if anyone would like annualised 2016 data. Thanks for reading. Isomorphyc (talk) 00:07, 12 July 2016 (UTC)
- @Isomorphyc Is there any referrer information (what sites people came here from)? Or differentiation between internal and external referrers? DTLHS (talk) 02:21, 12 July 2016 (UTC)
- @DTLHS: Unfortunately, no. The data are per-page hourly view counts for all Wikimedia projects. I believe this is for privacy concerns. I had wished for the same thing. Isomorphyc (talk) 02:25, 12 July 2016 (UTC)
- @Isomorphyc: This information is startlingly interesting and represents the strongest argument I've ever seen for large-scale Romanizations. I've always been a proponent of using native scripts (as evinced by my painstaking construction of AP:Old Italic script), but if extra entries make that much difference, I'd certainly support their broader use. —JohnC5 04:53, 12 July 2016 (UTC)
- @JohnC5: For all practical purposes, we do not have data on Romanisation of scripts, only of character systems. I believe the benefit is likely far less, because character systems usually have official Romanisations which are often taught to tourists, beginning students, and casual learners. Russian, Arabic, and Modern Greek are the three most developed non-Roman script languages on Wiktionary. I would see very large issues in creating Romanised headwords for any of them, but it may still be worth a trial. Academic languages (Greek, Sanskrit) I think are a separate category. Isomorphyc (talk) 11:56, 12 July 2016 (UTC)
- Modern languages such as Mandarin and Japanese have official romanization standards, so creating romanization entries for the best-known ones isn't that hard. The same is true for languages such as Gothic, with relatively uniform treatment in the literature. Ancient Greek, on the other hand, has a plethora of ad hoc systems that vary in subtle ways: is it chi, khi, or xi? How about xi vs ksi? Or diphthongs: ou or u, ei or i? How is length handled? I would think that, as the predictability of what people would search for diminishes, the benefits from romanization entries would, as well. Even with Mandarin and Japanese, we have Hanyu pinyin and romaji, but not Wade-Giles or Hepburn. Then there are things like beta code: even if the system would let us have entries like fqoggh/ or w)=|, I think they would cause more confusion than they're worth (having entries searchable by such things is another matter). Chuck Entz (talk) 13:31, 12 July 2016 (UTC)
- Another idea to throw in the ring: is there any way to create an input method or keyboard of some sort, so people could type beta code into the search box and get Greek to show up? And beta code has standards for other scripts, as well, so it would be useful for many other languages. Chuck Entz (talk) 13:45, 12 July 2016 (UTC)
- @Chuck Entz: That idea sounds wonderful! I'm not sure how to implement it, but I'd love to help. —JohnC5 14:21, 12 July 2016 (UTC)
- @Chuck Entz: I do not know how to interface with the search bar, or much about JavaScript; but I would be glad to learn if there is a way for me to help. I like this idea very much. Isomorphyc (talk) 01:20, 13 July 2016 (UTC)
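As a starting point for the beta code idea above, the letter mapping itself is simple. The sketch below is in Python rather than the JavaScript a search-box gadget would need, and it is deliberately limited: it drops the beta code diacritic symbols instead of composing accents and breathings, and it does not handle the `*` capital marker. The name `beta_to_plain_greek` is invented for this example.

```python
# Standard beta code letters mapped to lowercase Greek letters.
BETA_TO_GREEK = {
    "a": "α", "b": "β", "g": "γ", "d": "δ", "e": "ε", "z": "ζ",
    "h": "η", "q": "θ", "i": "ι", "k": "κ", "l": "λ", "m": "μ",
    "n": "ν", "c": "ξ", "o": "ο", "p": "π", "r": "ρ", "s": "σ",
    "t": "τ", "u": "υ", "f": "φ", "x": "χ", "y": "ψ", "w": "ω",
}

# Breathings, accents, iota subscript and diaeresis; ignored in this sketch.
DIACRITICS = set(")(/\\=|+")


def beta_to_plain_greek(text):
    """Hypothetical helper: convert beta code to unaccented Greek letters."""
    letters = [ch for ch in text.lower() if ch not in DIACRITICS]
    out = []
    for i, ch in enumerate(letters):
        greek = BETA_TO_GREEK.get(ch, ch)
        # A sigma at the end of a word takes its final form.
        at_word_end = i + 1 == len(letters) or not letters[i + 1].isalpha()
        if ch == "s" and at_word_end:
            greek = "ς"
        out.append(greek)
    return "".join(out)


print(beta_to_plain_greek("qea"))
print(beta_to_plain_greek("fqoggh/"))
```

Matching the unaccented result against accented entry titles would still need a diacritic-insensitive lookup on the search side, which this sketch does not attempt.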
- @Isomorphyc: Where do you find these page view statistics? --WikiTiki89 13:53, 12 July 2016 (UTC)
- @Wikitiki89: The raw page-view statistics are here: [2]. There are a few other forms in the same place, with various characteristics. Isomorphyc (talk) 14:06, 12 July 2016 (UTC)
- For reference, I have for the time being reverted the changes from this experiment. Although it made searching for Greek words significantly easier, I am currently looking at other ways to interface with Wiktionary's CirrusSearch extension to produce measurable results extensible to other Romanisations. Thank you all for your discussion of these matters. Isomorphyc (talk) 15:23, 6 August 2016 (UTC)
Using template l to link to English entries
FYI, I created Wiktionary:Votes/2016-07/Using template l to link to English entries.
Let us postpone the vote as much as discussion requires. --Dan Polansky (talk) 08:21, 6 July 2016 (UTC)
- Imagine me taking off my surgeon's mask as I ask: But...why? Korn [kʰũːɘ̃n] (talk) 10:23, 6 July 2016 (UTC)
- Because there may be better ways to do this. E.g. writing the language code "en" is cumbersome and hinders readability. 99.99% of the words linked on the definition lines are to English words, so, we may as well use "en" as a default. But given the stupid order of parameters (language code first? Really?), that is not possible. — Dakdada 10:47, 6 July 2016 (UTC)
- I suggested a separate proposal to address the problem you just mentioned. See: Wiktionary talk:Votes/2016-07/Using template l to link to English entries#Separate proposal. --Daniel Carrero (talk) 10:56, 6 July 2016 (UTC)
- I meant "why is this proposed?" not "why is the vote postponed?" And I prefer templates to take all non-optional parameters first. Korn [kʰũːɘ̃n] (talk) 12:39, 6 July 2016 (UTC)
- Ah, sorry I misunderstood. — Dakdada 15:58, 6 July 2016 (UTC)
- @Dan Polansky, you said here: "Rationale" - "To be entered by supporters." Are you planning to vote "Oppose"? --Daniel Carrero (talk) 16:22, 6 July 2016 (UTC)
Bot replace Template:etyl with Template:cog?
Is it ok if I run a bot to replace all instances of "{{etyl|xx|-}} {{m|xx|...}}" with "{{cog|xx|...}}", currently categorised in Category:etyl cleanup no target? —CodeCat 20:05, 6 July 2016 (UTC)
- I would certainly support this. —JohnC5 20:11, 6 July 2016 (UTC)
- How do you know all uses of {{etyl|xx|-}} are intended to be cognates? DTLHS (talk) 20:15, 6 July 2016 (UTC)
- What else could they be? —CodeCat 20:17, 6 July 2016 (UTC)
- I support and appreciate the change. Please go ahead. --Daniel Carrero (talk) 20:18, 6 July 2016 (UTC)
- (e/c) And why would it matter? The template doesn't have to only be used for cognates, even if cognates are its main purpose. --WikiTiki89 20:19, 6 July 2016 (UTC)
- I could limit it to just etymology sections for now, if that's better. —CodeCat 20:20, 6 July 2016 (UTC)
- Yes, probably limit it to etymology sections (kampung for example) DTLHS (talk) 20:25, 6 July 2016 (UTC)
- (e/c) Theoretically, {{etyl}} should only have been used in etymology sections anyway. I would like to know what kinds of other places it's used in. --WikiTiki89 20:26, 6 July 2016 (UTC)
- I've used {{etyl}} in ====Usage notes==== sections, such as in "compare with English [SOME OTHER TERM]..." constructions. ‑‑ Eiríkr Útlendi │Tala við mig 20:32, 6 July 2016 (UTC)
- The intention of the template is not clear from its documentation. The existence of the second argument "-", which explicitly avoids categorization, lends this template to broader use than just in ===Etymology=== sections. Even the name etyl, described as coming from etymological language, suggests that this could be used in any case where an editor seeks to specify a given term's language.
- Before embarking on any bot-driven overhaul of how {{etyl}} is used, I would strongly recommend first finding out where and how it is actually used, based on the hard data available in a dump, rather than just relying on our own individual assumptions. ‑‑ Eiríkr Útlendi │Tala við mig 21:43, 6 July 2016 (UTC)
- At worst, the bot run will convert some incorrect uses of {{etyl}} to equally incorrect uses of {{cog}}. We can do such an analysis after the run, too. —CodeCat 21:46, 6 July 2016 (UTC)
- @CodeCat, so long as the xx in both {{etyl}} and the following {{m}} match, this is fine by me. There have been cases where {{etyl|ojp|...}} is followed by {{m|ja|...}} due to the unresolved status of Old Japanese entries, both here and in Japanese lexicography in general (i.e. there isn't as clear a distinction between the two; many bigger dictionaries of modern JA include obsolete terms that could technically qualify as OJP, and current terms that have specific OJP or Classical senses). ‑‑ Eiríkr Útlendi │Tala við mig 20:36, 6 July 2016 (UTC)
- I would say it's better to put something like "from Old Japanese (compare Japanese XYZ)". --WikiTiki89 20:39, 6 July 2016 (UTC)
- So far as I know, we do not (yet) have any OJP entries. We do have plenty of JA entries, and in some cases, the JA and OJP differ mainly in conjugation patterns and idiomatic usage. Monolingual JA dictionaries will often put OJP and JA content into a single entry, indicating obliquely in the header that the older forms have a different conjugation. "Compare" doesn't quite seem correct in these cases. ‑‑ Eiríkr Útlendi │Tala við mig 21:54, 6 July 2016 (UTC)
I've made a few test edits with the script, it seems to work ok. All those Malayo-Polynesian etc entries that are listed right at the start of the category use {{etyl}} in descendants sections, we'll probably want to fix them. There's a few I already encountered that use it in other sections but could theoretically replace it with {{cog}}, such as in -ains. —CodeCat 20:43, 6 July 2016 (UTC)
- But like I said, {{cog}} is not any more wrong than {{etyl}} there. So it's safe to replace them. --WikiTiki89 20:50, 6 July 2016 (UTC)
- Ok, I'll remove the section restriction. —CodeCat 20:52, 6 July 2016 (UTC)
- Query: does {{cog}} add any categories? If so, which ones? If not, should it?
- I'm also a little puzzled by the apparently cavalier attitude for using {{cog}} for relationships that are not cognates. This confuses things unnecessarily. ‑‑ Eiríkr Útlendi │Tala við mig 21:59, 6 July 2016 (UTC)
- It does not add categories. Also, we've used etyl cavalierly to mark relationships that are not etymological; this is a step in the right direction. —Μετάknowledgediscuss/deeds 22:05, 6 July 2016 (UTC)
- And it is unclear to me how changing to {{cog}} "is a step in the right direction". Although I disagree that use of {{etyl}} has been cavalier, even granting that, swapping one apparent confusion for another does not strike me as progress. I note also that the documentation for {{cog}} explicitly states that this template is intended to mark cognate relationships, and that it is intended for use solely in ===Etymology=== sections. What's described in this thread here goes well beyond that stated scope. ‑‑ Eiríkr Útlendi │Tala við mig 22:24, 6 July 2016 (UTC)
- Please look at how the templates work. The advantage of {{cog}} is that it links to both the language and the term, thus serving the purposes of two templates (less markup means easier for editors to read and easier for bots to manipulate, while eliminating mismatch error). —Μετάknowledgediscuss/deeds 22:44, 6 July 2016 (UTC)
- If the naming of the template is an issue, we can create an exactly identical template with a different name and documentation, or we can rename this one. Either way, this doesn't affect the bot run. —CodeCat 23:10, 6 July 2016 (UTC)
- The functionality (combining two templates into one with essentially the same output → less markup) I agree with. The naming of {{cog}} and its documentation make me think that cog as a name here is inappropriate for this current use, which has nothing to do with cognates. If a different template with a different name could be used for this instead of {{cog}}, my current concerns would be resolved. ‑‑ Eiríkr Útlendi │Tala við mig 01:17, 7 July 2016 (UTC)
- Support, and I greatly appreciate this. —Μετάknowledgediscuss/deeds 22:05, 6 July 2016 (UTC)
- So, for example in almagra, you want to change the current etymology from "From Spanish almagra, almagre, from Arabic المُغْرَة (al-muḡra, “red clay or earth”)" into "From Spanish en almagra, almagre, from Arabic en (en) المُغْرَة (al-muḡra, “red clay or earth”)", and you want to remove the categories Category:English terms derived from Spanish and Category:English terms derived from Arabic altogether?
- Why don’t you want to keep those categories, and what are you going to do with the en and en (en)? (note: en is a language code, so it will vary according to the language.) —Stephen (Talk) 07:33, 7 July 2016 (UTC)
- I think you don't understand this. Please look carefully at what CodeCat wrote at the very beginning, so you can see that it doesn't apply to uses of {{etyl}} to show etymological derivation. —Μετάknowledgediscuss/deeds 07:42, 7 July 2016 (UTC)
- Oh, just in Category:etyl cleanup no target, not necessarily "replace all instances of". —Stephen (Talk) 07:58, 7 July 2016 (UTC)
I've run the bot now, and it did a good number of edits, but there are still 5000 cases remaining. I spotted some regularly occurring patterns that a bot could also fix up:
1. {{etyl|xx|-}} ''[[foobar]]'', optionally with a language as the anchor.
2. {{etyl|xx|-}} {{l|xx|...}}, presumably added by editors who don't know the difference between the templates.
3. {{etyl|xx|-}}: {{l|xx|...}} in a descendants section. Seems to occur mostly in Malayo-Polynesian languages.
—CodeCat 16:03, 8 July 2016 (UTC)
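As a rough illustration of the replacement discussed in this thread, the following Python sketch merges an {{etyl|xx|-}} that is immediately followed by {{m|xx|...}} (or, as an assumption, {{l|xx|...}}) into a single {{cog|xx|...}}, and only when both language codes match. The regular expression and the function name `merge_etyl_pair` are hypothetical and are not taken from CodeCat's actual script; parameters containing nested templates are deliberately left alone.

```python
import re

# Hypothetical pattern: {{etyl|xx|-}} followed by {{m|xx|...}} or {{l|xx|...}}.
# The backreference \1 requires the same language code in both templates.
# [^{}]* refuses to match parameters that contain nested templates.
ETYL_PAIR = re.compile(
    r"\{\{etyl\|([^|{}]+)\|-\}\}\s*\{\{[ml]\|\1\|([^{}]*)\}\}"
)


def merge_etyl_pair(wikitext: str) -> str:
    """Rewrite matching etyl + m/l pairs as a single {{cog|xx|...}}."""
    return ETYL_PAIR.sub(r"{{cog|\1|\2}}", wikitext)


if __name__ == "__main__":
    print(merge_etyl_pair("Compare {{etyl|de|-}} {{m|de|Wasser}}."))
```

Pairs such as {{etyl|ojp|-}} {{m|ja|...}} are skipped because the codes differ, and uses of {{etyl}} with a real target language (no "-") never match, so derivation categories are untouched.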
- I don't understand the difference between the templates either. What do {{l}} and {{m}} actually do differently? Korn [kʰũːɘ̃n] (talk) 17:14, 8 July 2016 (UTC)
- See Help:Language sections#Linking to language sections. --Daniel Carrero (talk) 17:16, 8 July 2016 (UTC)
- {{m}} italicizes Latin-script terms and transliterations of non-Latin script terms. {{l}} does not. --WikiTiki89 18:18, 8 July 2016 (UTC)
- Strictly speaking, it doesn't italicise, but it tags it with the "mention" CSS class, and the CSS then gives it italic formatting. The distinction matters when users start making custom CSS. —CodeCat 18:37, 8 July 2016 (UTC)
- @Korn See also Wiktionary:Style_guide#Styling_templates, a section I wrote which shows all (or most) of the styling templates and where to use them. In short, {{l}} is used for lists and {{m}} in running text, and as mentioned, the latter italicizes (usually) but the former doesn't. Benwing2 (talk) 20:56, 8 July 2016 (UTC)
@CodeCat I have thought also about running a bot to convert instances of LANG {{m|xx|...}} to {{cog}}, where LANG and xx agree, e.g. Serbo-Croatian {{m|sh|...}}. This has to be done carefully; one idea is to look for lists of terms (e.g. LANG1 {{m|xx|...}}, LANG2 {{m|yy|...}} ... and LANGN {{m|zz|...}}, probably with additional smarts to allow for parenthesized terms in the list) and only convert them when the immediately preceding text says "[Cc]ompare" or "[Cc]ognate with" or certain other expressions. Benwing2 (talk) 21:03, 8 July 2016 (UTC)
- The smarter it is, the more potential there is for errors or oversights - don't write code smarter than yourself, and don't overestimate how smart you are. So I'd prefer simpler heuristics if possible. Limiting it to Etymology sections is a good start. —CodeCat 21:25, 8 July 2016 (UTC)
- Sorry, I definitely meant it to be limited to Etymology sections; that's clear. As for the rest of it, I don't think this is terribly over-clever code, and I've written bot code like this before without too much problem. You just have to be careful and review a bunch of the subs (before actually saving anything) to make sure it's behaving like you want. Benwing2 (talk) 23:32, 8 July 2016 (UTC)
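The more conservative end of Benwing2's idea might look something like the sketch below: a named mention is converted only if it follows a cue phrase ("compare" or "cognate with") and the written language name agrees with the template's code. The `LANG_NAMES` table, the regular expressions, and the function name are all hypothetical stand-ins; a real bot would draw on Wiktionary's own language data and would need the extra handling for parenthesized terms mentioned above.

```python
import re

# Hypothetical lookup table; not Wiktionary's actual language data.
LANG_NAMES = {"sh": "Serbo-Croatian", "de": "German", "nl": "Dutch"}

# Cue phrases that must precede the list of terms.
CUE = re.compile(r"\b(?:[Cc]ompare|[Cc]ognate with)\b")

# A capitalised language name, a space, then {{m|code|...}}.
NAMED_M = re.compile(
    r"([A-Z][\w-]*(?: [A-Z][\w-]*)*) \{\{m\|([^|{}]+)\|([^{}]*)\}\}"
)


def convert_named_mentions(sentence: str) -> str:
    """Convert 'LANG {{m|xx|...}}' to '{{cog|xx|...}}' after a cue phrase."""
    cue = CUE.search(sentence)
    if cue is None:
        return sentence  # no cue phrase: leave the text untouched
    head, tail = sentence[:cue.end()], sentence[cue.end():]

    def repl(match: re.Match) -> str:
        name, code, rest = match.groups()
        if LANG_NAMES.get(code) == name:
            return "{{cog|%s|%s}}" % (code, rest)
        return match.group(0)  # name and code disagree: keep as written

    return head + NAMED_M.sub(repl, tail)
```

Keeping the rule this narrow trades coverage for reviewability, in line with the preference for simpler heuristics voiced in the reply above; anything the sketch skips is simply left for a human.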
New abuse filter for canned edit summaries?
Wikipedia has a list of edit summaries commonly used by vandals: Added content; Added; Fixed typo; Typo; Fixed grammar; Grammar; I made it better. I don't know whether there are any specific vandalism tools or help-sheets out there that use these phrases, but I've seen them (identically, with the initial capital) around Wiktionary too. Perhaps an abuse filter to tag them is in order? Equinox ◑ 21:00, 7 July 2016 (UTC)
- The reason vandals use these edit summaries is because normal editors use them too. I'm not sure how effective it would be to tag them. --WikiTiki89 21:37, 7 July 2016 (UTC)
- I'm not sure that normal editors do use them. I've never seen "Added content" on a legit edit, but often on bad ones. Equinox ◑ 21:45, 7 July 2016 (UTC)
- Maybe not "added content", but "fixed typo" and "fixed grammar" are certainly used by normal editors. --WikiTiki89 21:48, 7 July 2016 (UTC)
- While "added content" is usually a red flag, I do see a small number of good edits all the time. The same with "fixed typo", though the size-change criterion is a helpful- but not infallible- added indicator. Just plain "Fixed" is somewhat reversed in terms of vandalism-to-good-edit ratios. On the other hand, I've never seen a good edit accompanied by "I made it better"- or by anything using the pronoun "I". By the way, I think the reason "added content" is used so much by vandals is that it's very vague and seems innocuous. While we're at it, I think any edit comment that includes lol, lulz, or variants should definitely be flagged, too. Chuck Entz (talk) 03:43, 8 July 2016 (UTC)
- I have seen "fixed typo" used by so many vandals on WP. I don't recall whether I've seen it here or not. We could tag edits by new users that used them and check the tag log after a while and see whether it was catching enough vandalism (and a high enough ratio of vandalism to helpful edits) to be worth continuing to tag. - -sche (discuss) 22:23, 7 July 2016 (UTC)
- Typos are small, so any edit glossed "fix typo" that changes more than, say, six bytes is very suspect. Equinox ◑ 22:30, 7 July 2016 (UTC)
- I use 'typo', 'fixed typo', 'grammar mistake', 'general improvements' and 'expanded'/'extended' with great regularity. And while Germanic, I'm not a vandal. I just make a lot of typos. If you don't want to become scholars of my works, maybe the tagging should exclude autopatrollers. Korn [kʰũːɘ̃n] (talk) 22:41, 7 July 2016 (UTC)
- Tags: If a certain tag is being flagged too often, it can easily be turned off. It's worth trying these to see if they are useful. —Justin (koavf)❤T☮C☺M☯ 03:30, 8 July 2016 (UTC)
- Good idea. The filter can combine the edit summary with other observable characteristics of the edit, like, quoting Equinox, 'so any edit glossed "fix typo" that changes more than, say, six bytes is very suspect'.
- A rather unrelated idea: I would prevent anons from making edits that remove more than, say, 10 bytes. --Dan Polansky (talk) 09:36, 10 July 2016 (UTC)
- @Dan Polansky: I think that a policy about restricting the bytes that an IP can edit is hostile to users who don't wish to have an account. We should welcome anyone to edit, even if he doesn't want a pseudonym associated with his edits. —Justin (koavf)❤T☮C☺M☯ 14:04, 10 July 2016 (UTC)
- It's about bytes removed, not bytes added. --Dan Polansky (talk) 14:10, 10 July 2016 (UTC)
- That sounds much too restrictive to me: there are plenty of good reasons to remove material, e.g. cutting down waffle/verbosity, or fixing the vandalism of other anons. Equinox ◑ 14:20, 10 July 2016 (UTC)
- @Dan Polansky, Equinox: For that matter, someone may make a very helpful edit by removing a lot of text and moving it to the Citations namespace or replacing it with a template. I agree that such an edit can still trigger a tag, but the ability to make it should not be restricted altogether. Likewise, someone can remove 100kb of data and then add back 106kb of junk. The absolute difference in data is not a good metric of quality, hence just tagging it rather than stopping it altogether. It still requires a lot of human discretion. —Justin (koavf)❤T☮C☺M☯ 18:11, 10 July 2016 (UTC)
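The heuristics floated in this thread can be combined in a few lines. The sketch below is plain Python rather than AbuseFilter rule syntax, and the function name, the case-insensitive matching, and the exemption for autopatrollers are assumptions pieced together from the comments above. It only decides whether an edit would be tagged for review, never whether it is blocked.

```python
# Canned summaries from the list quoted at the top of this thread,
# lowercased here for comparison (an assumption; the thread notes they
# usually appear with an initial capital).
CANNED = {
    "added content", "added", "fixed typo", "typo",
    "fixed grammar", "grammar", "i made it better",
}
TYPO_SUMMARIES = {"fixed typo", "typo"}
TYPO_MAX_BYTES = 6  # the "say, six bytes" threshold suggested above


def should_tag(summary: str, size_delta: int, autopatrolled: bool) -> bool:
    """Return True if the edit should be tagged for human review."""
    s = summary.strip().lower()
    if autopatrolled or s not in CANNED:
        return False
    if s in TYPO_SUMMARIES:
        # A "typo" fix is only suspicious when it changes many bytes.
        return abs(size_delta) > TYPO_MAX_BYTES
    return True
```

Because the result is only a tag, a noisy rule can be tuned or switched off after checking the tag log, as suggested earlier in the discussion.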
Old Gutnish
I've noticed a few redlinks to Old Gutnish in Proto-Germanic entries. Does anyone know if it is distinctive enough to have its own code, or should it be merged into Old Norse/modern Gutnish? KarikaSlayer (talk) 00:48, 8 July 2016 (UTC)
- It's considered a dialect of Old Norse, but quite distinct. Old Norse is considered to be split into three main dialect areas, East, West and Gutnish. Some of the idiosyncrasies of Old Gutnish survive into modern Gutnish, including in particular the triphthong jau. —CodeCat 01:02, 8 July 2016 (UTC)
Compact Language Links enabled in this wiki today
Compact Language Links has been available as a beta-feature on all Wikimedia wikis since 2014. With compact language links enabled, users are shown a much shorter list of languages on the interlanguage link section of an article (see image). Based on several factors, this shorter list of languages is expected to be more relevant for them and valuable for finding similar content in a language known to them. More information about compact language links can be found in the documentation.
From today onwards, compact language links has been enabled as the default listing of interlanguage links on this wiki. However, using the button at the bottom, you will be able to see a longer list of all the languages the article has been written in. The setting for this compact list can be changed by using the checkbox under User Preferences -> Appearance -> Languages.
The compact language links feature has been tested extensively by the Wikimedia Language team, which developed it. However, in case there are any problems or other feedback please let us know on the project talk page. It is to be noted that on some wikis the presence of an existing older gadget that was used for a similar purpose may cause an interference for compact language list. We would like to bring this to the attention of the admins of this wiki. Full details are on this phabricator ticket. Thank you. On behalf of the Wikimedia Language team:--Runa Bhattacharjee (WMF) (talk) 03:12, 8 July 2016 (UTC)
Company names in Russian?
@Atitarev, Cinemantique, Wikitiki89, Wanjuscha, KoreanQuoter I'm pretty sure Wiktionary's policies don't allow things like company names e.g. Газпром (Gazprom), Лукойл (Lukoil), Роснефть (Rosneft) to be inserted as lemmas. For languages like Russian this may be slightly problematic as it's often useful to be able to give the pronunciation, stress, gender, declension etc., some or all of which may be unpredictable. Some but not all of this information is found in Wikipedia. Any solution? Benwing2 (talk) 04:44, 8 July 2016 (UTC)
- I think this would fall under WT:BRAND, which in practice allows major company/brand names to be included (basically if they might be mentioned in passing, without talking specifically about the commerce side). Equinox ◑ 04:47, 8 July 2016 (UTC)
- Thanks for pinging but it's not up to me. I find some company names interesting as well but they get deleted, e.g. 宏碁 (Hóngqí, “Acer”). --Anatoli T. (обсудить/вклад) 06:11, 8 July 2016 (UTC)
- I don't think there is anything special about Russian with regard to company and brand names. Even in English, it is not always clear from the spelling of these names how they are supposed to be pronounced. The reasons for excluding most company and brand names from English and other languages apply equally to Russian as well. But like Equinox said, some of these names can still pass. --WikiTiki89 14:33, 8 July 2016 (UTC)
Future of Wiktionary and its interface
I want to ask Wiktionary users to think about this more seriously. I don't have an alternative in mind right now (I only know we probably need something new from scratch), but I think we should bring discussions like this up every now and then, even if we don't make any apparent progress. It's a shame that many of us users don't bother to think about what the project should look like in the long run, while it is relevant to our very own efforts.
Wiktionary is not user-friendly, and looks relatively time-consuming to edit for most people, especially for beginners (note the fact that the bulk of our users are either quite active or quite inactive). I feel what we currently have is not what our editors deserve. Flexibility is good when you are at the beginning of things, but I think our project has reached a sufficient degree of maturity to finally decide on a less "flexible" and more systematic way to take and show information. I noticed this when I saw we are trying to make even Etymology, Descendants and Pronunciation sections as such.
P.S. Consider Wikidata as well. --Z 17:27, 9 July 2016 (UTC)
- Our hands are tied. We don't have control over anything of importance in terms of how this website works, and we have very few people who are willing and able to put enough time into new user-friendly JS or beginner-friendly templates. Wikidata is a must for interwikis, but otherwise I have not seen much prospect for its utility. From my perspective, all I can do is keep adding content. —Μετάknowledgediscuss/deeds 18:25, 9 July 2016 (UTC)
- From what I can gather, Wikidata would rather move ahead with their own pseudo-dictionary project rather than touch anything on this site. DTLHS (talk) 18:28, 9 July 2016 (UTC)
- @DTLHS: Can you tell me what you mean by this? Can you point me to some relevant pages? —Justin (koavf)❤T☮C☺M☯ 20:47, 9 July 2016 (UTC)
- See the wikidata page on "water". There are many translations, with glosses defined in the target language. They are attempting to do many of the same things we do. Notice that there are links to every other Wikimedia project but not Wiktionary. DTLHS (talk) 21:01, 9 July 2016 (UTC)
- @DTLHS: Well, see my comment at d:Wikidata:Project chat. The problem is that interwiki links mean two different things only for this project but not for others. For every other project, an interwiki link would be the same idea (wikt:en:foot to wikt:es:pie) but for this one, it would also be good to link to the same idea (a translation) but also the same term (so from wikt:en:foot to wikt:es:foot). —Justin (koavf)❤T☮C☺M☯ 03:31, 10 July 2016 (UTC)
- @Justin (koavf), what you propose represents a huuuuge problem -- the correct translation of a single term in isolation, into a single term in some other language, is often flat-out impossible. Which sense of water would you link to in the other languages? What about metal? Even compound terms can introduce ambiguity -- for heavy metal, would you link to the corresponding entry for the chemical sense, or the musical sense? This impossibility is precisely why Wiktionary interwiki links only target the corresponding entry for that same spelling in that same script. ‑‑ Eiríkr Útlendi │Tala við mig 00:38, 13 July 2016 (UTC)
- @Eirikr: Agreed--it's very difficult to do this. This is why OmegaWiki uses a different database entry per meaning, whereas we have an entirely different page per term/character/etc. —Justin (koavf)❤T☮C☺M☯ 01:24, 13 July 2016 (UTC)
- Even meanings are problematic: for every clear, discrete concept like a chemical substance or a species, there are a hundred others that are really clouds of interacting sub-concepts, with the relative importance of each sub-concept influenced by context or personal style, or phases of the moon... or something. These terms don't quite mean exactly the same thing any two times they're used, and the ambiguity can be just as important as the core meanings. This is the stuff of poetry and humor, and pinning it down precisely will often kill it. Translation is an art, not a science, and even a mega-corporation like Google with all the resources it puts into Google Translate can't make computer translation accurate. It's been a few decades since I took Semantics, but I don't think this is going to be solved any time soon- we just keep progressively halving the distance to infinity. Chuck Entz (talk) 02:33, 13 July 2016 (UTC)
- @DTLHS: That's not intended to be a dictionary in any form. Needing labels corresponding to topics that need to be used by Wikipedia is incidental, and Wikidata does not currently include any effort to build a database of lexical/linguistic content outside of that. However, there are plans to build one in the future, so that the Wiktionary projects can make use of it. See the most recent proposal for Wiktionary-Wikidata integration. These efforts have not yet begun. --Yair rand (talk) 20:29, 17 July 2016 (UTC)
- @Metaknowledge: If you have any ideas for some scripts to make things more user-friendly, I'd be glad to write the code.
- Re Wikidata, I think it will be very useful for storing things like hard-to-manage transcriptions (those which are currently scraped out of entries by templates), pronunciations (which could then be shared across projects), quotations, and maybe etymological data. Eventually, we'll probably be able to move basically everything over there without disrupting everyone's workflows, but that's probably many years away. --Yair rand (talk) 20:39, 17 July 2016 (UTC)
- @Yair rand: Right now most of my wishlist is for editors, not users, but if that offer stands, I'll try to remember to tell you when something comes up. And I'm still suspicious of our future at Wikidata — if they really wanted to do this the right way in the foreseeable future, they'd try to get us on board and invite discussion here. —Μετάknowledgediscuss/deeds 20:45, 17 July 2016 (UTC)
- Is there any context for this? Because I've little idea what exactly the topic is. Korn [kʰũːɘ̃n] (talk) 18:28, 9 July 2016 (UTC)
- There is a possible way forward. We could write a parser for the syntax of the site as it currently exists (this is not easy!). If we have a parser, we can represent the data contained here independently of our own peculiar conventions, and bootstrap our way towards something better. Without a parser we can only make incremental changes, slowly, and mostly by hand. DTLHS (talk) 18:34, 9 July 2016 (UTC)
- The wikidata "water" page addresses part of definition 1 at [[water]]. I think Wiktionary would provide a useful, large set of ambiguations (polysemies) to get them out of what looks like a narrow set of concepts that they are so far covering. Perhaps they should work with WordNet or other semantic net until they are ready to work with the complications of natural languages. DCDuring TALK 21:22, 9 July 2016 (UTC)
Hi all, we do talk about this topic in the French Wiktionary. That's why we went to Wikidata to add Translate into all conversation about Wiktionary. Well, I see two different directions. The first is technical improvement, which may come through a lexical database, and we are the ones who have to discuss this, so as to scope what can be integrated. The second aspect is the interface, and Visual Editor is the way in my opinion. OK, I know, this tool is far from perfect now, as it was at the beginning for Wikipedia. Well, we can change that. Recently I wrote up an idea I had during Wikimania on the Visual Editor talk page, and it resulted in a proposal on Phabricator. It's something small, and adapting this tool to each project will be a long-term project, but I think we can do it. Noé (talk) 12:17, 10 July 2016 (UTC)
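The parser idea raised in this thread (representing entry data independently of wikitext conventions) could start from something as small as splitting an entry into its headed sections. The sketch below is a minimal illustration under that assumption only; the function name `parse_sections` and the flat (level, title, body) representation are hypothetical, and it makes no attempt to interpret templates, which is where most of the real difficulty lies.

```python
import re

# A wikitext heading line such as ==English== or ===Etymology===.
HEADING = re.compile(r"^(={2,6})\s*(.+?)\s*\1\s*$")


def parse_sections(wikitext: str) -> list[tuple[int, str, str]]:
    """Split wikitext into (heading level, title, body) tuples."""
    sections = []
    level, title, body = 0, "", []
    for line in wikitext.splitlines():
        match = HEADING.match(line)
        if match:
            # Close the section collected so far and start a new one.
            sections.append((level, title, "\n".join(body).strip()))
            level, title, body = len(match.group(1)), match.group(2), []
        else:
            body.append(line)
    sections.append((level, title, "\n".join(body).strip()))
    # Drop the empty lead section if nothing precedes the first heading.
    return [s for s in sections if s[1] or s[2]]
```

A structure like this would let a tool ask, for instance, whether a given template occurs inside an Etymology section without relying on line-by-line text matching.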
Renaming male -> masculine, female -> feminine in {{given name}}?
Enoshd (talk • contribs) suggested this and refers to w:Sex and gender distinction. He thinks both the text and categories should change. Changing the text is easy but renaming the categories will require a bot probably. Comments? Benwing2 (talk) 22:37, 9 July 2016 (UTC)
- FYI: Wiktionary:Votes/pl-2010-01/Renaming given name appendixes, Wiktionary:Votes/2009-12/Masculine and feminine given names. --Dan Polansky (talk) 09:04, 10 July 2016 (UTC)
- Pointless; we already have a sense at male that says "Belonging to the masculine gender (social category)." Equinox ◑ 12:20, 10 July 2016 (UTC)
- We should make a distinction between grammatical and personal gender. These names' gender is personal, and possibly also grammatical but not necessarily (diminutives in Dutch are neuter, names are no exception). Since we already have "feminine nouns" categories and such for some languages, we should use "male" and "female" here to distinguish them. —CodeCat 12:58, 10 July 2016 (UTC)
- Cecil is a male given name, but it isn't very masculine. Masculine (and feminine) would create ambiguity. Renard Migrant (talk) 14:43, 15 July 2016 (UTC)
- Are you calling Cecil Rhodes a sissy? —Aɴɢʀ (talk) 15:27, 15 July 2016 (UTC)
- +1 to what CodeCat said. For example, cailín is a masculine word, but a female given name (google e.g. Cailin Mcloughlin). Also, as Equinox points out, "male" and "female" are sufficiently polysemous to be acceptable. - -sche (discuss) 19:36, 15 July 2016 (UTC)
- The Sex and gender distinction is irrelevant here, since no language actually makes that distinction (if you look at our entries for sex and gender, both words can refer to both concepts). Like others have said, there is a completely separate distinction relevant to us that we call grammatical gender vs. whateverthehellyouwanttocalltheotherone (and both "sex" and "gender" fit into the latter). It is true that in many languages the latter often strongly influences the former, but we still need to maintain the distinction, and the way we've been doing that is by using masculine and feminine for the former and male and female for the latter. --WikiTiki89 19:50, 15 July 2016 (UTC)
Do we need to mark accent in Hebrew words? Also, why don't we add redirects for Hebrew romanizations? (among other notes)
1. The accents and stress are very predictable in Hebrew, to the point of graphic accents being unnecessary.
2. Hebrew words are often understood in their romanized form(s) and aren't easy to search without a Latin keyboard. Why not do for Hebrew what is done for Gothic and Japanese?
3. Do bekadgefat letters need to be mentioned with a tag? There are only six, and it would be redundant to mark every single word.
4. Nobody uses the CCaCtem/ CCaCten meter in the past pa'al due to analogy, nor does anyone use feminine plurals outside of the present tense. Shouldn't space be dedicated to gerunds and in the case of pa'al, past participles?
5. Shouldn't irregular verbs that are technically not weak like "konen" be marked?
6. Why is tzade always ts not tz? Nobody spells the language like that. Nor does anyone use kh for heyt, altho if there is a /X/~/X\/ separation in the orthography, I get having a /?/~/?\/ separation too, using ' and `, respectively. Maybe there should be precise and common transliterations? — This unsigned comment was added by Zontas (talk • contribs).
- @Zontas: Re #2: For what it's worth, the Japanese actually use Romaji. I agree that we should have redirects for these, though, or pages which read something like "Latinization of [term]" and which are maybe edit-protected. —Justin (koavf)❤T☮C☺M☯ 22:38, 13 July 2016 (UTC)
- I don't know that Gothic has ever been published in anything besides the Latin script. Having a physical keyboard is becoming less and less important; mobile screen keyboards are easily changed, and virtual computer keyboards can be installed on modern operating systems.--Prosfilaes (talk) 23:08, 25 July 2016 (UTC)
- I doubt that most people who are looking up Hebrew words know romanizations well enough to know what to type in the search box. This is especially true since there are a ton of different romanizations. I'm not Jewish, but I grew up in a neighborhood with a large Jewish population. I remember seeing "Happy Hannukka|Hanukkah|Chanukah|Hanukah|Chanukka"- and the Hanukkah entry has a dozen more variations. As to how people look things up: I suspect the most common method for Hebrew would be copypasting from a document or a web page. Either that, or they use the Hebrew keyboard that's offered in that little dialog that shows up in the lower-right corner when you type into any Wikimedia input field- including the search box. Chuck Entz (talk) 02:16, 26 July 2016 (UTC)
- My answer:
1. That's not quite true. Most of the time the accent is predictable (in native words), but in many cases it is not. Minimal pairs include תואר / תֹּאַר (tó'ar, “title”) vs. תואר / תֹּאַר (to'ár, “he was described”) and בָּנוּ (bánu, “in us, also they understood”) vs. בָּנוּ (banú, “they built”). I agree that including accents on all words is superfluous, it would have been better to mark accents only when not on the final syllable. But marking the accent on all words is what was decided upon before my time here.
2. What do you mean by "are often understood"? Gothic and Japanese each have their own peculiarities that make romanizations desirable as entries. In both cases there were votes in order to allow them to be treated as exceptions. Our default policy is that romanizations are not words and do not deserve entries and I don't think Hebrew is special enough to be treated differently. Searching can be solved by using the MediaWiki software keyboard, various online keyboards, copy/pasting, enabling the Hebrew keyboard that you may not have known came with your computer, etc.
3. Are you referring to the usage note we include? I agree that they are too common for this to be necessary. I don't add them, but I don't really have a problem with others adding them.
4. I think our conjugation tables mention that. If not, my new conjugation module that I'm working on should address the issue. But remember that our Hebrew entries cover Hebrew from all periods, including Biblical, Mishnaic, Medieval, and Modern.
5. I don't understand what you mean.
6. Firstly, keep in mind that American Jews' transliterations of Hebrew words are not Hebrew. People that actually write in Hebrew do so in Hebrew letters. I also have advocated a dual romanization system for Hebrew -- a scholarly one and a modern one -- which I use sometimes in various places. We use ts and kh because that was the common practice before my time here; theoretically, it makes more sense, even though it looks weird to us American Jews who are used to tz and ch. For kh, there is also the added benefit of leaving ch free to indicate the צ׳ sound (as in צ׳יק צ׳ק (chik chak)).
- --WikiTiki89 00:35, 14 July 2016 (UTC)
- WT:REDIRECT advises against redirects and they can be confusing, especially for a Latin to a non-Latin script. Also you'll get collisions where two words, say Chinese and Hebrew, have the same romanisation and an entry title can't redirect to both. Thirdly there are often rival romanisation schemes for languages, Chinese has loads, doesn't it? So one word might have seven different romanisations. Bear in mind we have a search function that will pick up romanisations in an entry's text. Renard Migrant (talk) 14:40, 15 July 2016 (UTC)
- Yeah, I guess the accent isn't completely useless, but I suppose the answer is to limit usage, not remove or overdo it.
- I concede regarding romanization redirects, I guess a better idea would be adding Hebrew words to "in other scripts" sections and having two official transliterations on here (precise/ Biblical and colloquial/ Modern).
- I mean, it wastes space and isn't exactly helpful. But it's a low priority, my only goal is to make it not mandatory.
- Ahhh, I completely understand. Tho, the tables mostly mention it being rare rather than obsolete.
- Konen is a verb in pi'el that has an odd conjugation (as you can tell by the dictionary form), but it's not remarked upon for some reason. I suppose we should start marking it and other odd verbs like yakhol and lamad.
- Most people who write Hebrew do use the Hebrew Script of course, but if they can't for some reason, they use Latin; And in Latin I've never seen ts used for tzade even on other Wikimedia, and kh is mostly used for khaf rather than heyt. It's common here, but about nowhere else. Plain c, which is unused, could be used for /tS/, with ch and ` becoming the (commonly enough) pharyngeal consonants. — This unsigned comment was added by Zontas (talk • contribs) at 15:32, 15 July 2016.
- (4) I would say calling them rare rather than obsolete is accurate. (5) כּוֹנֵן is not irregular; it is simply a member of the sub-binyan po'el of pi'el, just like רָץ is a member of a sub-binyan of pa'al. (6) Transliterations are not meant to represent how these words are written when the Hebrew script is not available. Transliterations are meant to aid our readers in reading our entries when they don't know the script. --WikiTiki89 18:02, 15 July 2016 (UTC)
- Re #6: It could be argued that using more standard transliterations makes it easier for naive users to read. Personally ts makes more sense to me than tz but I agree that tz is more common. Benwing2 (talk) 18:33, 15 July 2016 (UTC)
- 4) I mean, I do concede it's to mark all eras of Hebrew. 5) Technically weak roots are not irregular, but that's part of what I was referring to. Anywho, my main issue is that some sub-conjugations were never marked as having unique forms, like konen. 6) Fair enough about not using redirects, tho I still think we should revise our main transliteration system to match common trends, and add another for precise transliteration. There should be a Hebrew-to-Latin index. Also, what are your opinions on my responses to #1-#3? --Zontas (talk) 20:28, 16 July 2016 (UTC)
- כּוֹנֵן does not have "unique" forms. There are many verbs in this sub-conjugation. Perhaps we should categorize these sub-conjugations, but that would be difficult for some of the confusing pa'al subgroups. --WikiTiki89 14:46, 18 July 2016 (UTC)
New Spanish verb template backend
Automatically highlights and adds categories (in the main namespace) for any irregular forms. Can export a JSON representation of inflected forms:
I've only added it to two of the existing templates (I will need to go through the existing uses to look for unexpected parameters). Any feedback before I go further? DTLHS (talk) 00:12, 14 July 2016 (UTC)
- Also I'd like to get rid of all the individual templates and move to three templates (for -ar/-er/-ir verbs) with the pattern as the first parameter, unless there are objections. DTLHS (talk) 00:22, 14 July 2016 (UTC)
- @DTLHS: Is it necessary to even have three? Could one suffice? Or maybe one for -ar and another for -er/-ir? —Justin (koavf)❤T☮C☺M☯ 00:50, 14 July 2016 (UTC)
- It doesn't really matter to me. But it should either be 1 or 3, the -er / -ir paradigms are different. DTLHS (talk) 00:54, 14 July 2016 (UTC)
Right, I admit that I'm one of those people who thinks that having a page like this is essentially counterproductive, but I accept the notion that it's better to have an appendix like this than having to rid Wiktionary of protologisms on a daily basis. However, edits by one user to the Romanian section are prompting me to take action.
My reasons:
- (1) a vast majority lack English definitions – last time I checked, we're still in the English Wiktionary. I'm uncomfortable having a disproportionately long list of made-up words lacking definitions which users of this site can understand;
- (2) an overwhelming percentage of these words are ludicrous – they don't have English equivalents and they have 0 hits on Google, Google Books, social media etc.;
- (3) most of these terms do not fulfil even basic criteria for inclusion found here:
- "[…]should meet an expressive need" – most don't.
- "follow some logic in their etymology" – not in most cases.
- "follow standards of spelling, intonation, and pronunciation in the language" – unfortunately in absurdum.
- "and should be ideally "catchy" enough to have a chance of gaining wider acceptance." – strongly no, considering that they were deleted in other Romanian projects (for instance Wikibooks) for being absurd.
I'm not going to take action unless I have a mandate to do so by the community. --Robbie SWE (talk) 11:42, 15 July 2016 (UTC)
- Zero hits isn't an issue for a protologism. I agree that the page (i) should have definitions in English and (ii) isn't very useful to anybody anyway. Equinox ◑ 13:28, 15 July 2016 (UTC)
- I don't see why we should be cleaning up the garbage dump. A garbage dump is supposed to be filled with garbage. --WikiTiki89 14:12, 15 July 2016 (UTC)
- The only real rule I'd like implemented is that anything attestable should be removed. Like Wikitiki89 says it's a rubbish dump and by the way that was its original purpose as well, to discourage people putting them in the main namespace. As for formatting, surely formatting lists of words that don't exist by very definition has to be the lowest of all priorities. But anyone wanting to do it I'm not going to try and stop them. Renard Migrant (talk) 14:37, 15 July 2016 (UTC)
Ok, I see what you guys are saying and I've arrived at the same conclusion – a dump is a dump, no point in trying to organise it. Thanks for the input though! --Robbie SWE (talk) 11:51, 16 July 2016 (UTC)
Vote: Adding PIE root box
FYI, I created Wiktionary:Votes/2016-07/Adding PIE root box. Let us postpone the vote if needed, that is, as long as the discussion requires. --Dan Polansky (talk) 18:28, 15 July 2016 (UTC)
Old English Long Vowels/ Wynn/ Edh/ Orthography
I get keeping thorn and æsh around as they represent unique phonemes, but by the way we are archiving the words, wouldn't it make more sense to use <w> not wynn, and <þ> not <ð>; and also using macrons or acute accents to mark long vowels. Old English writing was semi-chaotic, granted, but most of it is archived in a pseudo-modernized uniform script, and it would be easier to find with some consistency. Not to mention we should stop posting runic words as OE abandoned the script relatively early.
--Zontas (talk) 21:52, 15 July 2016 (UTC)
- Hmm, I thought edh and thorn represented distinct phonemes? Or is that just for Icelandic? ‑‑ Eiríkr Útlendi │Tala við mig 22:10, 15 July 2016 (UTC)
- We already use w and þ primarily, as that's what dictionaries use generally. But we have a policy to allow all attested words, in the representation they were written in (as far as Unicode can represent it). So that also leaves room for the letter wynn, ð and runes. However, they should direct the user to the normalised spelling rather than being a main entry in themselves. —CodeCat 22:27, 15 July 2016 (UTC)
- @Eirikr edh and thorn are distinct in Icelandic but not OE, where they both represent a sound that was voiced when between voiced sounds and not doubled, and voiceless elsewhere. Benwing2 (talk) 23:15, 15 July 2016 (UTC)
- But that also applied, at least, to Old Norse. The difference is in the origin and distribution of [ð]: in Old English it derives exclusively from voicing of [θ], while in Old Norse it derives from that and also from Proto-Germanic [ð]. I believe even in modern Icelandic the two are in complementary distribution, but I'm not sure. —CodeCat 23:34, 15 July 2016 (UTC)
- Actually, in Old English [ð] sometimes comes from frication of [d]. Also, I didn't know Proto-Germanic had a [ð] other than a voiced [θ]. --WikiTiki89 00:06, 16 July 2016 (UTC)
- It did, but as an allophone of /d/ much like in Spanish. —CodeCat 01:04, 16 July 2016 (UTC)
- That still doesn't mean /ð/ is a phoneme. It's predictably an allophone of /d/ and /θ/. Also, I get allowing original script redirects over main entries, but nobody has mentioned whether my idea of marking long vowels (and now that I think about it, soft c and g) is to be used. --Zontas (talk) 19:54, 16 July 2016 (UTC)
- @Zontas We do support using macrons for long vowels in links and such. The normal convention here (and also for stress marks in Russian, and similar things in other languages) is that the name of the entry itself doesn't have a macron in it, but the headword does, and links should also, and the macron will automatically be stripped out when generating the underlying link. See lad#Old English for an example. If I link to it as lād or lād, the link works correctly. Benwing2 (talk) 20:06, 16 July 2016 (UTC)
- I still don't get why we can't use diacritics in the name. The length isn't just a rare thing, it's as common as it is in Latin. --Zontas (talk) 20:28, 16 July 2016 (UTC)
- We don't show it in Latin, either. Chuck Entz (talk) 20:33, 16 July 2016 (UTC)
- The reason we don't add such diacritics is that we put the entry down as attested. So the page is chosen by the spelling actually used. If the original manuscript has the diacritics, you can make an entry at a page with the diacritics; if the diacritics are only a scholarly annotation, they only get added on the page where apt, but the pagename itself will be one without diacritics. Korn [kʰũːɘ̃n] (talk) 22:05, 16 July 2016 (UTC)
- It's different with normalised spellings though, and most old Germanic languages use normalised spellings on Wiktionary. Technically, Old English terms are never attested with the letter w (or infrequently, I'll let someone who knows more clarify that), but we have entries with w regardless because the ƿ > w normalisation is usual and normal in dictionaries, grammars and republished OE texts, and we follow this custom. If we went strictly by attestation requirements, we could never have these entries but would be required to spell them with ƿ. So technically, if we allow these changes, and propagate them to page names as well, there's nothing in principle against doing the same for macrons too. Consider that the acute accents for long vowels are part of Old Norse page names, despite not appearing in manuscripts either. —CodeCat 22:13, 16 July 2016 (UTC)
- Huh. I didn't know this distinction exists and I do not agree with it. But since it doesn't affect me, I won't complain much about it either. Korn [kʰũːɘ̃n] (talk) 23:41, 16 July 2016 (UTC)
- Also, note that, for example, for Hebrew, Arabic, and Russian, diacriticized spellings are attestable but we do not allow them as entry titles. --WikiTiki89 14:44, 18 July 2016 (UTC)
- As for Old Norse, I assume we include acute accents (long marks) in page titles because in modern Icelandic the long marks are mandatory (is that correct?), and modern Icelandic is spelled almost identically to Old Norse. Cf. the mandatory long marks in Latvian, which appear in page titles. Benwing2 (talk) 02:54, 21 July 2016 (UTC)
- Most of the entries do use <w> and <þ>. I think that's even what it says at WT:AANG. To be honest it doesn't make much difference since in an ideal world all the attested versions would be covered by =Alternative forms= in any case. I do not agree with using macrons in page titles though. Many of my scholarly editions of OE texts don't have them, for the good reason that they're not in the manuscripts. Ƿidsiþ 16:48, 28 July 2016 (UTC)
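The link convention Benwing2 describes above (the headword and links carry the macron, the page name does not) can be sketched roughly as follows. This is only an illustration in Python, not the code Wiktionary actually runs; the function name `make_entry_name` is hypothetical, and the sketch assumes the macron is the only diacritic that needs stripping for Old English.

```python
import unicodedata

# U+0304 COMBINING MACRON, the mark used for long vowels in normalised Old English
COMBINING_MACRON = "\u0304"


def make_entry_name(display_form: str) -> str:
    """Return a page name for a display form by dropping macrons only.

    Hypothetical helper: decompose the text so that each macron becomes a
    separate combining character, delete those characters, then recompose
    so that any other diacritics are left as they were.
    """
    decomposed = unicodedata.normalize("NFD", display_form)
    stripped = decomposed.replace(COMBINING_MACRON, "")
    return unicodedata.normalize("NFC", stripped)


if __name__ == "__main__":
    # A link written with the macron resolves to the unmarked page name.
    print(make_entry_name("lād"))
```

Because only U+0304 is removed, letters such as þ, ð and æ pass through unchanged, which matches the point made in the thread that the normalised letters themselves stay in the page name.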
Advice for a Sanskrit pronunciation module
Howdy all! I've been working on a Sanskrit pronunciation module similar to those available in Latin and Ancient Greek, and I would love some advice! The temporary template may be found at {{User:JohnC5/sa-IPA}}, a sandbox at User:JohnC5/Sandbox2, and the module at mod:User:JohnC5/Sandbox2. I've finished the basic conversion to IPA, the syllabification, the rudimentary anusvara rules, and some chronolect handling (Vedic and Classical). I was hoping someone could take a look to see if it all makes sense. Also, please suggest what needs to be added (like Abhinidhāna), what needs to be fixed, how it should look, and anything else that comes to mind. It's also possible that this is completely unnecessary and should not have been attempted in the first place (I don't think so, but I'm open to discussion). I feel like a lot more work needs to be done, but I don't yet know what the end state should be nor the acceptance criteria. Thanks! —JohnC5 07:05, 17 July 2016 (UTC)
- A possible issue is that we use the unattested base stem as the lemma for nominals, rather than any of the case forms. So there shouldn't really be any pronunciation on those entries. —CodeCat 12:49, 17 July 2016 (UTC)
- @CodeCat: I had been wondering about that problem. For nominals, this issue really only affects the desinence, and I would probably suggest using the (masculine) nominative singular as the exemplar for pronunciation. For verbal root entries, I certainly would recommend against this template's usage, but for the 3rd singular entries (e.g. गच्छति (gacchati)), this should be fine. Does this solution work? —JohnC5 17:56, 17 July 2016 (UTC)
- Speaking of गच्छति (gácchati), the module currently produces /ɡə́t͡ɕ.t͡ɕʰə.t̪i/. The realization of च्छ (ccha) seems very wrong to me. Abhinidhāna would say that the first plosive becomes unreleased. In the case of an affricate, does this mean the result would be /ɡə́t̚.t͡ɕʰə.t̪i/ and the use of च (ca) is merely a spelling convention? —JohnC5 19:06, 17 July 2016 (UTC)
- You could avoid the issue entirely by using ":" instead of doubling the sound. Chuck Entz (talk) 02:36, 18 July 2016 (UTC)
- @Chuck Entz: I've already implemented the logic for the abhinidhāna (it's actually not bad). Also I'm not clear how ":" would be used when the cluster is heterosyllabic. Thank you for the advice in any case! —JohnC5 02:48, 18 July 2016 (UTC)
- The thing is that it's not exactly [t], because it has the same point of articulation as the following affricate release. That's probably why cc was used to represent it rather than tc. —CodeCat 15:30, 18 July 2016 (UTC)
- @CodeCat: Just to make sure there's no confusion, the beginning of the Sanskrit alveolo-palatal cluster /t͡ɕ/ differs from the dental /t̪/ already. So the theoretical difference between त्छ (tcha) (which is both unattested in MW and phonologically impossible) and च्छ (ccha) would be /t̪̚.t͡ɕʰ/ vs. /t̚.t͡ɕʰ/, respectively. Are you, instead, proposing a palatal stop as the form? This would be something like /c̚.t͡ɕʰ/ for च्छ (ccha) and /ɟ̚.d͡ʑʱ/ for ज्झ (jjha). Is that what you are saying? —JohnC5 17:43, 18 July 2016 (UTC)
- @CodeCat: Do you think the default display chronolect should be Vedic or Classical? Currently, I have it as Vedic, but that's because I like PIE. —JohnC5 02:31, 18 July 2016 (UTC)
- Classical, for sure. It's what is normally taught, what people worldwide will be familiar with. —CodeCat 15:28, 18 July 2016 (UTC)
- Can't we display both, similar to what we do for Ancient Greek? --WikiTiki89 15:35, 18 July 2016 (UTC)
- @Wikitiki89: I've updated the code so that if the Vedic and Classical pronunciations differ, both are displayed with an arrow like in {{grc-IPA}}. You can see that at User:JohnC5/Sandbox2. Is that sufficient? —JohnC5 17:47, 18 July 2016 (UTC)
- Looks good! --WikiTiki89 17:50, 18 July 2016 (UTC)
- What are the differences between Vedic and Classical pronunciation that you're showing? I thought Classical Sanskrit had a stress accent that was placed in accordance with a rule similar to that for Latin (except that it can go back further than the antepenultimate), so I was expecting /ˈtɕən̪d̪ɽə/ for चन्द्र. Shouldn't कार्त्स्न्य have /ɑː/ (with long mark) in both Vedic and Classical? What about words whose Vedic scansion reveals one more syllable than is written (e.g. /kɑːɽt̪sniə/ for कार्त्स्न्य or /ukt̪uɑː/ for उक्त्वा – I don't know if those particular words belong to the class I'm talking about, but they illustrate the principle)? Does the system have a way of accommodating them? And don't some scholars believe intervocalic laryngeals were still around in Vedic, so that ā for example might sometimes be /əʔə/ or /ɑːʔə/ or /əʔɑː/ or /ɑːʔɑː/? Your sandbox doesn't seem to have any examples of word-final visarga; how would the module transcribe चन्द्रः? I'd expect /tɕən̪d̪ɽə́h/ in Vedic and /ˈtɕən̪d̪ɽəhə/ in Classical. —Aɴɢʀ (talk) 18:14, 18 July 2016 (UTC)
- (edit conflict) @Angr: Thanks for all the comments. As mentioned before, feel free to fiddle around in the sandbox and add things. I'm less familiar with the Classical accent; do you have more description on that matter? I had been trying to figure out some of the vowel issues. One of the main distinctions between Vedic and Classical vowels is the change of ऐ (ai) & ए (e) from /ɑːj/ & /ɑj/ ~ /əj/ to /ɑj/ ~ /əj/ & /eː/ (the same is true for औ (au) & ओ (o)). I was unsure what to do about the /ɑj/ ~ /əj/ decision (Does it vary or do scholars disagree?). Also, do we prefer /ə/ over /ɐ/, since the latter shows up as well? Any guidance on this matter would be greatly appreciated. You're obviously right that from a metric standpoint /ɑː/ must remain long. For the Vedic scansion matters, is there a way to predict it, or is it merely on an anecdotal basis? I can add in some functionality around that (whether the user must specify the variation, or the template generates it automatically), but I need more information. For the Vedic laryngeals, I'm not sure how best to approach this: again, the user would need to specify the specific vowel, perhaps similarly to how the Vedic accent is currently specified (with |a=N for the first word, where N represents the syllable on which the stress occurs; |a2=, |a3= etc. for subsequent words). For the visarga, it was my impression that the vocalic reduplication around the visarga (/tɕən̪d̪ɽə́h/ to /ˈtɕən̪d̪ɽəhᵊ/) differed between Śākhās. I'm not saying we shouldn't list several different Śākhās' pronunciations, but I wasn't sure where Classical pronunciations fell. Thanks for reading all these questions. I didn't start this module because I claimed to know Sanskrit phonology particularly well—I did it because I thought it could and should be done and that people would tell me when I make mistakes. —JohnC5 19:35, 18 July 2016 (UTC)
- @Angr: I've added the Classical accent. Could you check it? —JohnC5 03:35, 19 July 2016 (UTC)
- @JohnC5 I don't have answers to the questions I asked. I thought that stress receded to the rightmost long vowel (excluding the final syllable) and fell on the first syllable if all syllables (excluding the final syllable) had short vowels, so that svataṃtraḥ and aupadraṣṭrya would be stressed on the first syllable, but I'm not positive that's right. Wikipedia doesn't say anything at all about post-Vedic accent, and all I have to go on is my memory of the Sanskrit class I took as an undergraduate more than 25 years ago. So please don't interpret my comments above as "This is how things are, you should accommodate them" but rather "Here's something that might bear looking into, but I'm not sure of the details at all". —Aɴɢʀ (talk) 11:55, 19 July 2016 (UTC)
- @Angr: Based on everything that I've read, the Classical accent is like the w:Dreimorengesetz but extended to include the preäntepenult. So starting at the penult and moving leftward, search for the first heavy syllable unless you find the left edge of the word or the fourth-to-last syllable. I've added the vowel echoing around the visarga. I'm still not sure what to do about the laryngeals and alternate syllabification. Perhaps that can wait? Is there anything that you think must be changed before this goes into production? —JohnC5 15:10, 19 July 2016 (UTC)
- @JohnC5: I notice you're using /x/ for the visarga; is it really that and not /h/? As for laryngeals and alternate syllabification, we probably don't want them to be generated automatically from the spelling, but maybe the {{sa-IPA}} template could take a parameter like altved=pāat or altved=uktuā that would allow pronunciations not reflected by the spelling to be listed as alternative Vedic pronunciations. —Aɴɢʀ (talk) 10:13, 20 July 2016 (UTC)
- @Angr: You are right about the visarga; it took me a while to find a good source for it though. In the case of alternative syllabification, I'm wondering about the vowel hiatus, which does occur in Sanskrit, but rarely. For uktuā, do we assume a homorganic glide to be inserted (/ˈuk.t̪u.ʋɑː/), or pure vowel hiatus (/ˈuk.t̪u.ɑː/), or a glottal stop (/ˈuk.t̪u.ʔɑː/)? Also, can we detect the distinction between a Vedic laryngeal and a Vedic resyllabification, or does the user have to insert a glottal stop like altved=pāʔat? —JohnC5 15:23, 20 July 2016 (UTC)
- @JohnC5: I think all we know for sure is that a word could be spelled उक्त्वा and scan as three syllables in the Veda; whether it was realized as /ˈuk.t̪u.ʋɑː/, /ˈuk.t̪u.ɑː/, or /ˈuk.t̪u.ʔɑː/ is probably not really knowable at this point. Likewise all that's known for sure is that some instances of ā scan as two syllables, and comparative evidence shows that these must have been *aHa or *aHā or *āHa or *āHā in PII, but how exactly they were realized in Vedic is again probably not really knowable at this point. I don't know whether vowel hiatus ever occurs when it wasn't due to a laryngeal. I almost regret bringing these issues up now, since I know so little about the details that would help in resolving them. —Aɴɢʀ (talk) 15:38, 20 July 2016 (UTC)
- @Angr: Vowel hiatus does occur in other positions, as this paper discusses. I think that we ignore the laryngeals and alternate syllabifications, however, until such time as we find examples of these two phenomena and need to mark them. I'll gladly add the functionality, but it seems far too amorphous at the moment. I hope that someone better informed than we will come to tell us all the hottest new research in Vedic phonology, but for now, we're fine. I'll switch the visargas over to /h/ soon.
On a different note, I'm currently representing the anusvara phonemically as a nasalization of the preceding vowel before ś, ṣ, s or h and a homorganic nasal before stops, but I believe the true phonemic representation is /m/ at a morpheme boundary and /n/ morpheme-internally. The issue then becomes how to tell where morphemes end within heteromorphemic words. Should I just declare all anusvara before a space to be /m/ and all others to be /n/? Also, there's evidence that, along with nasalizing the preceding vowel, the anusvara preceding ś, ṣ, s or h lengthened the vowel too. This is fine for short vowels, but works less well for long vowels (especially once you get into Classical when the w:Pluti vowels disappeared). Should I 1) only lengthen the short vowels 2) lengthen both the short and long vowels or 3) ignore the lengthening altogether since it is not well understood? Thanks for all your commentary thus far! —JohnC5 18:33, 20 July 2016 (UTC)
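The Classical accent rule that JohnC5 describes earlier in this thread (start at the penult, move leftward to the first heavy syllable, and stop at the left edge of the word or the fourth-to-last syllable) can be sketched roughly as below. This is an illustrative Python sketch, not the module's actual code: the function name `classical_stress_index` is hypothetical, and it assumes the caller has already syllabified the word and decided which syllables count as heavy, which the thread leaves open (Angr recalls the rule in terms of long vowels instead).

```python
from typing import Sequence


def classical_stress_index(heavy: Sequence[bool]) -> int:
    """Return the 0-based index of the stressed syllable.

    `heavy[i]` is True when syllable i is heavy. The final syllable is
    never examined, and the search goes no further left than the
    fourth-to-last syllable (or the first syllable in shorter words).
    """
    n = len(heavy)
    if n == 0:
        raise ValueError("a word needs at least one syllable")
    if n == 1:
        return 0  # a monosyllable can only be stressed on its one syllable

    leftmost = max(0, n - 4)  # left edge of the word or the fourth-to-last syllable
    for i in range(n - 2, leftmost - 1, -1):  # penult, antepenult, preantepenult
        if heavy[i]:
            return i
    return leftmost  # no heavy syllable found: stress falls at the search limit


if __name__ == "__main__":
    # Weights supplied by hand for illustration: can-dra = heavy, light
    print(classical_stress_index([True, False]))
    # gac-cha-ti = heavy, light, light
    print(classical_stress_index([True, False, False]))
```

Keeping syllable weight as an input separates the accent placement from the open questions about syllabification and vowel length raised above, so either definition of "heavy" can be plugged in without changing the search itself.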
- @Angr: Vowel hiatus does occur in other positions, as this paper discusses. I think that we ignore the laryngeals and alternate syllabifications, however, until such time as we find examples of these two phenomena and need to mark them. I'll gladly add the functionality, but it seems far too amorphous at the moment. I hope that someone better informed than we will come to tell us all the hottest new research in Vedic phonology, but for now, we're fine. I'll switch the visargas over to /h/ soon.
- @JohnC5: I think all we know for sure is that a word could be spelled उक्त्वा and scan as three syllables in the Veda; whether it was realized as /ˈuk.t̪u.ʋɑː/, /ˈuk.t̪u.ɑː/, or /ˈuk.t̪u.ʔɑː/ is probably not really knowable at this point. Likewise all that's known for sure is that some instances of ā scan as two syllables, and comparative evidence shows that these must have been *aHa or *aHā or *āHa or *āHā in PII, but how exactly they were realized in Vedic is again probably not really knowable at this point. I don't know whether vowel hiatus ever occurs when it wasn't due to a laryngeal. I almost regret bringing these issues up now, since I know so little about the details that would help in resolving them. —Aɴɢʀ (talk) 15:38, 20 July 2016 (UTC)
- @Angr: You are right about the visarga; it took me a while to find a good source for it though. In the case of alternative syllabification, I'm wondering about the vowel hiatus, which does occur in Sanskrit, but rarely. For uktuā, do we assume a homorganic glide to be inserted (/ˈuk.t̪u.ʋɑː/), or pure vowel hiatus (/ˈuk.t̪u.ɑː/), or a glottal stop (/ˈuk.t̪u.ʔɑː/)? Also, can we detect the distinction between a Vedic laryngeal and a Vedic resyllabification, or does the user have to insert a glottal stop like
- @JohnC5: I notice you're using /x/ for the visarga; is it really that and not /h/? As for laryngeals and alternate syllabification, we probably don't want them to be generated automatically from the spelling, but maybe the
- @Angr: Based on everything that I've read, the Classical accent is like the w:Dreimorengesetz but extended to include the preäntepenult. So starting at the penult and moving leftward, search for the first heavy syllable unless you find the left edge of the word or the fourth-to-last syllable. I've added the vowel echoing around the visarga. I'm still not sure what to do about the laryngeals and alternate syllabification. Perhaps that can wait? Is there anything that you think must be changed before this goes into production? —JohnC5 15:10, 19 July 2016 (UTC)
- @JohnC5 I don't have answers to the questions I asked. I thought that stress receded to the rightmost long vowel (excluding the final syllable) and fell on the first syllable if all syllables (excluding the final syllable) had short vowels, so that svataṃtraḥ and aupadraṣṭrya would be stressed on the first syllable, but I'm not positive that's right. Wikipedia doesn't say anything at all about post-Vedic accent, and all I have to go on is my memory of the Sanskrit class I took as an undergraduate more than 25 years ago. So please don't interpret my comments above as "This is how things are, you should accommodate them" but rather "Here's something that might bear looking into, but I'm not sure of the details at all". —Aɴɢʀ (talk) 11:55, 19 July 2016 (UTC)
- @Angr: I've added the Classical accent. Could you check it? —JohnC5 03:35, 19 July 2016 (UTC)
- (edit conflict) @Angr: Thanks for all the comments. As mentioned before, feel free to fiddle around in the sandbox and add things. I'm less familiar with the Classical accent; do you have more description on that matter? I had been trying to figure out some of the vowel issues. One of the main distinctions between Vedic and Classical vowels is the change of ऐ (ai) & ए (e) from /ɑːj/ & /ɑj/ ~ /əj/ to /ɑj/ ~ /əj/ & /eː/ (the same is true for औ (au) & ओ (o)). I was unsure what to do about the /ɑj/ ~ /əj/ decision (Does it vary or do scholars disagree?). Also, do we prefer /ə/ over /ɐ/, since the latter shows up as well? Any guidance on this matter would be greatly appreciated. You're obviously right that from a metric standpoint /ɑː/ must remain long. For the Vedic scansion matters, is there a way to predict it, or is it merely on an anecdotal basis? I can add in some functionality around that (whether the user must specify the variation, or the template generates it automatically), but I need more information. For the Vedic laryngeals, I'm not sure how best to approach this: again, the user would need to specify the specific vowel, perhaps similarly to how the Vedic accent is currently specified (with
- What are the differences between Vedic and Classical pronunciation that you're showing? I thought Classical Sanskrit had a stress accent that was placed in accordance with a rule similar to that for Latin (except that it can go back further than the antepenultimate), so I was expecting /ˈtɕən̪d̪ɽə/ for चन्द्र. Shouldn't कार्त्स्न्य have /ɑː/ (with long mark) in both Vedic and Classical? What about words whose Vedic scansion reveals one more syllable than is written (e.g. /kɑːɽt̪sniə/ for कार्त्स्न्य or /ukt̪uɑː/ for उक्त्वा – I don't know if those particular words belong to the class I'm talking about, but they illustrate the principle)? Does the system have a way of accommodating them? And don't some scholars believe intervocalic laryngeals were still around in Vedic, so that ā for example might sometimes be /əʔə/ or /ɑːʔə/ or /əʔɑː/ or /ɑːʔɑː/? Your sandbox doesn't seem to have any examples of word-final visarga; how would the module transcribe चन्द्रः? I'd expect /tɕən̪d̪ɽə́h/ in Vedic and /ˈtɕən̪d̪ɽəhə/ in Classical. —Aɴɢʀ (talk) 18:14, 18 July 2016 (UTC)
- Looks good! --WikiTiki89 17:50, 18 July 2016 (UTC)
- @Wikitiki89: I've updated the code so that if the Vedic and Classical pronunciation differ, both are displayed with an arrow like in
- Can't we display both, similar to what we do for Ancient Greek? --WikiTiki89 15:35, 18 July 2016 (UTC)
- Classical, for sure. It's what is normally taught, what people worldwide will be familiar with. —CodeCat 15:28, 18 July 2016 (UTC)
- I don't really like the arrow notation. I prefer it to always show with labels of what each pronunciation represents. —CodeCat 19:05, 18 July 2016 (UTC)
- Also, a few other points:
- I don't think short a was fully central, but more open, perhaps [ɐ], at least in Vedic, and long ā was still a long vowel. I don't have a source, but it seems fairly likely. In the same vein, I'd say e and o were probably [ɐi̯] and [ɐu̯] in Vedic.
- Your pronunciation also seems to treat as phonemic details that weren't. The pronunciation changes of visarga before labials and velars in Vedic were not phonemic.
- I suspect that the Vedic transcription of त्रैंश (traiṃśa) is wrong, the -ai- was probably bisyllabic.
- औपद्रष्ट्र्य (aupadraṣṭrya) may be syllabified wrong, I'd expect -dr- to be entirely in the next syllable.
- Where does the nasal /j/ in Vedic कार्त्स्न्य (kārtsnya) come from?
- The resolution of ṛ into ri was post-classical, and actually differed by dialect. Some dialects have ru or ra instead. So for Sanskrit proper, a syllabic sonorant should still be used.
- Finally, I hope that there is a parameter to disable Vedic transcriptions. For certain words, the Vedic equivalent may actually have a different spelling, so listing a Vedic pronunciation would be wrong then. —CodeCat 19:23, 18 July 2016 (UTC)
- Should I just go with Classical then for the default display?
- As in my response to Angr, I had been curious about that. I'm perfectly happy to use /ɐ/. Do you also think I should use /i̯/ and /u̯/ over /j/ and /w/?
- You're quite right about the [ɸ] being phonetic. I'll need to add separate sections for those. What would you say the underlying phoneme of the visarga is? Whenever notated as a visarga, it is [x] or [ɸ], but it is just the allophone of /s/. Should it be that?
- What would be the method of determining the bisyllabicity of such words?
- I was unsure about this. The Weerasinghe-Wasala-Gamage method of syllabification makes special cases for /-.CrV-/ and /-.CyV-/ in all other cases except /VCCV/, which it always interprets as /VC.CV/. It does seem more sensible, however, to keep the same rules as in the other cases.
- William Sidney Allen's Phonetics in Ancient India mentions Vedic turned /m/ + /j/, /l̪/, or /ʋ/ as giving /j̃j/, /l̪̃l̪/, or /ʋ̃ʋ/ respectively. He then says that this occurs only once in Classical and it affects an /n/. First of all, I realize this should be a phonetic rule again. Also is it only /m/ and /n/ or all nasals? It also makes sense that this only applies in the environment of /VN.[jl̪ʋ]V/.
- That rule was borrowed from w:Vedic_Sanskrit_grammar#Phonology. I'll remove it if you think it is necessary.
- The ability to turn off the Vedic seems prudent, and I will add it. Also, if you'd like to help coding this (or fixing my bad code), please do! —JohnC5 19:56, 18 July 2016 (UTC)
- Also, I assume that abhinidhāna would be a phonetic change, not phonemic. —JohnC5 20:17, 18 July 2016 (UTC)
As for what to do with geminate consonants, in Russian we use Cː and syllabify as if it's a single consonant, i.e. if between vowels it forms the beginning of a syllable and joined with the following vowel. This isn't perfect but it deals with affricates well. The alternative I think is to write e.g. /t.t͡ɕʰ/ (or maybe /t̚.t͡ɕʰ/), which I think will be interpreted correctly as a long affricate even if technically it might mean something else; to me, something like /c̚.t͡ɕʰ/ looks really weird and is likely to be misinterpreted. Benwing2 (talk) 02:50, 21 July 2016 (UTC)
- @Benwing2: So, to be clear, you think the current solution is sufficient? 03:34, 21 July 2016 (UTC)
- @JohnC5: Sorry, I missed your reply. I don't like the way you currently write IPA: /ˈɡɐt͡ɕ.t͡ɕʰɐ.t̪i/, [ˈɡɐt̚.t͡ɕʰɐ.t̪i], I think it should say IPA: /ˈɡɐt.t͡ɕʰɐ.t̪i/, [ˈɡɐt̚.t͡ɕʰɐ.t̪i] or maybe IPA: /ˈɡɐ.t͡ɕʰːɐ.t̪i/, [ˈɡɐt̚.t͡ɕʰɐ.t̪i]. Benwing2 (talk) 05:04, 27 July 2016 (UTC)
- @Benwing2: Aha, that makes sense to me now. I prefer your first alternative since it still denotes phonemically the presence of the underlying de-affricated stop. I'll look into implementing that soon. Anything else? —JohnC5 05:19, 27 July 2016 (UTC)
- @JohnC5: The only other thing is that currently when you have a ˈ mark between syllables you leave out the . separating the syllables and you might want to include it, but that's your choice. Benwing2 (talk) 05:25, 27 July 2016 (UTC)
- @Benwing2: I've made the change you requested for affricate clusters. What is the convention for stress accent markers and intersyllabic dots? {{la-IPA}} removes the dot, but {{grc-IPA}} does not. —JohnC5 14:35, 27 July 2016 (UTC)
- @JohnC5 Presumably there's no convention, so you can pick what you want. I think it looks better with the dot but I'm not going to insist on it. Benwing2 (talk) 16:00, 27 July 2016 (UTC)
@CodeCat, Chuck Entz, Wikitiki89, Angr, Benwing2: So... are we ready to start adding this to entries? —JohnC5 03:13, 27 July 2016 (UTC)
- I have no objection. —Aɴɢʀ (talk) 08:05, 27 July 2016 (UTC)
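For reference, the Classical accent rule as JohnC5 words it in this thread (start at the penult, move leftward to the first heavy syllable, and stop at the left edge of the word or the fourth-to-last syllable) might be sketched as follows. This is only an illustration in Python, not the module's code: the function name and the "H"/"L" weight encoding are assumptions, and syllabification and weight assignment are taken as already done.

```python
def classical_stress_index(weights):
    """Return the index of the stressed syllable.

    `weights` is a list with one entry per syllable, left to right,
    each either "H" (heavy) or "L" (light). This encoding is an
    assumption made for the sketch.
    """
    n = len(weights)
    if n == 0:
        raise ValueError("a word needs at least one syllable")
    if n == 1:
        return 0  # monosyllables carry the stress by default

    # Leftmost position the stress may reach: the fourth-to-last
    # syllable, or the first syllable of a shorter word.
    leftmost = max(0, n - 4)

    # Scan from the penult leftward, never considering the final syllable.
    for i in range(n - 2, leftmost, -1):
        if weights[i] == "H":
            return i

    # No heavy syllable was found before reaching the limit.
    return leftmost


if __name__ == "__main__":
    print(classical_stress_index(["H", "L"]))            # 0
    print(classical_stress_index(["L", "H", "L"]))       # 1
    print(classical_stress_index(["L", "L", "L", "L", "L"]))  # 1
```

With a heavy-light word the sketch stresses the first syllable, which agrees with the initial stress Angr expected for चन्द्र if that word is scanned as heavy plus light.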
Template alternative case form of to show more intuitive description
I think {{alternative case form of}} should show one of the following, depending on the letter case:
- Uppercase form of
- Lowercase form of
Thus, it would no longer show this:
- Alternative letter-case form of
Easy to implement in a module, I think.
Opinions? --Dan Polansky (talk) 13:25, 17 July 2016 (UTC)
- I think this is very sensible. You are also correct that it is fairly easy to implement. —JohnC5 17:50, 17 July 2016 (UTC)
- I agree with JohnC5--Dixtosa (talk) 18:55, 17 July 2016 (UTC)
- There may be some instances where Lua cannot determine the case. But this should be very rare. DTLHS (talk) 18:58, 17 July 2016 (UTC)
- What if the alternative casing is a mix? —CodeCat 19:05, 17 July 2016 (UTC)
- Do you have an example in mind? For special cases, the template may be updated to accept a parameter telling it to display the old "Alternative letter-case form of".
- Now looking at the proposal again, the "Uppercase form of" should probably be "Capitalized form of"; uppercase would be WORD rather than Word, I fear. --Dan Polansky (talk) 19:08, 17 July 2016 (UTC)
- I would say, the template should normally display either "Capitalized form of", "Lowercase form of", or "Uppercase form of", as with Dan P; if the case is mixed somehow or other, it should default to either "Mixed-case form of" or just "Alternative letter-case form of" as before. Note that the "Capitalized form of" detector should be smart enough to allow both Foo-bar and Foo-Bar, i.e. it should treat hyphens as spaces and allow non-initial words to either be capitalized or lowercased, and it will probably need special-casing for Dutch words like IJsland and IJsselmeer. Benwing2 (talk) 20:53, 17 July 2016 (UTC)
- I assume that the module could also detect camelcase forms and display accordingly. SemperBlotto (talk) 05:05, 18 July 2016 (UTC)
- "Alternative letter-case form of" while wholly accurate is very confusing for people not familiar with such lexicographical terms. Took me a second in fact to work it out. Renard Migrant (talk) 12:43, 23 July 2016 (UTC)
- Agreed that "alternative case" is a tad confusing. The capital/lowercase templates can display "form of" if necessary but shouldn't have the word "form" in the template name itself. DAVilla 06:08, 30 July 2016 (UTC)
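The detection discussed in this section could look roughly like the sketch below. It is written in Python purely for illustration (the real template would presumably be a Lua module), and the function name and the order of the checks are assumptions. It follows Benwing2's suggestion of treating hyphens as spaces and letting non-initial words be either capitalized or lowercase, and it falls back to the old wording for anything mixed. Dutch forms such as IJsland are deliberately not special-cased, so they land in the fallback.

```python
def case_form_label(word):
    """Pick the description text for a letter-case variant of a term."""
    fallback = "Alternative letter-case form of"

    # Strings without any cased letters cannot be classified.
    if word.lower() == word.upper():
        return fallback

    if word == word.lower():
        return "Lowercase form of"

    def is_capitalized(part):
        # First character uppercase, everything after it lowercase.
        return part[0].isupper() and part[1:] == part[1:].lower()

    def is_lowercase(part):
        return part == part.lower()

    # Treat hyphens as spaces, so Foo-bar and Foo-Bar both qualify.
    parts = word.replace("-", " ").split()
    if is_capitalized(parts[0]) and all(
        is_capitalized(p) or is_lowercase(p) for p in parts[1:]
    ):
        return "Capitalized form of"

    if word == word.upper():
        return "Uppercase form of"

    # camelCase, IJsland and other mixed forms keep the old wording.
    return fallback


if __name__ == "__main__":
    for sample in ["word", "Word", "WORD", "Foo-bar", "camelCase", "IJsland"]:
        print(sample, "->", case_form_label(sample))
```

A single capital letter such as "A" is reported as capitalized rather than uppercase here; that tie-break is arbitrary and could just as well go the other way.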
Lexicography at a Crossroads
Hey. Has anyone read Lexicography at a Crossroads: Dictionaries and Encyclopedias Today? I've been flicking through it, and Wiktionary gets loads of pages dedicated to it, especially the English and Spanish parts. It's pretty interesting, and they bring up several of our flaws. --Turnedlessef (talk) 23:27, 18 July 2016 (UTC)
- Page 115: "...they also illustrate two important lexicographical implications. First, only trained English lexicographers can add and/or edit English entries, whereas these requirements are not necessary for working with Spanish ones." Page 119: "Wiktionary not only uses English as default language, but also offers much more data in the English entries than in the Spanish ones. This [...] goes against its democratic philosophy." Can't say I'm much convinced that these fellows have produced an analysis that is useful for us. Korn [kʰũːɘ̃n] (talk) 08:16, 21 July 2016 (UTC)
- The author offers his idea of what a Wiktionary entry should look like on page 127. Korn [kʰũːɘ̃n] (talk) 08:30, 21 July 2016 (UTC)
- I'm no trained lexicographer, but I edit and create English entries without difficulty, and I also hold the French entries I edit to the same standard as the English ones. The book was also written some time ago, meaning that much of its content is probably out of date. From what I saw, though, there were some valid problems addressed, but they are due mainly to the incomplete state of this project rather than its nature, and will hopefully improve with time (e.g. single translations in FL entries that do not accurately represent the word's range of definitions, such as using "business" to define negocio without any glosses to clarify its meaning—a problem that entry notably still has). Andrew Sheedy (talk) 03:01, 22 July 2016 (UTC)
- Thanks for pointing out this article, but it is not that relevant to Wiktionary today. I started to list this kind of publication in Meta, so I invite you to add others and especially to take a look at the ones about GLAWI. Noé (talk) 13:42, 22 July 2016 (UTC)
- I added a gloss to negocio. --Turnedlessef (talk) 08:31, 23 July 2016 (UTC)
Proposal for Gurjar Apabhraṃśa
I'd like to propose a code for the Gurjar Apabhraṃśa, the direct ancestor of Old Gujarati, which is given a grammatical sketch by Hemacandra and is also used in several texts of the era. The code I'd like to propose is inc-agu (and perhaps thus as a model for other Apabhraṃśas later on as inc-axx). I'd also like to have a name that is diacritic-less, but all the literature shows the version with diacritics. Any ideas? DerekWinters (talk) 22:16, 20 July 2016 (UTC)
- Perhaps Gurjar Apabhramsha? —Aryamanarora (मुझसे बात करो) 19:07, 26 July 2016 (UTC)
A tremendous gathering
Hi, colleagues!
I am thrilled to announce the official creation of the Wiktionary Tremendous Group!
It arose from seeds planted at Wikimania, a month ago and it aims to be a common place to make Wiktionaries better, share our productions and thoughts about technological developments. We can also organize events together such as conferences and LexiSession, a fancy way to contribute together to the same topic during a short period of time. In August, we suggest focusing on cat! Another main goal is to increase our network with the other wiki projects. So, if this interests you, you are very welcome! We want multilingual discussions as much as possible, but my English is not very natural, so feel free to correct any mistake you see. Also, I am very inexperienced with team management and I have no idea how to make this project more attractive. I think a nice logo would be cool, but suggestions are welcome! It will be cool, so please join! Noé (talk) 01:13, 21 July 2016 (UTC)
- J'ai regardé la page, et j'aime ce que j'y vois. Peut-être je rejoindrai le groupe plus tard, quand j'ai plus de temps et quelque chose à contribuer (ou peut-être je pourrais vous aider avec les traductions de français en anglais, si ça te tente). Mon français est bien mieux que c'état il y a une année, mais j'imagine que j'ai quand même fait des fautes, et je serais très reconnaissant si tu (ou quelqu'un d'autre) les corrigeais. :) Andrew Sheedy (talk) 03:16, 21 July 2016 (UTC)
- Thanks a lot for your help, I'll clean the announcement and I posted on your talk page some comments about your French, but it is better than my English, for sure! There is no implication of being part, it is not a secret society, so feel free to visit once a month or less, that's fine! Noé (talk) 09:37, 21 July 2016 (UTC)
Category for employment
I have yet to find a category relating to employment, apart from Category:Occupations. Is there one? DonnanZ (talk) 17:04, 21 July 2016 (UTC)
Am I allowed to create a category for this? No objections? DonnanZ (talk) 12:32, 23 July 2016 (UTC)
- You can't assume no answer to mean no objections. I for one have no idea what you're talking about or what would be in this category of yours. —Μετάknowledgediscuss/deeds 12:39, 23 July 2016 (UTC)
- Sure. I can think of a few. Interview perhaps? Renard Migrant (talk) 12:41, 23 July 2016 (UTC)
- Anything relating to employment which doesn't fit in the Occupations category. DonnanZ (talk) 16:23, 23 July 2016 (UTC)
- It's definitely a significant hole in our category structure, but I'm not sure where to put it or what its limits are. Cat:Business is one possible parent. That has a subcategory Cat:Human resources, which is closer but probably not a good fit, since it's restricted to a management perspective. Cat:Occupations is another possible parent, but it's under Cat:People. That doesn't seem a good fit for most things relating to work as an activity.
- As far as limits, I associate the term employment more with matters regarding whether one is employed and how one becomes employed (i.e., hiring and firing), rather than with the aspects of being employed. I wish we could use "work", but that's got so many other meanings it would be hard to keep from accumulating unrelated terms.
- Would you include:
- What we do for a living is so basic a part of modern life that it bleeds into a variety of topics, and covers a lot of ground. Chuck Entz (talk) 18:38, 23 July 2016 (UTC)
- Well, employment relates to people, so maybe it can be a subcategory of that? The category can also include labour relations.
- Yes, probably all the terms you mentioned - I can also think of job interview, golden handshake, trade union and quite a few other terms. DonnanZ (talk) 16:00, 24 July 2016 (UTC)
label → lb
Wiktionary:Votes/2016-06/label → lb passed. Can someone do the honors and swap {{label}} for {{lb}} in all entries, please? --Daniel Carrero (talk) 18:44, 21 July 2016 (UTC)
- Running now. —CodeCat 19:01, 21 July 2016 (UTC)
- Thank you. --Daniel Carrero (talk) 17:53, 22 July 2016 (UTC)
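For anyone curious what such a swap amounts to, here is a rough sketch. It is not the bot that was actually run; the function name is hypothetical and the matching is simplified (it ignores, for instance, that the first letter of a template name is case-insensitive). It rewrites only calls whose name is exactly "label", leaving parameters and longer template names alone.

```python
import re

# Match "{{label" only when the template name ends there, i.e. the next
# character is "|" (parameters follow) or "}" (the call closes).
_LABEL_CALL = re.compile(r"\{\{\s*label\s*(?=[|}])")


def shorten_label_calls(wikitext):
    """Replace {{label|...}} calls with {{lb|...}} in a page's wikitext."""
    return _LABEL_CALL.sub("{{lb", wikitext)


if __name__ == "__main__":
    line = "# {{label|en|medicine}} A sample definition."
    print(shorten_label_calls(line))
```

The lookahead keeps the pattern from touching templates that merely start with the same letters.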
Guys, I don't know where this movement came from, but making all template names obscure cannot be helpful for someone who's learning how to contribute to the project or those who don't contribute often. When I see {{lb}} my first inclination is to question what the heck it means, and if the first character is an L or a one. I don't automatically think "label" (and frankly when I see "label" I don't automatically think the template necessarily belongs on the definition line either, which is why it was originally called {{context}} although that may have been too narrow). It's great if you feel using abbreviations is going to save you time, but how the heck is replacing every {{label}} template call with the equivalent {{lb}} going to advance the project? I take issue even with the popular {{usex}} which is now replaced by the even more obscure {{ux}} when {{example}} would have sufficed. If you really really need the name to be short, why not something like {{eg}} which actually means something outside of this little universe? And now there's a proposed vote to replace {{loan}} with {{bor}} apparently to save on that one single lonesome character at the expense of legibility. Am I missing something big here, like cross-project support, or is this not some sort of obfuscation contest? Has Wiktionary been taken over by programmers of assembly or some obscure language? Please, come to your senses and quit this nonsense! DAVilla 04:33, 29 July 2016 (UTC)
- Exactly! If anything, the bot should be converting the abbreviations to the full forms! Andrew Sheedy (talk) 04:41, 29 July 2016 (UTC)
- @DAVilla: About Wiktionary:Votes/2016-07/borrowing, borrowed, loan, loanword → bor — only 68 entries actually use Template:loan, while thousands of entries use {{bor}}. I'm more interested in converting {{borrowing}} and {{borrowed}} to {{bor}}, for consistency with {{der}} and {{inh}}.
- For anyone interested, Wiktionary:Votes/2015-11/term → m; context → label; usex → ux is a rather long vote in which voters discussed why they voted support, oppose or abstain to 3 separate template abbreviation proposals. The 3 passed. There is a trend toward shorter template names, as indicated by that and other template renaming votes that passed.
- Concerning {{cx}}, {{lb}}, {{context}}, {{label}}, none of the names is perfect in my opinion. I believe we usually use "context labels" in conversation. It is also a term that WT:EL uses, as per this 2009 vote. I was very happy to support @Metaknowledge's vote to deprecate {{context}} and {{cx}}, but it was not because of their names; it was because they required "lang=", while {{label}} and {{lb}} didn't. As said in the label → lb vote, template abbreviations often make it easier to read the code: in {{lb|en|medicine}}, the word "medicine" is more conspicuous than in {{label|en|medicine}} or {{context|en|medicine}}.
- I know you oppose the template abbreviations, but I actually like your idea of using {{eg}} instead of {{ux}}. I created a redirect from {{eg}} to {{ux}}, so now we can use it if we want. I don't intend to create a vote proposing to change {{ux}} to {{eg}} in all entries because I wish to avoid making proposals that cancel each other too quickly. We just changed from {{usex}} to {{ux}}, and we can't say for sure yet that {{eg}} is going to catch on.
- I think I was able to answer all the points you raised. Obviously, you don't have to agree with my opinion about the shortcuts. --Daniel Carrero (talk) 06:02, 29 July 2016 (UTC)
- The obscure abbreviations are one of the things that I abhor in the en.wikt project. From a programmer point of view, abbreviated terms always make code unreadable for anyone that is not an elite Wiktionary writer. It can be useful for a handful of templates that are used and repeated everywhere (typically {{l}}), but that's it. — Dakdada 12:09, 29 July 2016 (UTC)
- I really disagree with something like {{lb}} being easier to read. Just now I was looking at your comment in wikitext and was thrown off by {{tl}} until I realized it's a different shorthand for "template". I don't think I'd ever use it. How would I remember that it's "tl" and not, say, "tm" or "tp"? At least for an abbreviation like {{abbr}} it's a standard in English. If switching to {{eg}} is something you and I can find agreement on, then I'd really prefer to halt these needless conversions until we have a roadmap and not a general trend set by scattered votes. Shortening names is fine if that's what the community wants, I just really disdain bizarre abbreviations like {{inh}} whose meaning I frankly had to pause to think about. (I wouldn't have guessed "inherited" if it wasn't grouped with the etymology templates.) In fact, I would rather have one letter abbreviations for the really common templates than something like {{cx}}.
- What I'd like to see done is a comprehensive survey of which templates are most used and a brainstorm of short words like {{tag}} to be matched with them. I'm a little ignorant on other language Wiktionaries, but we should look for consistencies that we might want to maintain, although things like {{g|ender}} and {{t|ranslation}} I believe we already streamline pretty well. One-letter names must be reserved for uses that are extremely common. If after some discussion we decide that consistently 3-letter abbreviations are easiest to remember for etymology templates, then so be it. I'm not convinced that the 3-letter form is the best home for them, though. I'd rather have 3-letter forms as shortcuts which are expanded to something more legible. There has to be an understanding that what is easiest for the members of the community is not necessarily ideal for growing the community. Certainly I'm not suggesting that a contributor focused on etymology should have to type out {{borrowed}} every time he or she would want to use that template, but the shortcuts are listed right on the documentation for anyone who uses them that often. Everyone else, unfortunately meaning a lot of people who don't vote, I'm sure would be happier with a succinct, yes, but also recognizable character sequence. DAVilla 05:59, 30 July 2016 (UTC)
- This effort seems to be the exact opposite of the course of action that we should be taking.
- To me the logic of keyboard shortcuts like {{lb}} is to save keystrokes (time, carpal tunnel, etc) at input. What is subsequently seen in the edit window should probably be something more instructive to new or less frequent editors. Bots can give us the best of both worlds by periodically changing shortcut to long-form template names.
- The possible advantages of displaying the shorter form in the edit window seem to be three:
- 1. The stored page is a little smaller and therefore a little faster to download etc. This seems like a small benefit.
- 2. The edit window shows a little more content. This also seems like a small benefit.
- 3. We could avoid some editors experiencing a disconnect between what they enter ("lb") and what eventually appears in the entry ("label"). This would be limited to those "in the know" about the shortcuts. I would think those in the know could survive the disconnect.
- I think we had better be a bit more concerned about the learning curve for new editors if we are to benefit from broader participation and replace veteran contributors who lose interest in Wiktionary.
- IOW, The entire effort to convert to storing and displaying short-form template names rather than long-form template names seems grossly misguided. Why is this so hard to see? DCDuring TALK 12:43, 30 July 2016 (UTC)
- The rational contributor is going to use existing entries as models to follow rather than reading ELE or other such statute or regulation. From our existing entries, the rational contributor will pick things up real fast. From the very fact that the template occurs at the beginning of the definition lines and contains such arguments as "slang", "informal" or "geography", the rational contributor will figure things out even if the template gets renamed to "xyz". We should not underestimate the intelligence of the sort of contributors we are seeking. --Dan Polansky (talk) 13:04, 30 July 2016 (UTC)
- Those persistent, motivated contributors are the ones we are already getting. How about getting some broader participation from those with special expertise (language or subject matter), but less persistence and motivation? DCDuring TALK 15:55, 30 July 2016 (UTC)
- Chiming in to add my 2p in support of more descriptive wikicode. All this hyperabbreviation is obstructive and counterproductive. We need fewer barriers to entry, not more.
- And what is the benefit of these obscure and excessively short code conventions? I see no advantage. ‑‑ Eiríkr Útlendi │Tala við mig 20:11, 2 August 2016 (UTC)
- @Eiríkr Útlendi: I propose you post a late oppose at Wiktionary:Votes/2016-06/label → lb. To do this, indent your vote like ":#" and say "late oppose" or the like. Thus, we can have votes closed and yet track broader support-oppose information than that available at the time of closing. --Dan Polansky (talk) 20:22, 2 August 2016 (UTC)
- Thank you for the suggestion, Dan. Done. ‑‑ Eiríkr Útlendi │Tala við mig 20:34, 2 August 2016 (UTC)
On the distinction of {{ux}} vs. {{eg}}: This subject was discussed before, in Wiktionary:Grease pit/2014/February#Template for eg over usex like label over context. That's when people decided to use {{ux}} and not {{eg}}. For the record, I did not participate in that discussion and, as I said above, my preference is actually towards {{eg}}. --Daniel Carrero (talk) 06:29, 7 August 2016 (UTC)
take pity vs. take pity on
I'm not very familiar with creating English phrasal-verb entries. I created the entry take pity for the expression "take pity on", even though it (almost?) always occurs with "on", on the principle of using the shortest form that still preserves the meaning. I then created take pity on as a hard redirect; I know these are frowned on but I'm not sure what better thing to do. Benwing2 (talk) 05:44, 24 July 2016 (UTC)
- Redirects for prepositions like this are ok (aren't they?) and there's also take pity upon which is amply attested. Still I'd move to take pity on and have the other two as alternative forms rather than redirects, with usage notes or context labels or whatever. Renard Migrant (talk) 16:57, 25 July 2016 (UTC)
Vote: Using template l to link to entries
FYI, I created Wiktionary:Votes/2016-07/Using template l to link to entries.
Let us postpone the vote as much as discussion requires, if needed at all. --Dan Polansky (talk) 06:59, 24 July 2016 (UTC)
Do we want a unified approach to manuscript forms?
Auec having been kept and noticing that mãdar exists, I wonder if it would be beneficial to treat all Latin script languages the same and just allow all the variants. English ones include vp and giuen. The question is genuine i.e. do we want a unified approach or a per-language approach? NB having some sort of vote or 'consensus' allowing these doesn't mean we have to have bots creating them by the thousand, simply that those we have won't be deleted provided they meet WT:CFI#Attestation. Renard Migrant (talk) 16:45, 25 July 2016 (UTC)
Removing subsection headings from EL § Translations
I'd like to remove the 3 subsection headings of WT:EL#Translations without a vote. See diff.
- Translation dos and don’ts
- Translating words which are not lemmas in a target language
- Translating words without an exact equivalent in the target language
Rationale:
- I believe it to be just a cosmetic, nonsubstantial change and it does not change any regulations, hence doing it without a vote.
- It seems the qualifier "Translation dos and don’ts" actually applies to the whole "Translations" section. There are rules like "ONLY add translations that you are CERTAIN of." and "NEVER use automatic translation software" that are outside the section "Translation dos and don’ts", but they qualify as translations dos and don'ts regardless.
- "Translating words which are not lemmas in a target language" and "Translating words without an exact equivalent in the target language" are just repeating the beginning of each rule. We could just as well have a dozen sections like "Translations not in the Latin script", "Word-for-word translations of idioms", "Translations of taxonomic names", etc.
- "Translations" is a H4 title; "Translation dos and don’ts" and the other two are H5 titles. In my Firefox, H4 and H5 appear to be the same font size, which does not look too helpful to me.
--Daniel Carrero (talk) 05:45, 26 July 2016 (UTC)
- This is a really minor, nonsubstantial edit and people haven't objected so far so I did it in this diff. Feel free to comment or request the change to be undone. --Daniel Carrero (talk) 22:03, 28 July 2016 (UTC)
I'm pinging people who participated in Wiktionary:Beer parlour/2015/November#Separate, simplified pages for letters, almost one year ago: @Ungoliant MMDCCLXIV, CodeCat, Aryamanarora, -sche, Wikitiki89.
I created Appendix:Letters/English and Appendix:Letters/Portuguese with the format that was discussed there. This includes categories like: Category:Ee. (Category:E is taken)
I hope I did not make any mistake. But I probably forgot to add some other information that could fit in the appendices. Feel free to edit the pages or make suggestions. As discussed before, if appendices like this can work in all languages, (or just the Latin script languages?) I'd like us to consider killing all redundant entries of letters in the main namespace to reduce clutter. (Except Translingual.) --Daniel Carrero (talk) 08:50, 27 July 2016 (UTC)
- I propose moving CAT:E to WT:E, since we have usually used WT: for shortcut links to any namespace. That would free up CAT:E for the letter categories. --WikiTiki89 17:30, 27 July 2016 (UTC)
- It would be easy to free up CAT:E. The shortcut was created in February 2016 and according to Special:WhatLinksHere/Category:E, it is used in only 21 pages.
- However, I have 2 objections:
- I have taken a liking to the name Category:Ee as a letter category. Note that Appendix:Letters/Turkish (currently a redlink) would be a member of Category:İi and Category:Iı, as explained in Dotted and dotless I.
- "we have usually used WT: for shortcut links to any namespace" I don't remember any "WT:" shortcuts to other namespaces. Regardless, we only recently installed the "CAT:" shortcut; until recently, we only had "WT:" and "WS:" as namespace shortcuts. I think it's more natural to use "CAT:" for categories than "WT:".
- --Daniel Carrero (talk) 17:39, 27 July 2016 (UTC)
- Some examples: WT:SD, WT:RT, WT:LOP, WT:ENPRONKEY, WT:H, WT:TAT. --WikiTiki89 17:48, 27 July 2016 (UTC)
- Thank you for the examples. --Daniel Carrero (talk) 17:51, 27 July 2016 (UTC)
What counts as "offensive"?
Looking at news media I see a lot more discussion and argument these days about what is "offensive". We use that word as a gloss on some terms, but unlike archaic etc. it's not in the Appendix:Glossary. Is there any way we can usefully define it there? What could our response be in the hypothetical case that a thin-skinned person started to add "offensive" to many words that others feel might not be? Equinox ◑ 15:10, 27 July 2016 (UTC)
- What we can do is tag words as "insult", or "vulgar", because they reflect the intent of the speaker/writer. However, "offensive" is from the point of view of the listener/reader, which is clearly subjective and a matter of personal feeling. So I don't think we should use this word as a tag. — Dakdada 12:35, 28 July 2016 (UTC)
- "Ladies" has been used as an insult, say, to newcomers to military boot camp. A problem is that offensiveness is determined by speaker, object of term, audience, and other context.
- What can we help users with? I suppose we can help objects know whether they've been insulted and speakers avoid accidental offense, but the latter seems more plausible to me as something that might bring someone to a dictionary. If so, then "offensive" is a more useful label than "insult" or "pejorative", especially if qualified (eg, "usually", "sometimes", "historically"). DCDuring TALK 12:56, 28 July 2016 (UTC)
- Having a mention at ladies that it is a word sometimes used to demean men could be useful. It can sort of slide outside the range of dictionary, given that pretty much any way of referring to a man as feminine (like implying he carries a purse) can be used as an insult.--Prosfilaes (talk) 05:33, 29 July 2016 (UTC)
- Ideally, there would be reference to external authority. For example, both cunt and nigger have usage notes documenting studies of precisely how large a portion of the population they offend, and other dictionaries also have strong usage warnings. I also see the label qualified with "sometimes" and "possibly", which could be of use if only a few or minor references attest offensiveness, or the offensiveness could in such cases be in usage notes rather than on the definition-line. - -sche (discuss) 20:05, 28 July 2016 (UTC)
- I find this quite useful; I copied part of the note from nigger to wanker, because wanker is apparently more offensive to Brits than nigger, and I'm happy I found that out before it slipped out as a Briticism somewhere inappropriate.--Prosfilaes (talk) 05:33, 29 July 2016 (UTC)
Vote: Placing English definitions in def template or similar
FYI, I created Wiktionary:Votes/2016-07/Placing English definitions in def template or similar.
Let us postpone the vote as much as a discussion requires, if at all. --Dan Polansky (talk) 17:28, 27 July 2016 (UTC)
- Annoying as hell for editors. What's the rationale this time? Equinox ◑ 17:30, 27 July 2016 (UTC)
- I don't know. If I don't create the vote, you'll see it everywhere within a year's time or so. --Dan Polansky (talk) 17:34, 27 July 2016 (UTC)
- Let me correct myself since I have guessed the rationale: Some editors won't be happy until the links in English definitions end in #English and thus have the English section as an explicit target. --Dan Polansky (talk) 17:48, 27 July 2016 (UTC)
- For the record, I herewith submit my objection to using {{def}}, and consider the state before the creation of {{def}} to be the status quo ante. Right now, the template is used in the following mainspace entries: leo, di, agam, agat, aici, againn, ann, leat, linn, springen, La Niña, léi, daoibh, liom, libh, agamsa, ise, dhuit, chugainn, liach, liag, airsean, asatsa, as-san, aistise, astusan, chugat, chuici, chucu. --Dan Polansky (talk) 17:52, 27 July 2016 (UTC)
- For the record, it was also used in El Niño. You removed it from that entry. --Daniel Carrero (talk) 17:54, 27 July 2016 (UTC)
- Once again, I'll point out that the vote you created is only about the automated addition of {{def}}. If you really want to stop it, you would just create a BP discussion or vote about disallowing the template altogether. --WikiTiki89 17:56, 27 July 2016 (UTC)
- I want to know what editors think and I will find out. The wording of the vote will serve fine for the purpose. Automatic and semi-automatic edits are the gravest danger in that regard, as experience plentifully shows.
- @Daniel: Right, I removed it from El Niño using the "no consensus => status quo ante" principle. I removed it from no other entries. I have a baseline above via listing the entries using the template at the point of submitting the objection, I won't waste my energies on removing the template from the other entries. --Dan Polansky (talk) 18:02, 27 July 2016 (UTC)
Pronunciation of obsolete words
The OED and many other dictionaries do not show pronunciations of obsolete words. We do, in many cases. I'm not sure what exactly we are showing. How the word used to be pronounced? Surely not - and if so, when exactly? But if not that, what is it? How the word might theoretically be pronounced by someone reading an old text? I don't see how we can justify or source such claims. My question then is whether there should be some policy about the use of =Pronunciation= for an entry all of whose senses are obsolete. (I have removed some of these sections in the past and been criticised for removing useful information.) Ƿidsiþ 16:55, 28 July 2016 (UTC)
- I agree. I made the same exact point in a recent discussion, I forget where. --WikiTiki89 17:18, 28 July 2016 (UTC)
- Are there really no credible sources? If not, how do folks manage to reconstruct all those unattested terms and entire unattested proto-languages? DCDuring TALK 17:44, 28 July 2016 (UTC)
- It's not that there are no credible sources, it's that those sources give an old pronunciation, while all our pronunciations for modern languages are modern pronunciations. Modern pronunciations do not exist for obsolete terms. --WikiTiki89 18:15, 28 July 2016 (UTC)
- So find a way to convey that it's not a modern pronunciation. I don't see the issue here, and if there's a source with an old pronunciation it can be referenced. DTLHS (talk) 18:21, 28 July 2016 (UTC)
- What would be wrong with providing sourced obsolete pronunciation, even for words with current pronunciations? After all, we have the obsolete definitions that those terms had as well as the meanings that have persisted from Chaucer and Shakespeare to the present. DCDuring TALK 18:23, 28 July 2016 (UTC)
- Well I guess that wouldn't be a problem, but there wouldn't be very many words we could find a sourced pronunciation for. But it would also require coming up with an IPA scheme for every time period, which would require actually defining specific time periods. And if we were to find a sourced pronunciation for each period in the entire history of a given word, it would be overkill to include all of them. The pronunciations would also be pretty useless, since they cannot be compared with other words that don't have a pronunciation from the same time period. --WikiTiki89 18:30, 28 July 2016 (UTC)
- I don't think pronunciations have to be sourced for every single word, if we can figure it out in other ways. Linguistics has studied the changes in pronunciation of English quite a bit, so we can rely on that too. Shakespeare's English is pretty well known and may be of particular value to users owing to the popularity of Shakespeare's works. We do have to be aware, though, that no such thing as RP or GA existed centuries ago, so pronunciations are always tied to a particular region or regional standard. Shakespeare's English involves as much a place as it does a time. —CodeCat 18:40, 28 July 2016 (UTC)
- If we allow unsourced reconstructions of historical pronunciations, then how do we decide which time period and place for the pronunciation? And what's to stop us from including a pronunciation for every single year and every single place we manage to find information on? --WikiTiki89 18:43, 28 July 2016 (UTC)
- We have Latin and Ancient Greek pronunciations don't we? —CodeCat 19:05, 28 July 2016 (UTC)
- Sanskrit now too! :) —JohnC5 19:36, 28 July 2016 (UTC)
- The Latin ones are pretty much useless. The Greek ones are useful; they break Greek down into a few specific times and places and can be automated. I don't think we can do that for English. --WikiTiki89 19:41, 28 July 2016 (UTC)
- As we have no limits or even schedules for our ambitions in any area of Wiktionary coverage, why should we start introducing them on this matter? It would be possible to come up with some useful limits; perhaps we could try only EME for starters. DCDuring TALK 20:02, 28 July 2016 (UTC)
- If you try to include too much information, Wiktionary would become unusable. If the information is derivable, then people can derive it on their own and we can focus on including only the useful information. --WikiTiki89 20:22, 28 July 2016 (UTC)
- When we discussed this at Wiktionary:Tea room/2016/June#Pronunciation_of_proditor, a user argued that we shouldn't include the pronunciations of older words because we can't know how they were pronounced... and then I provided references which noted how they were pronounced. Furthermore, obsolete words which occur in works that are still read today, especially famous ones that have been continually read, will still be pronounced today. As to the notion of including every year: why hello, slippery slope fallacy! - -sche (discuss) 20:15, 28 July 2016 (UTC)
- If they occur in texts that are still read and understood today, then how can we call them obsolete? As for the slippery slope fallacy, you're right, but my real issue is how do we choose representative times and places? From what I know, for Greek we just go with the times and places that are well documented; however, for English, we have much more information about times and places. How do we choose which ones to include and which ones to exclude? --WikiTiki89 20:22, 28 July 2016 (UTC)
- Also, the slippery slope fallacy is a fallacy for the inclusion of words and phrases only because each word or phrase still needs to be attested. However, if we allow arbitrarily included deduced pronunciations from any time and place that are not individually verified, the slippery slope is real and not a fallacy. --WikiTiki89 20:28, 28 July 2016 (UTC)
- Shakespeare is an obvious candidate. He spoke, presumably, 16th century Midlands English, but since his performances happened in London, they were probably performed in 16th century London English. —CodeCat 20:41, 28 July 2016 (UTC)
- You're already demonstrating a difficulty of choosing a time and place. --WikiTiki89 20:44, 28 July 2016 (UTC)
- I'm not choosing, I'm giving suggestions. —CodeCat 21:09, 28 July 2016 (UTC)
- But how are we supposed to know what time period the pronunciation is representing? If a word was used from the 14th to the 19th century and is now obsolete, would we be showing a pronunciation from 1680? 1840? 1550? What phonemes would we use? For Latin and Ancient Greek – and indeed for =Middle English= and =Old English= entries – we have conventions for these things, but in modern English sections we are assumed to be showing a current pronunciation, which makes no sense for an obsolete word. At the very least there should be some time label attached to them, but even then I don't know where we would get the information from. Ƿidsiþ 06:31, 29 July 2016 (UTC)
- Old texts get read in modern pronunciation, and obsolete words do turn up in them. The current pronunciation is probably what English professors use for it.--Prosfilaes (talk) 01:29, 30 July 2016 (UTC)
- English professors have to guess the pronunciations and likely don't all use the same pronunciation. This pronunciation was not orally transmitted since the time the word was in use (and if it was orally transmitted, then I don't think we can call the word obsolete). --WikiTiki89 14:34, 1 August 2016 (UTC)
- Comment I would find Shakespearean pronunciations extremely helpful, as I try to read Shakespeare and older poetry in the original pronunciation (or close) so as not to miss out on rhymes and wordplay. Right now, I have to look at groups of words, sharing certain characteristics of pronunciation, based on RP, and try to extrapolate for words not found on those lists.
- It would be equally useful to have pronunciation on 18th-19th century British English. We could always put a little label beside such pronunciations that makes it clear that they are reconstructed and/or obsolete, and maybe put them in collapsible boxes so they wouldn't show up for all users and confuse them. While accuracy is important, I think it's valuable to include information that isn't fully proven, but is most likely true. Andrew Sheedy (talk) 01:37, 30 July 2016 (UTC)
- When giving pronunciations for modern words, we're painting rough strokes of an exemplaric form as well. We could just as easily conjecture a Shakespearean example as one for "RP" (aka. 1950s southern midlands upper class England). For dead languages, we pick writing-heavy periods and give the user an idealised average of pronunciation. They don't do anything different in universities and theatres. Korn [kʰũːɘ̃n] (talk) 17:46, 30 July 2016 (UTC)
- If you think about it, our current practice is misleading: Shakespeare has wordplay based on stale and steal being homophones, but our entries for those words only show modern pronunciations that don't reflect this. By only showing modern pronunciations, we're strongly implying that English pronunciation hasn't changed since 1500. We need to figure out how to represent phonetic etymology in a way that's both concise and honest. Sure, mark reconstructed pronunciations, but don't exclude them.
- As for the infinite multiplication of pronunciations: we could have already done that with modern English, given all the thousands of well-documented local pronunciations, but we haven't. I doubt we would do much worse if we added historical variants to the mix. Chuck Entz (talk) 19:28, 30 July 2016 (UTC)
Vote: Request categories
See Wiktionary:Votes/2016-07/Request categories.
I created it based on this RFM discussion.
The RFM discussion was initially about renaming only "Category:Translation requests (X)". Then I proposed renaming all the request categories. This is a major proposal that affects many categories. In the RFM discussion, @Dan Polansky suggested doing this via a vote. I support doing so via a vote.
Because this is a major proposal, I scheduled the vote to start in 2 weeks. Once it starts, it is scheduled to last for 2 months. Feel free to make changes or suggestions/comments. --Daniel Carrero (talk) 01:51, 29 July 2016 (UTC)
Sanskrit vs. Old Indo-Aryan
I notice that we currently don't list Sanskrit in the modules as ancestor for anything. Strictly speaking, this is correct, but it's a very widespread practice in most dictionaries to treat Sanskrit as a stand-in for Old Indo-Aryan in etymologies. It's also, as far as I can tell, the most common practice in our own etymologies.
If we're going to convert the combination of {{etyl|sa|xx}} {{m|xx|... to use {{inh}} in all of these etymologies, shouldn't we make Sanskrit the ancestor of all the prakrits and of all the other Indo-Aryan languages currently shown as descended directly from Proto-Indo-Aryan? Chuck Entz (talk) 18:58, 30 July 2016 (UTC)
- {{etyl|sa|xx}} {{m|xx|... is equivalent to {{der}}, not {{inh}}. Converting to {{der}} is always ok, nothing changes. Only {{inh}} demands an ancestor relationship. —CodeCat 19:07, 30 July 2016 (UTC)
- I've been curious about this issue for a little while now. Classical Sanskrit has no direct descendants, and someone on here was claiming that all the Prakrits descend from Rigvedic Sanskrit. I've had a good deal of trouble finding sources for this claim. If they do indeed descend from Rigvedic, I'm not sure if there is enough difference between Rigvedic and Classical to give them separate codes. —JohnC5 19:41, 30 July 2016 (UTC)
- Classical Sanskrit had diversified a little bit already. "Krishna", "Sanskrit" and "Rigveda" are specifically northern forms. The original pronunciation common to all dialects had a syllabic r instead of ri, other dialects inserted different vowels: ru in the west, ra in the east. —CodeCat 20:02, 30 July 2016 (UTC)
- But those are all spelled as syllabic r in Sanskrit. Did Pāṇini describe the pronunciation as ri? Chuck Entz (talk) 21:11, 30 July 2016 (UTC)
- There are nationalistic and religious reasons to exaggerate the age and universality of Vedic Sanskrit. I hope those aren't in play here. Chuck Entz (talk) 21:22, 30 July 2016 (UTC)
- If this discussion may be believed, then Vedic cannot represent the ancestor of all the Indo-Aryan languages. If there are Prakrits directly descended from Vedic, I don't know which ones, but in the mean time, I think we just have to stick with Proto-Indo-Aryan. —JohnC5 21:47, 30 July 2016 (UTC)
- That fixes the technical problem, but the contradiction remains. Chuck Entz (talk) 22:24, 30 July 2016 (UTC)
- AFAIK, Sauraseni Prakrit is a direct descendant of Rigvedic Sanskrit. Alfred C. Woolner's "Introduction to Prakrit" may help; it's available on archive.org. —Aryamanarora (मुझसे बात करो) 20:21, 1 August 2016 (UTC)
- @Aryamanarora: So on pages 3-4 of Woolner, he says:
“If in "Sanskrit" we include the Vedic language and all dialects of the Old Indian period, then it is true to say that all the Prākrits are derived from Sanskrit. If on the other hand "Sanskrit" is used more strictly of the Pāṇini-Patañjali language or "Classical Sanskrit," then it is untrue to say that any Prakrit is derived from Sanskrit, except that Śauraseni, the Midland Prākrit, is derived from the Old Indian dialect of the Madhyadeśa on which Classical Sanskrit was mainly based.”
The phrase “and all dialects of the Old Indian period” worries me if we want to include all the Prakrits. We should certainly move Śauraseni under Sanskrit if we decide to keep Vedic together with Classical (which I also support). —JohnC5 23:16, 1 August 2016 (UTC)
- And what about the apabhraṃśas? Do these deserve language codes or should they fall under their respective Prakrits? And what about re-Sanskritized languages, such as Marathi? According to Wikipedia, "The contemporary grammatical rules described by Maharashtra Sahitya Parishad and endorsed by the Government of Maharashtra are supposed to take precedence in standard written Marathi. Traditions of Marathi Linguistics and the above-mentioned rules give special status to tatsamas, words adapted from Sanskrit" (no citation however). —Aryamanarora (मुझसे बात करो) 15:37, 2 August 2016 (UTC)
Vote: CFI - letting terms be linked to pertinent sections
FYI, I created Wiktionary:Votes/2016-07/CFI - letting terms be linked to pertinent sections.
Let us postpone the vote as much as the discussion requires, if at all. --Dan Polansky (talk) 11:45, 31 July 2016 (UTC)