Wiktionary:Beer parlour: difference between revisions

Latest comment: 15 years ago by Bequw in topic ISO 639-5
Conrad.Irwin (talk | contribs)
Bequw (talk | contribs)
→‎ISO 639-5: can now be used with {etyl}
Line 1,117: Line 1,117:
[[w:ISO 639-5|ISO 639-5]] was released in May 2008 and assigns 3-letter codes to language families (e.g. "Turkic languages" is '''trk'''). There are currently around 114 codes and they use the same "pool" as ISO 639-2 & 3 codes. 639-5 is disjoint from 639-3 (the standard for individual languages) but is a superset of the "collective" codes from ISO 639-2 (which codes both "collective" and individual languages). I think Wiktionary should employ ISO 639-5 codes for specific purposes here, such as in standardizing etymologies. These codes would allow us to standardize many of the entries in [[Wiktionary:Languages without ISO codes]] (see [[Wiktionary_talk:Languages_without_ISO_codes#ISO_639-5|how]]). As these codes aren't valid for L2 entries we should prefix them. That way they could be restricted from being subst'd or used with {{temp|infl}}, {{temp|term}}. Certain templates, such as {{temp|etyl}} and possibly {{temp|proto}}, could be coded to look for language family codes at the specific prefix. Atelaes suggested '''macro:''' as a prefix, though that could be confusing as certain codes in ISO 639-3 are termed ''''macro'''languages'. The title of 639-5 is ''Alpha-3 code for language families and groups'' so maybe '''group:''' or '''family:''' but maybe someone has a better idea. As there are some existing ISO 639-2 "collective" codes (and therefore 639-5 codes) currently in use they would be prefixed as well. Thoughts on this plan? --[[User:Bequw|Bequw]] → [[Special:Contributions/Bequw|¢]] • [[User talk:Bequw|τ]] 06:44, 30 January 2009 (UTC)
: So far some of the -3 family codes (e.g. {{temp|sla}}, {{temp|bat}}, {{temp|dra}} etc.) are used with {etyl}, so they should first be relocated to '''fam:xxx''' (or whatever the prefix be) before this gains official blessing. I like the idea of using a secondary namespace for families, as it carries more direct metadata separating individual languages from groupings of languages, and would prob. simplify maintenance. I'm not sure how this is supposed to work with {proto}, as that template explicitly takes the name of the family as the first positional parameter (unless it gets rewritten to support both e.g. <tt><nowiki>{{proto|Indo-European|...}}</nowiki></tt> and <tt><nowiki>{{proto|ine|...}}</nowiki></tt> types of invocation, much like some of the templates now accept both an ISO code and a full language name). --[[User:Ivan Štambuk|Ivan Štambuk]] 23:32, 31 January 2009 (UTC)

:{{temp|etyl}} can now be passed language codes that exist in the '''etyl:''' prefix ([fam] is an ISO code so also wouldn't have made a good prefix). Right now we just have ISO 639-5 codes there (see [[:Category:ISO 639-5 templates|cat]]). Note for those wanting to create these language code templates: as these templates have limited use (they aren't <nowiki>{{subst:}}-able</nowiki>) they have a different, more useful, format than the normal language codes. --[[User:Bequw|Bequw]] → [[Special:Contributions/Bequw|¢]] • [[User talk:Bequw|τ]] 19:50, 14 February 2009 (UTC)
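:For readers who find the prefix idea easier to follow in code than in template syntax, here is a minimal Python sketch of the lookup described in this thread. It is only an illustration under stated assumptions, not how {{temp|etyl}} is actually implemented: the two tables, the function names and the fallback order are all hypothetical, and only the '''etyl:''' prefix and the codes '''trk''' and '''sla''' come from the discussion.

```python
# Hypothetical sketch of a prefixed namespace for family codes.
# The tables and function names are illustrative assumptions only.

# Ordinary language codes, valid for L2 (language) headings.
LANGUAGE_CODES = {"en": "English", "tr": "Turkish"}

# ISO 639-5 family codes, stored only under the "etyl:" prefix.
FAMILY_CODES = {"etyl:trk": "Turkic", "etyl:sla": "Slavic"}


def resolve_for_etyl(code: str) -> str:
    """Resolve a code for an etymology template.

    Individual languages are tried first, then the prefixed family table.
    """
    if code in LANGUAGE_CODES:
        return LANGUAGE_CODES[code]
    prefixed = f"etyl:{code}"
    if prefixed in FAMILY_CODES:
        return FAMILY_CODES[prefixed]
    raise KeyError(code)


def resolve_for_heading(code: str) -> str:
    """Resolve a code for an L2 heading: family codes are never accepted."""
    return LANGUAGE_CODES[code]  # raises KeyError for family codes
```

:The point of the separate table is that a family code such as '''trk''' resolves in an etymology context but fails everywhere else, which is the restriction the prefix was meant to provide.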


== Hungarian form of template - new approach ==

Revision as of 19:50, 14 February 2009

November 2008

Etymology sections are very concise

I think we really could use some more wording in etymology sections. Cryptic stuff like ‘short + cut’ really isn’t very helpful. I also do not like the use of ‘<’ (or was it ‘>’?) to indicate inheritance and so on. What do other people think of this and should we work out a consensus here?

The impetus is {{suffix}} and related templates which indirectly promote this terseness. If including some more wording in these templates is not wanted, then at least we should update their usage to instruct people to put some verbiage around it, but I fear that will make the templates less usable. H. (talk) 16:23, 3 November 2008 (UTC)

I find the terseness to make the etymologies easier to read. The use of + and < allows the etyma to stand out more clearly. Quite frankly, what more do you want to put down than "short + cut"? Seems to me like unnecessary fluff. If you can show me a wordier ety that I like, I might change my mind. -Atelaes λάλει ἐμοί 18:25, 3 November 2008 (UTC)Reply
Brevity is only good if it is unambiguous and gives all the necessary information. As we don't have that much information to give, these abbreviated forms work - I'm sure examples can be found where too much information has been packed too tightly. However, if we persist in having the Etymology bit before the useful</troll> parts of the entry, then they will be kept brief. Conrad.Irwin 18:38, 3 November 2008 (UTC)Reply
I always put "From", e.g. From {{term|short}} + {{term|cut}}., but when another editor removes it, I don't revert. I do think the "From" is important, because not all of our readers know what "etymology" means. Similarly, I don't use <, because I don't think the casual reader will recognize it, and while in some cases I think the idea comes across anyway, in some cases I think it does not. (I view it as analogous to the various abbreviations, F. and so on, that are found in other dictionaries but that we don't use.) However, I'm fine with +, as it seems crystal clear to me. —RuakhTALK 20:05, 3 November 2008 (UTC)Reply
"not all of our readers know what "etymology" means" Which editors do you mean? Etymology is a loanword from Greek present in almost every Indo-European language (and other languages who are not so reluctant to accept borrowings - エチモロジー ), therefore virtually all editors from Europe, South, Central and North America must know what etymology is, must not they? Bogorm 20:39, 3 November 2008 (UTC)Reply
Just because a person is a native speaker of a language does not mean they know every word in that language, and certainly not that they understand every concept described by that language. To know what "etymology" means requires at least a rudimentary understanding of language evolution, etc. Those of us who are interested in a discipline should not take for granted such knowledge, as plain as it may seem to us. Up until a few days ago, I had no idea what the term "liquidity" meant (and to say that I have an exceptionally solid grasp of the term now would be deceptive), as I have almost no background in economics. I would not consider myself stupid nor generally uneducated (though others are free to disagree :)). -Atelaes λάλει ἐμοί 23:06, 3 November 2008 (UTC)Reply
Starting with “From” is good form. It may help a new reader who has never heard the term etymology understand what he is looking at, on his very first page view of Wiktionary. It in no way detracts. Michael Z. 2008-11-04 17:09 z
I agree that starting with "From" is good form. --EncycloPetey 17:15, 4 November 2008 (UTC)
Etymologies for compound words don't warrant much more than we have, unless we decide to include dates for first attested usage of one or more senses. I wonder if we might not expose ourselves to an endless supply of folk etymologies if we make wordy etymologies, especially for compounds. Long, discursive, or disputed etymologies should certainly not consume too much space, especially if they force definitions off the first screen. Such etymologies and two or more lines of cognates especially should normally only appear under a show-hide bar, if cognates be retained at all. DCDuring TALK 20:48, 3 November 2008 (UTC)Reply
I am very adamant that cognates remain, when appropriate. While I share your concern about ten page theses blocking out the heart of the entry, I don't think it prudent to trim the preceding content to a minimum. The answer lies, rather, in altering our formatting/presentation. -Atelaes λάλει ἐμοί 23:06, 3 November 2008 (UTC)Reply
What about using show/hide bars for etymological material such as cognate lists or discussions of disputed etymologies that, in total, take more than 3 lines and push definitions off the initial screen (with right-hand Toc)? As you know I like etymologies, including long chains through Middle and Old English and French; Anglo-Norman; Vulgar, Medieval, Late, and New Latin, the loss of visibility of which would greatly sadden me. DCDuring TALK 23:26, 3 November 2008 (UTC)Reply
Since I like too etymological chains through Old Norse, Gothic and Sanskrit, it would sadden me as well. (That was facetious, I support every etymological information) I do not embrace the proposal for the hide bars, since I am firmly convinced that etymology is one of the most important parts of the articles, and however disputed it is, expounding the diverse linguistic theories without concealing any of them is indispensable for a thorough comprehension of the entry´s meaning. Bogorm 16:36, 4 November 2008 (UTC)Reply
If we were just running this for our own benefit, I could agree. I'm looking for ways to make this site more useful for ordinary (unregistered, non-contributing, non-linguist) users, by getting onto the initial screen more of the info that is, I think, most commonly sought: 1. definitions and 2. a guide to the definitions and other material that don't fit on the first screen (the Table of contents). Registered users ought to be given the power to have show/hide bars in the sections that they select be expanded by default (a feasible option, BTW). DCDuring TALK 20:08, 4 November 2008 (UTC)
Ever since we introduced Show/Hide bars, my experience has been that average users aren't aware of their function, and overlook their contents entirely. Time and again, I have seen comments made from ordinary users who were surprised when they finally discovered them, and that's just the fraction who discover them. I've been following the addition of Translations to WOTD entries since before the introduction of Show/Hide bars. When these were introduced to the Translations sections, addition of translations by average users plummeted, and this drop has never recovered to its former levels. In short, your position that these tables benefit average users isn't supportable. If we do choose to use them, then they should be expanded by default, and collapsible only as a customization feature available to registered users. --EncycloPetey 20:17, 4 November 2008 (UTC)
On the use of "<" and conciseness: I like the use of "<" to mean "from". Century 1911 uses "<" and "+" in its etymology markup. Unlike Wiktionary, the etymology in Century 1911 is not introduced by "Etymology" heading, and is placed in "[" and "]" instead; and yet the readers of Century 1911 must have managed to learn to read its entries. A new reader of Wiktionary sees a content under an "Etymology" heading, so can quickly look up the word "etymology" in Wiktionary to find out what it means. --Dan Polansky 21:09, 6 November 2008 (UTC)Reply
Headers (of all kinds) take up more space on enwikt than on some other wiktionaries, and other online dictionaries, let alone print dictionaries. The space taken by "from" is almost negligible by comparison. DCDuring TALK 22:20, 6 November 2008 (UTC)Reply
It's not the space taken by "from" for which I prefer "<". I prefer it for its faster showing me the derivation chain: my eye locates the individual elements separated by "<" faster than when they are separated using "from". I understand that one of the reasons why printed dictionaries use terse markup are the space constraints, which are absent in an electronic dictionary such as Wiktionary.
The issue seems to me to be at least remotely similar to the mathematician's preference of symbols to wordy sentences. Mathematical formulas can be phrased in the words of natural language, but when it is done, patterns and structures are obscured. --Dan Polansky 12:38, 7 November 2008 (UTC)Reply
Good observation. Let me stress that I also do not like page-long etymologies, since most of the time, they contain garbage (that is just an observation, there might be occasional words where the information deserves its place). I think, however, that, probably out of fear for that, we have shot through to the other extreme of being so terse it is barely understandable without exercise. That was my initial concern. I plead for a compromise. For example, for compound words, I would say “Compound of X + Y” is a good compromise: novices will hopefully understand this, or at least have the possibility to look up compound (whether to link it by default or not can be argued for later, it is done for e.g. blend (is that in {{portmanteau}}? Yip.)). As for >: I think it is unclear because I never understand which way it is pointing to or what it is supposed to mean. (I see you talk about < above, which just illustrates my point: if even a regular contributor doesn’t get the meaning, a novice will only be confused. I therefore replace the < wherever I see it.)
So please, let’s focus on finding a good compromise. H. (talk) 16:18, 13 December 2008 (UTC)
I find replacing "<" with "from" to be problematic, because in a full-length etymology it leads to a long series of "from ... from ... from ..." statements, which I actually find more disorienting and harder to read than the arrow notation. I also don't see how "<" could be interpreted as pointing in more than one direction, but I agree it's not very satisfactory. When I'm really feeling verbose, I break the etymology down into normal human sentences, something like: "From Middle English N, derived from Old French O. This in turn was derived from the Ancient Greek Q via the Latin P." But I'm not sure that approach really makes the historical sequence any easier to understand.
Myself, I don't really see a problem with terse etymologies. I don't mind verbose etymologies either, as long as they are accurate and stay on-topic. Still, for many words, a simple {{prefix}} or {{compound}} is a fairly complete etymology in itself, and comparable to what other dictionaries provide. If someone wants to expand that template into a complete sentence, or add supplemental information, so much the better. But a minimal etymology is vastly better than none, and helps to lay the groundwork for a more complete treatment in the future. -- Visviva 16:56, 13 December 2008 (UTC)Reply
But note that the viewer of the page does not see the difference between {{suffix}} and {{compound}} and the like: both simply produce a plus, the only difference is a minimalistic - at the relevant place. I think that’s unfortunate, we could at least mention that it is a compound or a word with a suffix, as is e.g. done in {{blend}}. I propose to at least change the aforementioned templates to a wording like the latter. Unfortunately, that will probably break a lot of entries. H. (talk) 14:38, 4 January 2009 (UTC)Reply
I can see your point... I don't think this is worth breaking thousands of entries over, but it would be nice to have a verbose option, or a set of verbose variants, for {{suffix}} et al. Simply forcing verbose behavior would be bad IMO, because there are times when the sentence needs to be formatted in some non-standard way. I am tempted to suggest that we just add an option like "verbose=yes", but as Robert recently noted, that is not usually the best approach... My brain is not working particularly well today, but there has to be some elegant solution that will allow verbosity without breaking any current uses. Maybe a specific control character that could be inserted in any position, something like {{compound|green|house|.}}. Shall we take this to WT:GP? -- Visviva 09:02, 8 January 2009 (UTC)Reply
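To make the "control character" suggestion above concrete, here is a rough Python sketch of how a trailing <code>.</code> argument could switch a compound template between terse and verbose output without changing any existing calls. This is only a sketch under assumptions: the function name <code>render_compound</code> is hypothetical, the verbose wording is borrowed from the "Compound of X + Y" proposal earlier in this thread, and nothing here describes how {{compound}} actually behaves.

```python
def render_compound(*parts: str) -> str:
    """Render a compound etymology from its components.

    A final "." argument is treated as a control character that requests
    the verbose form; without it the existing terse output is unchanged.
    """
    verbose = len(parts) > 0 and parts[-1] == "."
    terms = parts[:-1] if verbose else parts
    joined = " + ".join(terms)
    if verbose:
        return f"Compound of {joined}."
    return joined


# Existing two-argument uses keep their current output.
terse = render_compound("green", "house")
# Adding the control character opts in to a full sentence.
wordy = render_compound("green", "house", ".")
```

Because the terse form is the default, entries that already embed the template in a hand-written sentence would not be affected.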


Seeking final comment on Hangul syllable entries

OK, I am now the proud owner of a 2.3-megabyte text file containing basic entries for all Unicode Hangul syllables. For examples of the output, see and . Once created, I do not intend to edit these entries again, ever (excepting the handful that are real words), and I would sincerely hope that no one else has to edit them either. With that in mind, are there any final thoughts about the layout of these entries? -- Visviva 04:44, 19 November 2008 (UTC) P.S. If one of our resident wizards could find a way to make Template:ko-symbol-nav a bit less squirrely, that would be wonderful; however, since it's templated it's not urgent.

Looks good. I like that all the elements are in flexible templates. Quibbling details:
Can we link to Revised and Yale transliteration info? (for applications like these, it would be nice to have unobtrusive links, like the context labels in pl.wiktionary—cf. “geogr.” in pl:Korea).
ko-symbol-nav seems cluttered by all of the hyphen separators. Dots would be less obtrusive if you insist on character separators, but I think the table arrangement and spacing is sufficient. I would also link both the arrow and character for previous and next. I'd be glad to rework the template.
Wording: does ko-usage-keystroke need the word standard?—(to differentiate it from a common non-standard dubeolsik keyboard?) Can we link dubeolsik keyboard to an explanation or Wikipedia? ko-usage-unicode: Unicode standard notation is U+AD6B, with no need to explain that it's hex. You could reduce the wordiness to “Unicode representation U+AD6B.” Michael Z. 2008-11-24 17:36 z
Thanks for this. I think I have implemented all of your suggestions above (except for the "unobtrusive links" part; I'm not sure of the current state of consensus on that). Please feel free to edit the templates further if you are so inclined. -- Visviva 03:11, 25 November 2008 (UTC)Reply
Neat! Let's do it! bd2412 T 04:52, 25 November 2008 (UTC)
The only thing that bothers me is that the "Usage notes" aren't actually notes about usage. Rather, they're notes about typing and encoding. Does anyone know of a better header for this section? --EncycloPetey 03:07, 28 November 2008 (UTC)Reply
I don't see the problem there (taking "usage" with a very broad definition). Maybe just "notes" would avoid any such problem, but I don't think readers will be confused or misled by the header as it is. bd2412 T 05:50, 28 November 2008 (UTC)
Good point. I would have just called them "Notes" if that weren't proscribed. How about "Technical notes"? That might come in handy for many of our Translingual entries as well. -- Visviva 05:53, 28 November 2008 (UTC)Reply
It is of a technical nature. Whatever answer is chosen, the same method should be used on the Chinese/Han/CJK(V) entries and the Korean syllable entries, and from there to any entries about letters or symbols or characters which include technical nature.
An example of an actual usage note for Korean syllables might be to note those which don't actually occur in Korean writing. If I'm not mistaken I believe I've heard or read that Unicode includes Korean syllables which are technically possible but linguistically impossible. Is this correct? — hippietrail 06:40, 28 November 2008 (UTC)Reply
It's a bit difficult to prove a negative. Syllables that don't exist in standard Korean may still turn up in eye dialect and internet 외계어 ("Martian", the language of the PC-bang generation). There are also some syllabic blocks that can never represent a syllable, but which are nonetheless common in written Korean (in fact that would apply to any syllable with an aspirated or compound batchim, such as 읊 in 읊다 or 없 in 없다). If someone wants to compile this information, it would do no harm, but I ain't volunteering. :-) -- Visviva 07:40, 28 November 2008 (UTC)Reply
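For anyone unsure what the Unicode and composition notes in these entries refer to, the following Python sketch shows the arithmetic behind a precomposed Hangul syllable block. It relies on the Unicode standard's layout of the block (base U+AC00, with 19 initial, 21 medial and 28 final positions, the last including "no final"); the function names are mine and are not taken from any of the templates discussed here.

```python
S_BASE = 0xAC00   # first precomposed Hangul syllable
L_COUNT = 19      # initial consonants
V_COUNT = 21      # medial vowels
T_COUNT = 28      # final consonants, index 0 meaning "no final"
SYLLABLE_COUNT = L_COUNT * V_COUNT * T_COUNT


def unicode_notation(ch: str) -> str:
    """Format a character in standard U+XXXX notation."""
    return f"U+{ord(ch):04X}"


def jamo_indices(ch: str) -> tuple[int, int, int]:
    """Return the (initial, medial, final) indices of a syllable block."""
    offset = ord(ch) - S_BASE
    if not 0 <= offset < SYLLABLE_COUNT:
        raise ValueError(f"{ch!r} is not a precomposed Hangul syllable")
    lead, rest = divmod(offset, V_COUNT * T_COUNT)
    vowel, tail = divmod(rest, T_COUNT)
    return lead, vowel, tail


# The block 없 (as in 없다) has a compound final consonant, so its
# final index is non-zero.
lead, vowel, tail = jamo_indices("없")
```

Every combination of the three indices has a code point whether or not it ever occurs in written Korean, which is why the full set is so much larger than the set of syllables in actual use.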
As a comparison, here is a typical Han character entry:
(radical 187 馬+2, 12 strokes, cangjie input 戈一尸手火 (IMSQF) four-corner 31127)
References
KangXi: page 1433, character 11
Dai Kanwa Jiten: character 44579
Dae Jaweon: page 1958, character 5
Hanyu Da Zidian: volume 7, page 4540, character 3
Unihan data for U+99AE
A lot of the technical stuff is in the "inflection line", skipping definitions, which syllables don't have, we then have a "References" heading which tells us where to find this character in several well known character dictionaries, followed by a link to the Unicode site which is where the Unicode codepoint is given. — hippietrail 06:49, 28 November 2008 (UTC)Reply
I don't think the situations are really comparable. The CJK characters are real units of meaning, with real entries in real dictionaries; Hangul syllabic blocks, for the most part, have no independent meaning, and no existence outside of the realm of digital possibility. (This is why I tried to have them deleted, but failing in that I figure the next-best thing is to create a complete set of consistent entries for them.)
But yes, I could see putting the keyboard input and composition in the inflection line -- though I have to say that seems a bit odd, even for the CJK entries -- and putting the Unicode data under "References". Anyone else have thoughts on this? -- Visviva 07:40, 28 November 2008 (UTC)Reply
Yes! I think the Korean entries above are far superior to this Han entry. For me, the Han entry is just incomprehensible babbling, even though I already know a little bit about it and have some ideas what some of it might mean. For a layman, this is totally unusable. With the Korean entries, this is not the case: it is clearly explained what is what. (Ok, discussion about that is possible, but at least more clearly.) So please keep the current format. I hope the people that create these Han entries like that will not be insulted, it’s just that I think a lot could be done to make them more helpful, like link to the relevant Wikipedia (or even an internal) page for all of the terminology (‘radical’, ‘stroke’, ‘cangjie’, ‘w:IMSQF’, ‘w:four-corner’ and all of the dictionaries). H. (talk) 16:45, 13 December 2008 (UTC)
One remark would be that the navigation templates are very concise. A simple caption telling what they are for or what they do would be great. H. (talk) 16:45, 13 December 2008 (UTC)Reply
How about something like {{ko-symbol-nav-ga}}? This would require some more work, since the template would need to be hand-coded to a great extent, but it would provide some of the additional information people seem to want. -- Visviva 18:09, 13 December 2008 (UTC)Reply
I do not really understand where you’re getting at, but I would just add some more explanation to the navigation templates: words like ‘next’, ‘previous’, probably explaining what kind of next (i.e. in what sense) etc. Unfortunately, I do not understand Korean well enough to do that myself. I think instead of 괴 ← I’d like to see Next symbol in Unicode sequence: 괴; or whatever relevant. A link explaining would do as well. Just think about usability for someone who has no idea of the concepts Unicode and Hangul. H. (talk) 18:14, 4 January 2009 (UTC)Reply

December

濠洲, images of words, copyright implications

Yesterday while browsing through used bookshops I found an old Japanese atlas which contained a map of Australia using the old ateji spelling 濠洲. Thinking fair use I took a photo. Finding the image upload process here tricky I went to the IRC channel #wikimedia-commons to ask for help and provide feedback. I don't know much Japanese so couldn't find publisher or copyright info on the atlas. It was only a few pages thick and quite old. My best guess is that it was intended for students and dates from between about WWII and the 1960s.

Now the Commons guys think if I include the whole photo it's probably a derivative work and I need to establish that it's not copyright. My photo is just the portion of the map immediately surrounding the recognizable shape of Australia for context.

I've included the full map and two cropped versions providing progressively less context but with progressively fewer potential copyright problems.

Does anybody have any thoughts? Photographic citations of words in context will be a topic for us sooner or later. — hippietrail 03:39, 17 December 2008 (UTC)

In principle, I don't think there's any difference between a photographic citation and a textual citation; both should be fine for us to use. However, if there's enough context to be meaningful, I expect that has to be considered fair use, which means it can't be hosted on Commons, which means we would have to enable image uploading here. I gotta say, I'm not in a great rush to open those floodgates. -- Visviva 04:20, 17 December 2008 (UTC)Reply
Not sure why we need the photo as a citation. Someone who knows Japanese can identify the year of publication, title, publisher, editor, whatever, and create a regular citation for the term. That might not be feasible in this particular case (try getting someone who knows Japanese to go to that bookstore with you!), but I don't think photo citations are something we'll need much. (In fact, having the photo without the bibliographic info doesn't help much as a cite anyway.)—msh210 17:21, 17 December 2008 (UTC)Reply
Indeed, it would be better to photograph the details of the work (publisher, title, authors/editors, publication year, etc) and have it translated by someone that speaks Japanese. I don't think Fair Use covers situations like this. EVula // talk // 16:49, 18 December 2008 (UTC)Reply
Fair use is not in fact sufficient reason to use something in a MediaWiki project. Photographing the publishing details would be as much of a copyright infringement as any other page. Identifying where the publishing details are would itself require someone that reads Japanese, unless you can tell me where this info usually is in old Japanese atlases (-: — hippietrail 03:52, 19 December 2008 (UTC)Reply
Re: "Photographing the publishing details would be as much of a copyright infringement as any other page.": Is that true? I thought copyright was about the expression of ideas; the presentation of factual information in a standard format cannot be copyrighted. (At least, that's how it is in the U.S. We used to have a "sweat of the brow" doctrine, but there was some sort of case involving a phone book where the Supreme Court ruled that there was nothing creative in the phone book's format, and the information in it was not copyrightable. I suppose other countries might have different rules.) Or are you saying that in old Japanese atlases, the publishing details may have been presented in a creative, copyrightable format? —RuakhTALK 19:39, 20 December 2008 (UTC)Reply
To be clear, photographing a copyrighted work is never copyright infringement (despite what some institutions will have you think). But publishing or distributing that photograph may be.
However, temporarily putting a photo on some website so that a Japanese reader can determine whether it is copyrighted sounds like scholarship to me. In my inexpert opinion, this use would be protected by fair dealing or fair use (depending on the laws of your country). Michael Z. 2009-01-08 02:48 z
I support fully the uploading and would like also to remind you that the Japanese copyright is not 70 years, but 50 years after the publishing. Therefore all stuff from WW2 and issued before 1959 is copyright free. Bogorm 12:12, 26 December 2008 (UTC)Reply
I returned to the shop armed with notes on the characters for Japanese era names so I could work out dates. All the dates inside the covers and on the front were between 1946 and 1950. So this atlas appears to now be in the public domain. — hippietrail 01:54, 8 January 2009 (UTC)Reply

hundred and thousand

(Moved from Talk:hundred#classification)

I know it is a tradition to classify hundred as a cardinal number and dozen as a noun, but on what ground is it justified? If you examine them grammatically, you'll find they are alike, while twenty through ninety are true numerals.

ten men / *a ten men / ?tens of men / *a few ten men
twenty men / *a twenty men / *twenties of men / *a few twenty men
*dozen men / a dozen men / dozens of men / a few dozen men
*score men / a score men / scores of men / a few score men
*hundred men / a hundred men / hundreds of men / a few hundred men
*million men / a million men / millions of men / a few million men

What do you think? - TAKASUGI Shinji 14:55, 21 December 2008 (UTC)

Many cardinals also behave as nouns, forming plurals, being the object of prepositions, etc.. That is why we usually show them as cardinals and nouns. However, I see no reason to remove "hundred"'s classification as a cardinal. As for "dozen", any discussion belongs on its talk page or at WT:TR. Discussing general questions about the entries for cardinals would belong at WT:BP. DCDuring TALK 17:23, 21 December 2008 (UTC)Reply

(End of move)

From the comparison above, I'd like to classify hundred and thousand as nouns just like dozen, not as cardinals like ten, which can be indefinite determiners. There must be published linguistic analyses. Do you have any ideas? - TAKASUGI Shinji 00:16, 22 December 2008 (UTC)Reply
There is a problem in your table. The phrase "a score men" is not grammatical; it should be "a score of men". This is one difference which distinguishes a numeral from a collective noun like score.
A numeral (specifically a cardinal numeral) expresses a count and may function as either a noun or adjective in providing the count: "There were ten men." / "Ten were there." You cannot do this with dozen: "There were dozen men." / "Dozen were there." Neither of these sentences is grammatical. So, I would tend to agree that neither hundred, nor million, nor thousand is a numeral grammatically, since none of these words functions in such constructions: "There were hundred men." / "Hundred were there". These words are nouns only. However, when used as "a hundred" or "one hundred", etc. these words become part of a compound numeral word. --EncycloPetey 01:07, 22 December 2008 (UTC)
I've done some additional thinking and poking into grammars. It is possible to use hundred, thousand, etc. as numerals in a limited way. Specifically, they seem to work as numerals so long as they are preceded by a determiner, such as an article (definite or indefinite), a numeral, a demonstrative, or an indefinite.
(with the indefinite article) There were a hundred people present. / A hundred were present.
(with the definite article) There were the hundred people present that we had expected. / The hundred were present.
(with a numeral) There were one hundred people present. / One hundred were present.
(with a demonstrative) There were these hundred people present. / These hundred were present.
(with an indefinite) There were some hundred people present. / Some hundred were present.
Based on this, we could call these words both numerals and nouns, with Usage notes included to explain their limited functioning as numerals. --EncycloPetey 06:47, 22 December 2008 (UTC)Reply
Thank you for your reply. I have edited the two articles. Please check hundred#Usage notes and thousand#Usage notes. - TAKASUGI Shinji 03:25, 24 December 2008 (UTC)Reply
dozen cannot take the -th derivational suffix unlike the others. you might also want to make a note about googol being even more nouny than dozen. Ishwar 14:56, 21 January 2009 (UTC)Reply

Page titles for phrases needing referents

I've come across a problem with translations of certain words into Irish. There are certain phrases in Irish which include a referent in the middle of the phrase. Examples are that, where the phrase is an ... sin, and thirteenth, where the phrase is triú ... déag. For the first, removing the referent changes the meaning; for the second, I've simply never seen it happen. Is there a convention currently in place for dealing with this in translations and page titles or can one be created? Many dictionaries use ~ to stand in for the headword in their entries. I wonder if we could use it to stand in for a generic referent. Using generic language-appropriate words, of course, changes the meaning. an ceann sin is that one, instead of that .... Suggestions very much appreciated. —Leftmostcat 21:57, 30 December 2008 (UTC)Reply

I like the idea of using the '~' as it is a character that has fewer other uses than most, both in normal text and on computers. Perhaps you could create an entry at triú ~ déag and see how it goes? Conrad.Irwin 14:46, 1 January 2009 (UTC)Reply
We have thus far kept the mainspace fairly clear of entries containing placeholder symbols (as opposed to placeholder words) like "~" and "..." and "X". IMO this is good because it keeps us from fragmenting content in opaque ways. I think something like "triú ~ déag" would be acceptable if there are no other options, but I would really prefer if we could find another way...
Where "foo bar" is different in meaning from "foo ~ bar", as I guess is the case for an ~ sin, could we have two separate inflection lines within one POS section? We use this approach for some other, very different, situations. Then one inflection line could present the "foo bar" form, and one the "foo ~ bar" form.
Where only the "foo ~ bar" form exists, I don't see why we can't just put the entry at [[foo bar]], and explain the usage through examples and usage notes. -- Visviva 15:04, 1 January 2009 (UTC)Reply
Something about this solution seems off to me, though I'm having difficulty putting it into words. For one thing, this doesn't seem to me to be any less opaque—simply less spread out. It also seems to lump together content which doesn't follow our convention for lumping together. This seems like kind of a weak argument against, but it just seems like a different but no more acceptable solution to the problem. The point of posting was to maybe come to an agreement on a standardized way of separating this content correctly so that the opacity becomes less of a problem. —Leftmostcat 06:59, 2 January 2009 (UTC)Reply
Question: Someone who knows Gaelic, knows the word triú ... déag, and wants to look it up in enwikt will look where? The answer to this question is not necessarily where we should have the entry, but is likely.—msh210 20:52, 1 January 2009 (UTC)Reply
This is an interesting question, and one I don't have a ready answer for. Neither of the solid online dictionaries for Irish right now are completist in the way that Wiktionary is and neither have entries for either "thirteenth" or "that" directly. One does have an entry for "that one", as an ceann sin. I don't think that I'd search for that, personally. That said, I don't think I'd search for either tríú déag or tríú ... déag either. If anything, I would search for tríú and déag separately. This brings up a possible answer, though: it's possible that these phrases can somehow be considered SOP. tríú is third and déag is similar to -teen. an ... sin seems a bit less clear-cut. an here is the, sin something like that. Still, I suppose this can be adequately explained in the entry for sin. That seems to clear up the problem for those two phrases and a number of related phrases. I'm not sure, however, if that means the problem goes away. It still seems feasible that the question could come up again. I just don't know if that means it's worth continuing this discussion.
To me, this leaves another sort of problem. In translation lines, this can be "t-template ... t-template" but this reduces our ability to do any sort of automatic processing on the result. Most automated systems would probably see this as "tríú" and "déag" being provided as translations for "thirteenth". I wonder if we could standardize or even template this sort of situation so that automatic processing is eased. —Leftmostcat 06:59, 2 January 2009 (UTC)Reply
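The automatic-processing concern can be illustrated with a small Python sketch. It assumes a simplified translation template of the form {{t|lang|word}} and a hypothetical convention in which a literal "..." between two templates marks one discontinuous phrase. The regular expression and function names are illustrative only and do not come from any existing Wiktionary tool.

```python
import re

# Simplified pattern for a translation template such as {{t|ga|word}}.
# Real templates accept more parameters; this is an assumption for the sketch.
T_TEMPLATE = re.compile(r"\{\{t\|([^|}]+)\|([^|}]+)\}\}")

def naive_translations(line):
    """Treat every template on the line as an independent translation."""
    return [m.group(2) for m in T_TEMPLATE.finditer(line)]

def gap_aware_translations(line):
    """Join templates separated only by '...' into one discontinuous phrase."""
    results = []
    prev_end = None
    for m in T_TEMPLATE.finditer(line):
        between = line[prev_end:m.start()].strip() if prev_end is not None else None
        if between == "...":
            # The previous template and this one belong to the same phrase.
            results[-1] = results[-1] + " ... " + m.group(2)
        else:
            results.append(m.group(2))
        prev_end = m.end()
    return results

line = "* Irish: {{t|ga|tríú}} ... {{t|ga|déag}}"
print(naive_translations(line))
print(gap_aware_translations(line))
```

The naive reader reports two separate translations for "thirteenth", which is the failure described above; the gap-aware reader only works if editors agree on one marker for the gap, which is the standardization being asked for.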

New context tag and category for copulas

At WT:TR#awful two have thought it useful to have a category for verbs that functioned as copulas for at least one of their senses. I thought it might be useful if there were a context tag that applied to the senses that the verb had when functioning as a copula. At least in UK grammar schools, "linking verb" seems to be the common terminology. Should we use "copulative" or "linking verb" as a tag? Either would seem to require a link to Appendix:Glossary or to the entry.

w:Copula (linguistics) and w:List of English copulae are useful background. Does anyone have further thoughts on the subject? DCDuring Holiday Greetings! 19:19, 31 December 2008 (UTC)Reply

Personally I prefer copula / copulae. The term "linking verb" is used in US education, but primarily in the lower grades and without much explanation about what "linking" means. Given the decline in grammar education, either "linking verb" or "copula" would be equally opaque to most Americans. The term copula is more likely to have cognates in other languages, and therefore more useful for our non-English users. The term copula is also more generally useful, since the description "linking verb" refers to a sentence position such verbs do not always take, even in English. For example, the sentence "I want to know what the problem is." places a copula at the end of the sentence. --EncycloPetey 22:55, 31 December 2008 (UTC)Reply

For some words in the WP list (eg, "He acted happy".) there seems to be a distinct meaning for a sense that is "copulative". For others it seems to be an optional part of a few of the meanings (eg, "He arrived famished in New York at 10pm."). DCDuring Holiday Greetings! 23:26, 31 December 2008 (UTC)Reply

It seems to me that the latter situation is limited to adverbial use. So in the sentence "He acted happy," the adjective happy is used adverbially to describe how he acted. The same is true of the other example you've given. So, I'm not sure there is a distinct sense in each case, but there does appear to be a grammatical context difference. --EncycloPetey 11:24, 1 January 2009 (UTC)Reply
I am not sure about many of the 37 verbs that appear on the WP list of English copulae. However, "appear" is one that:
  1. seems to take a wide range of subjective complements and
  2. appears on many lists of copulae.
If one can deem any subjective complement to be an adverb, then there is no point to the exercise. How would one determine when happy#Adjective is being used as happy#Adverb?

I note that there are many lists of copulae that claim to be "fairly complete", but don't have the same members.

I also am curious about those verbs that have objective complements ("You make me happy to be alive.") and reflexive complements ("He drank himself sober"; "They laughed themselves silly."). Our entries sometimes don't even have senses that I can effortfully construe with an adjective, though I know the construction exists. Is there a label for such verb constructions and the verbs that participate in them? At least one grammarian included a few verbs that take objective complements in his list of copulae. DCDuring Holiday Greetings! 12:15, 1 January 2009 (UTC)

Each of these will need to be considered on the merits, which means we will need to give some thought to what sort of tests there can be for copularity. For "appear", I think the distinction is fairly clear -- "he appeared happy" does not mean "he appeared in a happy way" but rather "he appeared (to be) happy". For "arrive", I'm inclined to agree with EP; the verb is not really attributing anything of the subject... "act" is a little trickier, as I would normally say that "he acted happy" means "he acted (in such a way as to seem to be) happy", not "he acted in a happy fashion/while happy". Perhaps that is still copular -- it certainly doesn't seem adverbial -- but it is a little different from words like "seem" and "become".
On a somewhat unrelated note, I am concerned that copula and friends are being made to carry a lot of content that should properly be placed in a grammatical appendix; entries should reflect only how a word is used, regardless of whether that usage is correct or not. -- Visviva 15:45, 1 January 2009 (UTC)Reply
The substantive value of this is to amend or add to our definitions and usage examples to reflect instances of copulative-type use where this is missing or unclear, which is surprisingly often. I thought that the basic test is whether the verb in the usage under consideration can take an adjective as subjective complement. (Nouns seem easier to confuse.) We might want to exclude those where the usage is at present limited to only a small number of adjectival possibilities. But this would possibly make Wiktionary less useful as an aid to reading works written in earlier Modern English. I don't see any particular "bright line" to distinguish copulative verbs from intransitive-verbs-that-take-subjective-complements-that-look-exactly-like-adjectives-but-must-be-adverbs-because-the-verb-isn't-copulative.
I think that in "John arrived hungry" "hungry" is clearly attributed to John and not his arriving. The question might be whether "arrived" plays an essential role in the attribution. This is perhaps the weakest of the purported copulae. It just seems quite arbitrary to deem all the adjectives that can follow "arrive" to be adverbs. Further, consider: "Come hungry." "The workers fell idle at the start, sat idle, remained idle, grew increasingly concerned, and left worried that the company would go broke." "The ingot glowed orange." "Loyal IBMers bleed blue." "Tom tested positive." "He pleaded guilty." "The carrots run small this time of year." DCDuring Holiday Greetings! 16:37, 1 January 2009 (UTC)Reply

January 2009

Questions concerning the use, naming and placement of inflection, declension and conjugation templates for FL languages.

A couple of years ago there was a flurry of criticism against inflection templates such as the Swedish ones, as they do not clearly enough separate "inflection line templates" from "declension templates"[1]. I really wouldn't have bothered about it had it not been that I too find the Swedish template names (and parameter use!) somewhat less intuitive than they would have to be, to put it mildly. Hence, I have for quite a while been thinking about how to rectify the situation, but I am realizing that there are no clear directions anywhere on how FL inflection/declension templates are to be treated, or what kind of structure I should aim for. And as I don't want to do this more than once, I would like some indication of which direction I should aim for before I do any more work on the upgrades and reworkings of the Swedish templates. Important to point out is that Swedish uses a relatively low number of forms: up to 3 for the adverbs, 8 for nouns, 13 (or 17, if one opts for better comprehensiveness) for verbs and up to 14 for adjectives. Still, that is mostly too many to fit in the inflection line (though one could make a selection of forms to present on such a line). Hence, a few questions (but first some observations):

  • At present, I see:
    1. an "infl-line"-solution, exemplified by the {{infl}}-template and the various English templates.
    2. an "infl-table"-solution which uses the whole page's width. This is also used by the same English templates, but is by default hidden from the user (I found out now; I had forgotten that I had edited my .css to see them.)
    3. a right-floating table, used by all Swedish templates (these are the ones which were criticized, back then) and German declension templates.
    4. a table under a heading of its own, mainly used by languages with a very large number of different forms.
  • WT:ELE only, as far as I can find, mentions case 1 and 4, though presumably 1 and 2 could be considered equivalent from that point of view.
  • My questions would then be:
    1. Should one always present a few forms on the inflection line?
    2. Should one retain the right-floating tables? (Or should one use an ====Inflection==== header no matter how little additional information, compared to the inflection line, would ever be present there?)
    3. If the answer is 'yes' to both questions, could one then use the same template to display them both? (So that one doesn't end up as in Apfel with two consecutive templates doing IMO very similar things.) However this would "blur the separation of the two template types". [1]
    4. When it comes to naming: *if* all should be used, the inflection line templates should merely be {{sv-<PoS>-<class>}} and the inflection/declension/conjugation *table* templates should be {{sv-<decl>/<conj>-<pos>-<class>}}, right? Conj for verbs and decl for nouns and adjectives?
    5. How many forms would be acceptable to squeeze into one inflection line?
      I think {{en-verb}} displays up to 5; I also think 8 would be too many to fit in, so perhaps one could restrict to 3 adverbs (that's all of 'em) and adjectives, up to 4 nouns (that would mean 1, 2 or 4 additional forms in the declension table wherever one would put that - right-floating or under a header) and finally 5 verb forms (which means that for a number of verbs the conjugation table would present up to 2 (or 4) new forms, which wouldn't be present on the inflection line already, the rest would present up to 8 (or 12) new forms.)
    6. Finally, what is really the intention of the css classes infl-inline and infl-table in the context of an FL? Are they intended to be useful at all for any language other than English? Could they be made useful by making one of them hide/display the more extensive declension/conjugation tables?
      Part of the reason I ask this is because, as I have tried to sketch here, I don't think it is very useful for anyone, really, (when it comes to Swedish) to see *both* the brief inflection line information and the complete inflection table - either you need all the information given in the table *or* it is sufficient for you to see the gist of the inflection pattern, as given in an inflection line. My first idea was thus to give both the inflection line and the inflection table, *but* that's no good as it is quite difficult for the average reader to make Wiktionary display the complete tables (.css editing seems to be a must?). Or should I leave those classes to the English templates and, say, create another solution for hiding the complete tables by collapsing them? Would that be an acceptable compromise even if they are still right-floating?
    • Now, if the answer to number 2 is 'no', then I guess one would have to return to the old situation where the Swedish entries "all" used an ====Inflections==== header. Personally, I'm not very fond of this, as it definitely would require the entering of two templates + one extra header almost every time - and then have very little information below it. Presumably one could avoid it for the adverbs (only 3 forms), but that would be a very small benefit. The second possibility - to skip that section for uncountable nouns, proper nouns and periphrastic or absolute adjectives - would add the extra drawback of making it inconsistent within the language+PoS-combination. Else the (hypothetical class name, I haven't got around to really decide on the updated format of the noun templates yet) {{sv-noun-unc-n}} and {{sv-noun-decl-unc-n}} would display the same 4 forms, but in different layouts and at different places.
    • Of course one could argue that one could use different solutions for different PoS's, but I am doubtful that it would benefit the reader to find a single language's inflection information at various places in the entry for different entries. If s/he is used to find it under an Inflection header and constantly finds it there for adjectives, verbs and nouns, why should s/he look elsewhere just because it's an adverb? No, I strongly think that any solution will have to be as consistent over the entries of the language as possible.
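The naming scheme raised in question 4 can be sketched in Python to make the two template families concrete. This is only an illustration of the proposed pattern (inflection line: sv-<PoS>-<class>; table: sv-<decl or conj>-<PoS>-<class>, with conj for verbs and decl for nouns and adjectives). The function names and the class string "unc-n" are hypothetical, and adverbs are left out because the proposal does not say which prefix they would take.

```python
# Which table prefix each part of speech takes under the proposed scheme.
TABLE_KIND = {"verb": "conj", "noun": "decl", "adj": "decl"}

def inflection_line_template(lang, pos, cls):
    """Name of the template that produces the inflection line."""
    return f"{lang}-{pos}-{cls}"

def table_template(lang, pos, cls):
    """Name of the template that produces the full declension/conjugation table."""
    if pos not in TABLE_KIND:
        raise ValueError(f"no table prefix defined for part of speech: {pos}")
    return f"{lang}-{TABLE_KIND[pos]}-{pos}-{cls}"

print(inflection_line_template("sv", "noun", "unc-n"))
print(table_template("sv", "noun", "unc-n"))
print(table_template("sv", "verb", "1"))
```

Keeping the two names derivable from the same (language, PoS, class) triple is what would let an inflection line template optionally call its matching table template.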

I hope this discussion will yield something tangible for a Wiktionary:Conjugation and declension templates page to go with the Wiktionary:Inflection templates page, linked from the ELE, so that users who want to add to the infrastructure of inflection/declension/conjugation templates for new languages could see what's expected of them and their templates.

Thanks for your patience with my (almost) never-ending writing... :P \Mike 23:12, 1 January 2009 (UTC)Reply

For languages and POSes where there are too many forms to put them all on the inflection line, but not too many forms to fit in a right-floating table, I think the right-floating table idea is a good one. But, it shouldn't replace the inflection line, which all of our entries have. For example, I think yttrandefrihet looks good.
Personally, I don't care too much whether the inflection line is built into the template that generates the table, or created using {{infl}}. If the latter, then {{sv-noun}} should be renamed to {{sv-decl-noun}} or something. (The name "sv-noun" makes it sound like it creates the inflection line, like with {{en-noun}}, {{fr-noun}}, and so on.) —RuakhTALK 20:30, 2 January 2009 (UTC)Reply
I dislike the right-floating boxes, and don't think they should ever be built into the inflection line. All too often, images need to be placed in a POS section, and these are typically placed ahead of the inflection line. This causes severe problems if there is a right-floating inflection table competing for that spot. I also constantly run across such pages where the right-hand table extends down into the following language section, which is visually confusing. I prefer to always see inflection included either on the inflection line when there is very little to display (more than 4 or 5 items becomes messy), or else in an Inflection / Declension / Conjugation section. Including the explicit section with header puts additional information into the page's TOC, which can be very helpful on long pages with multiple languages.
"Should one always present a few forms on the inflection line?" I think so in most languages and situations, but only if it either (1) helps to summarize the forms or (2) is a complete and concise listing. Some other users here disagree, and would rather that no inflected forms appear on the inflection line in certain languages, for reasons I understand but don't necessarily agree with for those languages. There are some languages (e.g. Japanese) where only forms in other characters are given on the inflection line, and some languages (e.g. Polish) where only the gender of nouns is given, and some African languages where the class of nouns is given, but not the inflected forms. I'm not sure that a general agreement can (or should) be reached that applies uniformly to all languages, except perhaps as a principle of "make the inflection line a summary". --EncycloPetey 23:48, 2 January 2009 (UTC)
I generally agree with EP on this. Right hand templates are completely unacceptable, and I very much want to see them all go away. They so often get in the way as EP notes. I think that if a word has four or fewer forms, putting them all in the inflection line is fine (so adverbs in this case probably don't need a dedicated inflection table). Anything more should be in an inflection table under its own header (there's debate between inflection and declension/conjugation. I prefer the former, as I think it is more workable and I think the latter makes an unnecessary and sometimes difficult distinction, but others disagree). I think that inflection templates should always be collapsable (again, others disagree with me on this, but I have yet to be swayed by their arguments). Then again, I feel like everything except language, part of speech, and definitions should be collapsable, but I may be drinking at my own party on that one. Anywho....the more I think about it, the more I am beginning to think that any entry with a dedicated inflection template should have zero inflection data inside the inflection line, however this is basically the opposite of current practice in most cases. I think a good rule of thumb for current practice in inflection line use would be, a quick and dirty version, often mimicking what traditional dictionaries put in their entries (e.g. Ancient Greek nouns have genitive singular next to nominative singular here, which is basically what all paper grc dictionaries do). So, to make sure I answer all your questions:
  1. Not necessarily, but most entries currently do.
  2. Right handed tables are a bad idea and must go.
  3. See above, but if right-handed tables must be kept, they need to be more flexible than this, so they need separate templates (although it might work to have the option of having the inflection line template be able to call the full inflection template).
  4. Yes.
  5. Four, at most five.
  6. I don't understand CSS enough to really answer this. However, I might note that in the latest incarnation of grc inflection templates, the basic info is presented in the collapsed version, basically mirroring the info presented in the inflection line.
Hope that helps. -Atelaes λάλει ἐμοί 07:47, 3 January 2009 (UTC)Reply

Broken templates

Someone has recently changed the templates for tenses so they no longer include a link around the target word. Is this correct? Sometime ago I was admonished for entering some without this link.

Also they now expand incorrectly, no longer having the '*' at the beginning. See insnares, insnaring, and insnared which I just entered using these (broken?) templates. - dougher 02:38, 2 January 2009 (UTC)Reply

These templates seem to work fine - unless you are referring to a template that is not on the page? The format of these "Form-of" pages varies wildly depending on which language you edit, but they generally should look something like the following to allow for a reasonable amount of consistency:
==English==
===Verb===
verbing
# {{present participle of|[[verb]]}}

If you don't want the hassle of typing this every time, you can enable Accelerated creation by ticking the appropriate box on WT:PREFS. Conrad.Irwin 02:43, 2 January 2009 (UTC)

I was using the buttons on this page http://en.wiktionary.org/wiki/Special:Search?search=alsdfjklajf&go=Go. They have worked fine forever. If the policy is now to use the Accelerated creation thing, shouldn't this page be fixed?

All those missing parts of the insnare pages (English, Verb) that you point out are missing are normally created automatically by clicking on the red links for the missing tense pages -- what happened to those red tense links, they're broken too? - dougher 02:52, 2 January 2009 (UTC)Reply

Those buttons work fine for me (as in they create the parts that were missing from yours), so I don't know what the problem was for you - maybe you just hit them on a bad day for the software. Which red tense links are you talking about? Conrad.Irwin 15:53, 2 January 2009 (UTC)Reply

"singular delative of" or "delative singular of"?

What is the recommended order of words in form of entries for nouns? First the singular/plural, second the case name? Or the opposite? --Panda10 23:27, 2 January 2009 (UTC)Reply

For Latin, I have consistently been using the sequence: case, gender, number for adjectives, and: case, number for nouns. The case has more variability in most languages, and is more often important for translation, so listing it first gives it the extra attention to help our users. --EncycloPetey 23:36, 2 January 2009 (UTC)Reply
Hmmm...I have been using (gender) case number.......so, we're agreed on nouns at least. :-P -Atelaes λάλει ἐμοί 19:13, 6 January 2009 (UTC)Reply
I just tried google:"the nominative masculine singular" and the other five permutations, then likewise for google:"the accusative feminine plural", with the following results:
order      nom. masc. sing. hit counts   acc. fem. pl. hit counts
c. g. n.   1460/83                       9
c. n. g.   2000/109                      7
g. c. n.   1140/124                      109/21
g. n. c.   279/67                        58/15
n. c. g.   104/18                        2
n. g. c.   8                             4
(where foo/bar means that foo was Google's first-page estimate, but bar was its last-page estimate; I'm assuming bar is more reliable). All told, it looks like some orderings are better preferred than others, but we have some flexibility. Going just by the numbers, I'd definitely use gender-case-number (as Atelaes suggested), but we don't have to go just by the numbers if we think EP's reason for case-gender-number is a good one.
RuakhTALK 21:35, 6 January 2009 (UTC)Reply
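As a rough check on "going just by the numbers", the hit counts above can be ranked with a few lines of Python. The sketch stores each cell as (first-page estimate, last-page estimate), repeats a lone number for both, and ranks the orderings by the sum of the last-page estimates across the two queries. Summing the two queries is an assumption made for this illustration, not a method used in the comparison above.

```python
# Hit counts transcribed from the comparison above:
# ordering -> {query: (first-page estimate, last-page estimate)}.
# c = case, g = gender, n = number.
counts = {
    "c. g. n.": {"nom. masc. sing.": (1460, 83), "acc. fem. pl.": (9, 9)},
    "c. n. g.": {"nom. masc. sing.": (2000, 109), "acc. fem. pl.": (7, 7)},
    "g. c. n.": {"nom. masc. sing.": (1140, 124), "acc. fem. pl.": (109, 21)},
    "g. n. c.": {"nom. masc. sing.": (279, 67), "acc. fem. pl.": (58, 15)},
    "n. c. g.": {"nom. masc. sing.": (104, 18), "acc. fem. pl.": (2, 2)},
    "n. g. c.": {"nom. masc. sing.": (8, 8), "acc. fem. pl.": (4, 4)},
}

def reliable_total(per_query):
    """Sum the last-page estimates, taken here as the more reliable figure."""
    return sum(last_page for _first_page, last_page in per_query.values())

ranking = sorted(counts, key=lambda order: reliable_total(counts[order]), reverse=True)
for order in ranking:
    print(order, reliable_total(counts[order]))
```

On these figures gender-case-number comes out first, with the two case-first orderings next, which matches the reading that several orderings are in real use.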

Category for possessive noun forms

Where should we put Hungarian possessive noun forms, e.g. házam (my house)? In Category:Hungarian noun forms or in a separate Category:Hungarian possessive noun forms? --Panda10 19:29, 4 January 2009 (UTC)Reply

For Hungarian, the latter is probably appropriate, since there is a whole series of additional suffixes used to create the possessive forms. However, I can't imagine there would be much value in having that separate additional category. --EncycloPetey 19:38, 4 January 2009 (UTC)Reply
Are you saying someone should or shouldn't create the separate category (IMO however, it should be a subcategory of Category:Hungarian noun forms)?50 Xylophone Players talk 19:44, 4 January 2009 (UTC)Reply
I'm leaving the question open of whether to do this. I wouldn't normally want to see such a category, since in most languages it wouldn't be worthwhile, but Hungarian is unusual in this regard, and may have reasons for such a category. --EncycloPetey 23:21, 4 January 2009 (UTC)Reply
At this point we have not come up with a useful way to categorize inflected forms, and the only strategy which has been adopted with any regularity is to relegate them all to "Category:language noun forms", which can become utterly monstrous in highly inflected languages. So, I would advise being adventurous in this sort of categorization until we find a useful standard. However, I would advise using templates, so that these many forms can be easily categorized in the future, should it be deemed necessary. -Atelaes λάλει ἐμοί 23:27, 4 January 2009 (UTC)Reply
I like the idea of a template in the inflection line which would put the possessive noun form into a category. What would be a good name for the template: hu-noun-posform? --Panda10 23:34, 4 January 2009 (UTC)Reply
That seems reasonable to me. -Atelaes λάλει ἐμοί 23:38, 4 January 2009 (UTC)Reply
I'd recommend hu-noun-form-poss(essive) instead, since it keeps the base template name. --EncycloPetey 23:45, 4 January 2009 (UTC)Reply
By the way what about Category:Hungarian noun forms? Should it perhaps be split into separate categories —one for each case, maybe— or what? As it stands it seems to be 34 form ofs (going into this category) per lemma. Also (I've been wondering about this for a while) why is the essive-formal line always blank? 50 Xylophone Players talk 00:38, 5 January 2009 (UTC)Reply
It is not always empty. See . If we go for a separate category for each case, we need another template to handle it. --Panda10 00:45, 5 January 2009 (UTC)Reply
Ultimately, the most useful, long-term solution is this: Have two templates: one template for the inflection line, and one for the definition line. The inflection template does little more than bold the headword, and the definition template works similarly to {{inflection of}}. Have the definition line template take parameters such as n, m, s, e, etc. (make it work for Hungarian, however you think best), and spit out "nominative singular of x" or whatever. Use them both in all inflected form entries. Thus, if we decide we want all inflected forms in a single huge cat, we can turn categorization on in the inflection line template. If we decide we want more specific categorization, we can turn categorization off in the inflection line template, and on in the definition line template. Since the definition line template will have all the information it needs to do any sort of categorization, we can adjust it all at the template level. Maybe we want all nominative singulars in a specific cat, and genitive singulars in another; the template can do that. Maybe we want all inflected forms from a specific lemma in a "Category:inflected forms of lemma" category; the template can do that. Maybe we want both; it can still do that. -Atelaes λάλει ἐμοί 00:57, 5 January 2009 (UTC)
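The two-template idea can be modelled in Python to show why the definition line is the natural place for a categorization switch. This is a sketch under stated assumptions: the shorthand codes, the category names, and the function name are all hypothetical and do not describe the behavior of {{inflection of}} or of any existing Hungarian template.

```python
# Hypothetical shorthand tables; a real template would cover every Hungarian case.
CASES = {"nom": "nominative", "acc": "accusative", "dat": "dative"}
NUMBERS = {"s": "singular", "p": "plural"}

def definition_line(lemma, case, number, categorize="none"):
    """Return (definition text, categories) for an inflected form.

    categorize is one of "none", "form", "lemma" or "both" and stands in
    for a switch that would be set once, at the template level.
    """
    case_name = CASES[case]
    number_name = NUMBERS[number]
    text = f"{case_name} {number_name} of {lemma}"
    categories = []
    if categorize in ("form", "both"):
        # Illustrative category name only.
        categories.append(f"Category:Hungarian {case_name} {number_name} noun forms")
    if categorize in ("lemma", "both"):
        # Illustrative category name only.
        categories.append(f"Category:Hungarian inflected forms of {lemma}")
    return text, categories

text, cats = definition_line("bank", "acc", "s", categorize="both")
print(text)
print(cats)
```

Because the definition line already receives the lemma, case and number, changing the categorization scheme later means editing one function (one template) rather than every entry.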
Sorry, I meant essive-modal, but anyway why is it often blank and why does have no essive-modal plural? 50 Xylophone Players talk 03:10, 5 January 2009 (UTC)Reply
PalkiaX50: Not every noun can use the -ul/-ül case ending, and even if it is used in singular, it may not make sense in plural.
Atelaes: Thanks for your suggestions, I will work on it, it may take a few days, though. --Panda10 12:51, 5 January 2009 (UTC)Reply
I created two templates, please see bankot:
  • {{hu-noun form}} - displays the bolded page name; it should be used in the inflection line
  • {{hu-inflection of}} - displays the definition (e.g. accusative singular of <lemma>); it does not handle possessives, only the regular cases
Should I worry about the lack of wikilinks on the noun form page? I think AF is handling them lately, correct? --Panda10 23:52, 5 January 2009 (UTC)Reply
Personally, I think it would work better if you made these changes:
  1. Don't make it automatically generate a link to "nom. sing.#Hungarian"; that way you could type {{hu-inflection of|[[bank#Hungarian|bank]]|acc|s}} without getting a truly ugly result, hence keeping the page looking perfect and counted in the statistics.
  2. It would be nice if there were more shorthands, e.g. for superessive, essive, translative, etc. 50 Xylophone Players talk 01:00, 6 January 2009 (UTC)Reply
Disagree with PalkiaX50 on 1, per Wiktionary:Page count. Agree with 2. -Atelaes λάλει ἐμοί 04:15, 6 January 2009 (UTC)Reply
With regard to 2; the template {{inflection of}} deliberately does not use additional shorthand because they are not as generally applicable across languages, but for a language-specific template like the new {{hu-inflection of}}, there should be real benefit in having more shorthand parameters. I strongly suggest, from experience, that you carefully plan out the shorthand coding. You want to be sure they're easy to remember, don't cause confusion by seeming to be short for something different, and (ideally) be a fixed number of characters (3 or 4) to make them easier to remember. It may not be possible or feasible to meet all of these criteria, but I mention them in case you hadn't yet considered them. --EncycloPetey 05:56, 6 January 2009 (UTC)Reply
Thanks for the suggestions. I created a table with the proposed shorthand for each case. Please take a look at User:Panda10/Inflection. I am planning to create categories for each case. This will help identify errors (entries that were defined erroneously and have to be corrected). Example category name for accusative: Category:Hungarian noun forms - accusative - will this work? These categories would be under Category:Hungarian noun forms. --Panda10 23:26, 6 January 2009 (UTC)Reply
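
As an illustration of the shorthand criteria discussed above (fixed length, no code that reads like a different case), a table of codes could be checked mechanically. The following is a minimal sketch in Python rather than template code; the codes and the function name are hypothetical and are not taken from User:Panda10/Inflection.

```python
# Hypothetical shorthand table; the codes actually proposed live on
# User:Panda10/Inflection and may differ from these.
SHORTHANDS = {
    "nom": "nominative",
    "acc": "accusative",
    "dat": "dative",
    "ins": "instrumental",
    "sup": "superessive",
    "tra": "translative",
}


def check_shorthands(table, length=3):
    """Return a list of problems found in a shorthand table."""
    problems = []
    names = list(table.values())
    for code, case_name in table.items():
        # Criterion: every code has the same fixed number of characters.
        if len(code) != length:
            problems.append(f"{code!r} is not {length} characters long")
        # Criterion: a code should not look like the start of another case name.
        clashes = [n for n in names if n != case_name and n.startswith(code)]
        if clashes:
            problems.append(f"{code!r} could also be read as {clashes[0]}")
    return problems
```

An empty result means the table meets both criteria; a two-letter code such as "in" would be flagged both for its length and for matching the start of "inessive" when it is meant as "instrumental".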

Historical/ Legendary Fictitious Proper Nouns

Are proper nouns that appear in Historical literature or legends (i.e. The legendary sword Hrunting in the elegy of Beowulf) potential candidates for word entries? Just wondering. --Dictionman 22:40, 6 January 2009 (UTC)Reply

Yes, for a couple reasons. Famous, notable works, such as Beowulf get special privileges as far as citations go, and so a single cite from Beowulf (and perhaps a notable English translation of it) would suffice to justify inclusion. Thus, the Old English word absolutely, the English equivalent probably. -Atelaes λάλει ἐμοί 22:57, 6 January 2009 (UTC)Reply
I agree that we should have Hrunting, but I don't know if I agree that appearance in a well-known work, taken alone, is enough justification for that sort of proper noun that refers to a unique entity. Appearance in a well-known work counts for attestation, which means that actual words from Beowulf should be included, but I don't know that it just wipes away all the other CFI. :-/   —RuakhTALK 00:19, 7 January 2009 (UTC)Reply
My mistake in the title, I meant my request to go only for legendary and renowned widespread proper nouns solely, not words in general. So, may I have permission to create an article for Hrunting and Naegling (Beowulf's other sword)?--Dictionman 00:32, 7 January 2009 (UTC)Reply
I have serious reservations about this; IMO entries of this type should be restricted to either a) proper nouns that have entered literary lexicon as bywords for something or other, or b) proper nouns occurring in some closed set of works that we can all agree on (good luck...).
I note that Wikipedia has articles for w:Hrunting and w:Naegling, as it should. Any etymological/philological scholarship that has been done on these names can surely be summarized in the Wikipedia articles. Since not even "Hrunting" seems to have entered the modern lexicon in any significant way, I'm not sure what we could add to the information these entries provide. -- Visviva 02:09, 7 January 2009 (UTC)Reply

Danish nouns

I've been thinking about changing {{da-noun}}, and made a proposal in User:Leolaursen/da-noun, with some examples of use in User:Leolaursen/sandbox. Is this a step in the right direction, and is it worth the temporary breakage of about 1000 pages?

Also, I attempted to prepare for the use of Accelerated; is this done the right way? – Leo Laursen – (talk · contribs) 18:47, 7 January 2009 (UTC)Reply

The new template is missing the plural definite, is that deliberate? The Accelerated creation should now work for your new template (sorry about the wait), thank you for adding the class names yourself. Let me know if it's not perfect. Would it be possible to fix the 1000 pages with a robot? Otherwise it's a lot of work for a human. Conrad.Irwin 09:57, 9 January 2009 (UTC)Reply
The plural definite is left out on purpose, because it is only an added "-ne" in the majority of cases, and the few exceptions will be shown in the inflection table (please see User:Leolaursen/da-noun-infl).
I don't see green links (yet?), but accelerated is enabled and works for eg. Dutch nouns.
Using a bot might work, taking arg. 3 and 4 unchanged, and either finding gender from arg 1 (en/et) or keep g. – Leo Laursen – (talk · contribs) 11:34, 9 January 2009 (UTC)Reply
The bot is not needed, I made a wrapper, so the old version is called, if the fifth argument exists. Accelerated, unfortunately does not work. – Leo Laursen – (talk · contribs) 10:34, 10 January 2009 (UTC)Reply
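
The wrapper idea described above can be pictured as a simple dispatch on the presence of a fifth argument. The following is only an analogy in Python, not the actual template code; the function names, the placeholder output, and the argument order in the usage note are assumptions made for illustration.

```python
def render_old(args):
    # Placeholder standing in for the previous template layout.
    return "old:" + "|".join(args)


def render_new(args):
    # Placeholder standing in for the proposed template layout.
    return "new:" + "|".join(args)


def da_noun(*args):
    """Use the old layout only when a non-empty fifth argument is supplied."""
    if len(args) >= 5 and args[4].strip():
        return render_old(args)
    return render_new(args)
```

Under this sketch a call with three arguments, such as da_noun("en", "bil", "biler"), takes the new path, while existing five-argument calls keep rendering through the old one, so no page has to be edited at the moment of the switch.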

Template:SI-unit

It's nice that this template standardizes format across the several SI measures where it's used. However, it gives all numerical quantities in scientific notation, which I can say from experience the majority of Americans don't understand. Using this template, a decisecond is defined as "An SI unit of time equal to 10⁻¹ seconds", when saying "one-tenth" or "1/10" would be understandable by a far greater percentage of our users. Is there a simple way to modify the template to provide a parenthetical English or fractional value? --EncycloPetey 00:17, 8 January 2009 (UTC)Reply

User talk:Visviva/SI would be one way to do it; not sure what the best layout is. Maybe the scientific notation should go in parentheses? -- Visviva 09:31, 8 January 2009 (UTC)Reply
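
To make the request concrete, the conversion from a power of ten to a parenthetical plain value is mechanical. The sketch below is in Python, not template syntax, and is not how {{SI-unit}} or User talk:Visviva/SI is implemented; the function name and the output layout are assumptions.

```python
from fractions import Fraction


def describe_power_of_ten(exponent):
    """Return scientific notation followed by the plain value in parentheses."""
    # Fraction keeps negative powers exact, e.g. 10**-1 becomes 1/10.
    value = Fraction(10) ** exponent
    return f"10^{exponent} ({value})"
```

For a decisecond this gives "10^-1 (1/10)". Swapping the two parts of the returned string would give the alternative layout in which the scientific notation goes in parentheses.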

Proposals for Philippine languages

I think this is the best place to start asking for advice and permission. So, I shall now ask about:

Templates such as:

1. Verb conjugation Typical Phil. verbs, particularly in Tagalog, are made from nouns attached by a patterned set of prefixes and suffixes. I see this as similar to the Spanish conjugation system--as these verbs have a table for each word, could Tagalog words have them too? The templates would be helpful for automatically listing the "verbed nouns" in one place, and in itself would portray how verbs work. Finally, it would be convenient for linking inflections onto their root words. Like in the Spanish verbs, the tables would be mainly on the root words, then stemming out for their individual meanings. Further information about Phil. verbs could have its Appendix page also.

2. Descendants Perhaps a template would work better here. The template words would be listed into the existing "(Phil language here) words derived from Spanish" pages.

And:

A talk page where plans for the Phil. languages could be, and where this may be best created in; I'm thinking it could be linked from Category:Languages_in_the_Philippines. It would be where someone willing to help would first look and see how his language is progressing. This would have a to-do list, discussion on other improvements, appendices that would be helpful for editing the Phil. languages, and anything that would help.

What do you think?

Absolutely. This is all good. Generally, every language is encouraged to have inflection templates set up, and if you don't feel up to the task of writing them, or need some tips on standard formatting for them, we're quite happy to help with that. I'm a little uncertain as to what you're talking about with number two. Certainly every Spanish word is allowed a "Descendants" header, where Filipino/Tagalog/whatever else can be placed. And certainly every word can have an etymology section, where its etymon/etyma can be listed. As for a central page, I suggest creating Wiktionary:About Filipino and/or Wiktionary:About Tagalog. What might be easier is to simply create an example page and we can work out formatting in a more concrete fashion. -Atelaes λάλει ἐμοί 07:09, 9 January 2009 (UTC)Reply
Thanks for responding, Atelaes! For the descendants, I always type the language and word, (==Descendants== * Filipino: manibela, something like this), and I thought the diff. Phil. languages could have their own "{{}}" thing to list the Spanish words down somewhere on another page. For now the list of Phil. language words from Spanish are composed of existing terms being in a category. I want the descendants to be listed somewhere before they're made. I suppose this is unnecessary, but I thought it would help to keep them standardized. The Sandbox seems like the perfect place for formatting. Thanks again! --Icqgirl 08:09, 9 January 2009 (UTC)Reply
Well, you're certainly welcome to write a bunch of Philippine words under the Descendants header of a Spanish word, even if they haven't yet been written. Also, if you like, you can write an appendix, such as Appendix:Greek words with English derivatives, but these are generally seen as a tool for getting data into entries, not as a final product. -Atelaes λάλει ἐμοί 08:17, 9 January 2009 (UTC)Reply
For a guide to adding Descendants, take a look at the Latin page for vermis, which lists many Descendants in various languages. Note that there is a red link for Occitan because that entry has yet to be written. This is perfectly acceptable, and you can do the same thing. You will also notice that the Italian and Portuguese descendant words are the same, but that the {{l}} template (that's a lower-case "L") allows each word to link directly to the corresponding language section of the target page. --EncycloPetey 18:36, 9 January 2009 (UTC)Reply

User:Conrad.Bot and Indices

It struck me that what was originally a way for me to upload indices without flooding recent changes is now more like a bot for generating indices. Are people happy to allow Conrad.Bot to continue uploading the Indexes for Galician, Hungarian, Italian, Irish and Spanish, or should I call a VOTE? Conrad.Irwin 02:24, 10 January 2009 (UTC)Reply

Go for it! —RuakhTALK 02:40, 10 January 2009 (UTC)Reply
The index in this format is extremely helpful. Thank you for doing this. One question: the main page of the Hungarian index is somehow left out of the refresh process, this is the page that contains the total number of entries in the index. Would it be possible to include it in the refresh process? --Panda10 02:49, 10 January 2009 (UTC)Reply
Yes, this now happens, but note the count has jumped up because it now indexes red-links from translation tables too. If you would actually like a count of the number of pages, I can probably generate this as well. Conrad.Irwin 02:06, 11 January 2009 (UTC)Reply
It would be nice to have all the information. Would this format (or similar) make sense to you: "The 10773 terms (4000 red links, 6773 blue links) on this page..." --Panda10 13:12, 11 January 2009 (UTC)Reply
Likewise, this is page maintenance, not the creation of new material or entries. You might go for a formal vote if you plan to expand to additional languages (which would be nice), just to put the Bot on the books, so to speak. And if expansion can be done easily, then it would be helpful to have a designated place for specific language requests, especially for helpful notes on odd characters and what alphabetical order means in that language. I'd be particularly interested in seeing additional Romance languages indexed, like Catalan, Occitan, Asturian. --EncycloPetey 03:00, 10 January 2009 (UTC)Reply
I am happy to extend this, and, thanks to recent restructuring it is much easier to extend. If you direct requests to my talk page it is most likely to be noticed, but I will (at some point) set up User:Conrad.Bot/Indexing giving information about what goes on (though there is so much mucky code loitering around this process that I'd rather not upload all 15 scripts that get involved - that would probably be less use than a human-readable description anyway). Conrad.Irwin 02:06, 11 January 2009 (UTC)Reply
What happened to grc? We had all the rules worked out well, and then you just dropped me, like a cheap hooker. I'm hurt. :-P -Atelaes λάλει ἐμοί 05:26, 10 January 2009 (UTC)Reply
It's now there(ish), I'm afraid I've been irresponsible and not left myself enough time to fix the header, will do that in 10 hours or so. See you later. Conrad.Irwin 11:19, 10 January 2009 (UTC)Reply
'Tis very much appreciated. My self worth is now a great deal more secure. Thanks. -Atelaes λάλει ἐμοί 05:04, 11 January 2009 (UTC)Reply
Like Ruakh and (resp.) EP, I, too, would like to see this continued and (resp.) extended to other languages. All languages, in fact; why not?—msh210 21:31, 12 January 2009 (UTC)Reply
It seems to me that there's fairly strong support for this from the community and no opposition. I suggest that Conrad should feel quite free to create indices for any languages he has the motivation for. I would like to eventually see all languages have indices which are all updated on a regular basis, but clearly that's a fairly massive undertaking, and will probably take time. Also, different languages will have different sorting criteria (as I'm sure Conrad is already aware). In my opinion, an imperfect index is better than no index (and is more likely to provoke discussions leading to a perfect index). Since this is the creation of a series of pages, and not a change in policy, I see no need for a vote, but if Conrad would feel more comfortable with one, I'd be happy to create it. -Atelaes λάλει ἐμοί 22:25, 12 January 2009 (UTC)Reply
Ok, I'll continue to create indices. I'd prefer to be creating pages I know are being used, it means I can keep track of what's going on; so if you would like an index, please ask for it on my talk page and give me an indication of how sorting/splitting should work. I will get around to doing the English one when I've worked out how to split letters in half nicely (it's currently too big to fit on a single wiki page per letter for some letters as the software has excessive memory usage for rendering links). Conrad.Irwin 23:48, 13 January 2009 (UTC)Reply
Suggestion: find a print copy dictionary, especially the Compact OED, and count pages used for a particular letter of the alphabet. Then see what divisions happen near the place where that section is divided in half, thirds, fourths, fifths... or whatever is necessary given the quantity of data for that letter. As a rough guide, you could look at the navigation template at the top of Category:English nouns, but I don't know if that would be enough for dividing up the whole of our English entries. --EncycloPetey 20:30, 16 January 2009 (UTC)Reply
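
The page-counting suggestion amounts to finding break points where the running total crosses an equal share of the whole. A minimal sketch of that idea in Python, assuming per-prefix entry counts are already available; the function name and the counts in the usage note are made up, and this is not part of Conrad.Bot.

```python
def split_points(counts, parts):
    """Given ordered (prefix, n) pairs, return the prefixes that start each part.

    Assumes counts is non-empty and parts is at least 1.
    """
    total = sum(n for _, n in counts)
    starts = [counts[0][0]]
    running = 0
    for prefix, n in counts:
        # Open a new part once the running total reaches the next equal share.
        if len(starts) < parts and running >= total * len(starts) / parts:
            starts.append(prefix)
        running += n
    return starts
```

With invented counts [("sa", 30), ("se", 20), ("sh", 10), ("st", 40)] and two parts, the split falls at "sh", so one page would cover sa to se and the other sh to st. Uneven data can yield fewer parts than requested, which is where manual judgement would still be needed.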

"Plural form of xx." vs. "nominative plural of xx"?

This was probably discussed before, but can we rethink the standardization of wording and punctuation of form-of lines? Is it all lower case, no period? Or should it start with a capital and end in a period? Should the plural contain "nominative", since normally a form-of entry would contain the case? Sometimes the different formats appear on the same page in multi-language entries and it doesn't look good. --Panda10 15:00, 10 January 2009 (UTC)Reply

Concerning the nominative: does it make sense to add that word for languages which don't separate nominative from other cases? No, I don't think one should specify that one has to give exactly "case and number" for noun forms; some languages won't see the point of including the case (which simply would mean adding the word "nominative" to every single inflected form), others would want to include more information (such as Swedish and the definiteness with the nouns). On the other hand, if your question is "Given a language with several cases, should one always specify the case, and not be allowed to use 'nominative' as default", then I agree. But on the other hand, I don't think that would do much for the consistent appearance of multi-language entries, would it? \Mike 16:24, 10 January 2009 (UTC)Reply
Actually, there are two questions: wording and punctuation. I agree with you that adding nominative to the plural form does not make sense in English. There is still the question of punctuation. --Panda10 16:33, 10 January 2009 (UTC)Reply
Yes, I silently skipped the punctuation issue as I don't really have a preference for how it should be done (well, short of "standardized") \Mike 17:14, 10 January 2009 (UTC)Reply
Standardizing the capitalization and punctuation may not be possible, because many editors do not use the inflection line alone. Some add a translation in front of the "form of", some add it after, and some just rely on the link to the lemma. I prefer uncapitalized, no period, but I'm sure there are others who prefer a period and/or a capital letter at the start. The {{inflection of}} template can be made to capitalize (if desired) by explicitly adding the first descriptive word in capitalized form, if this is necessary for a specific page. However, it can't be set to automatically capitalize the first word because there is no way to tell which of various possible choices may come first in sequence. That is, the first word could be gender, case, number, or something else. We can't specify that one of these items always come first, because not all languages or words require that particular item to describe the language's grammar. This is why I prefer no capitalization; it eliminates a messy and unnecessary difficulty. Because it's not capitalized, and because there may be following parenthetical text, I prefer no punctuation either. --EncycloPetey 18:59, 10 January 2009 (UTC)Reply
Regarding "nominative": If it's specifically the nominative plural, then yes, it should say so. (The lemma in relevant languages is usually the nominative singular, so it's tempting to say that this is just the plural of the lemma, but that's misleading: it's actually the nominative plural of the word as a whole, and we just happen to use the nominative singular form to identify the word. Likewise, feminine singular adjectives should be identified as singular as well as feminine, and so on.)
Regarding punctuation: I prefer a capital letter and a period, but it doesn't bother me too much if other people do it differently. Consistency would be nice, but is probably too much to hope for.
RuakhTALK 20:59, 10 January 2009 (UTC)Reply

Spanish combined forms

I want to propose some guidelines for adding these. I'd like this to be somewhat more strict than the thousands of Italian combined forms that we have. First, require a quotation for every form. This will cut down on just adding them en masse without attestation. Second, require a general English translation. Also, would it be worth it to make {{compound of}} for entries? For an example, see aceptarla. Nadando 21:32, 10 January 2009 (UTC)Reply

We already require a quotation for every form, don't we? I agree that having an English translation is necessary, but if someone is only willing to add the entries without such, then I think that that's better than nothing.—msh210 21:27, 12 January 2009 (UTC)Reply

Dutch language categorization

The category Dutch adjectives contains random Dutch adjectives, and Dutch adjective forms contains the same. I'm proposing they be merged. I am not sure how Wiktionary works, but if it's similar to the off-site wiki I contribute to, categorization in the 'parent' may cause issues displaying its subcats when the parent is itself full of entries. I do not mind having the pages in the Dutch adjective forms category, but I think the two should at least be merged.
I'd also like to see Category:Dutch abbreviations, Category:Dutch initialisms and Category:Dutch abbreviations, acronyms and initialisms merged (since the latter suggests they contain the abbreviations...) unless, there is a reason why they're in the (near identical) different categories? -- 6Sixx 06:00, 13 January 2009 (UTC)Reply

The Category:Dutch abbreviations, acronyms and initialisms is the master category into which separate categories should exist for abbreviations, acronyms, and initialisms. This is done across all languages. The master category will only contain entries that have yet to be properly categorized, but the master category exists for both these uncategorized words and to group the subcategories. The contents of the subcategories will each be very different. --EncycloPetey 06:34, 13 January 2009 (UTC)Reply
That explains it; then shouldn't Category:Dutch abbreviations (etc) be inside Category:Dutch abbreviations, acronyms and initialisms? -- 6Sixx 12:41, 13 January 2009 (UTC)Reply
Yes, it should. --EncycloPetey 02:01, 16 January 2009 (UTC)Reply
From what I understand, Category:Dutch adjectives should contain lemma forms while Category:Dutch adjective forms should contain inflected forms. For instance, vriendelijk is a lemma form, while vriendelijke, vriendelijker, vriendelijkere, vriendelijkst, vriendelijkste are inflected forms; see also the table at the right of vriendelijk. --Dan Polansky 10:09, 13 January 2009 (UTC)Reply
Hmm I did not think of that. I think the naming could be a lot better, but I assume it is like that for consistency between languages... -- 6Sixx 12:41, 13 January 2009 (UTC)Reply

mots quotidienne

(Inspired by Visviva's analysing the print editions of the NYT, Guardian, and several journals, plus my need for some code for Wikamusi (sw.wikt), I wrote something to read the on-line articles from a number of FL print newspapers)

I have been creating lists of words in the on-line articles of print media in Italian, French, and Spanish. Several people have looked at the Italian so far. The idea is to collect the words that are appearing in the daily editions, but that we don't have in the wikt. These are of increased interest either because of frequency or because of the current news, thus more likely to be looked up. It also finds words that appear to exist, but don't have needed language sections, e.g. horas has (as of this writing) no Spanish section.

These three languages work well because a large number of inflections exist; this wouldn't work as well for others. So the coverage is quite high (98+% for Italian, not enough data on French and Spanish yet).

See User:Robert Ullmann/Español, User:Robert Ullmann/Français, and User:Robert Ullmann/Italiano. Any feedback much appreciated. Robert Ullmann 17:57, 13 January 2009 (UTC)Reply

In French, c', j', m', qu' are contractions of resp. ce, je, me, que. There are also l', n', r', t'; I don't know whether you remove them or there just weren't any occurrences on the 12th.
(BTW, if the title is in French, there are some mistakes in it ;o) )
Koxinga 20:47, 13 January 2009 (UTC)Reply
I forgot to say that the end result is very good for French. Most of the words are real words, and useful ones at that ! Koxinga 20:49, 13 January 2009 (UTC)Reply
Out of curiosity, what's r' ? I've never seen it in French (but I'm definitely not an expert). Was it a typo for s'? Equinox 22:38, 13 January 2009 (UTC)Reply
Well, I forgot if I had any real word in mind at the time of typing, but considering s' isn't in my list, it may well be a typo. Let's forget about r and replace it by s' . Koxinga 23:30, 13 January 2009 (UTC)Reply
I put in a few simple contractions (that is why you didn't see l' for example, and I already had s') will refine the list. The section title is just j'amuse. Robert Ullmann 23:53, 13 January 2009 (UTC)Reply
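
For readers wondering how the contractions interact with the word lists, the general approach can be sketched as follows. This is a minimal Python illustration, not Robert Ullmann's actual code; the clitic list is limited to forms mentioned in this thread, and the function name and the "known words" set are assumptions.

```python
import re

# Elided clitics mentioned in the thread; illustrative, not a complete list.
ELIDED = ("c", "j", "l", "m", "n", "qu", "s", "t")
_ELISION = re.compile(r"^(?:%s)['’]" % "|".join(ELIDED), re.IGNORECASE)
# A token is a run of letters, optionally joined by apostrophes.
_TOKEN = re.compile(r"[^\W\d_]+(?:['’][^\W\d_]+)*")


def missing_words(text, known):
    """Return words from text (clitics stripped, lowercased) absent from known."""
    found = []
    for token in _TOKEN.findall(text):
        word = _ELISION.sub("", token).lower()
        if word and word not in known and word not in found:
            found.append(word)
    return found
```

Only a leading clitic is stripped, so a word with an internal apostrophe such as "aujourd'hui" is kept whole, while "qu'il" is reduced to "il" before the lookup.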

Script template change may break old user styles

Please read this if you see a sudden change in font style for foreign scripts.

I've just removed some old class names from the style sheet MediaWiki:Common.css, e.g., .AR for Arabic, to be replaced by the ISO-15924-compliant .Arab. Everything should continue to work as before, unless you have old-style class names in your monobook.css. If so, you can restore the old behaviour by replacing them with new class names.

The affected classes are:

  • .AR → .Arab
  • .FA → .fa-Arab
  • .KS → .ks-Arab
  • .KU → .ku-Arab
  • .OTA → .ota-Arab
  • .PA → .pa-Arab
  • .SD → .sd-Arab
  • .UG → .ug-Arab
  • .UR → .ur-Arab
  • .HY → .Armn
  • .BN → .Beng
  • .RU → .Cyrl
  • .EL → .Grek
  • .scHebr → .Hebr
  • .KM → .Khmr
  • .LO → .Laoo
  • .TE → .Telu
  • .TH → .Thai

Let me know if there are any problems. Michael Z. 2009-01-13 22:07 z
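
Anyone with many old-style selectors in a personal stylesheet could apply the renames above mechanically instead of by hand. The following Python sketch uses the mapping from the list; the function name is an assumption, and the substitution is purely textual, so the result should be reviewed before it is saved to monobook.css.

```python
import re

# Old class name -> new class name, as listed in the announcement above.
RENAMES = {
    "AR": "Arab", "FA": "fa-Arab", "KS": "ks-Arab", "KU": "ku-Arab",
    "OTA": "ota-Arab", "PA": "pa-Arab", "SD": "sd-Arab", "UG": "ug-Arab",
    "UR": "ur-Arab", "HY": "Armn", "BN": "Beng", "RU": "Cyrl", "EL": "Grek",
    "scHebr": "Hebr", "KM": "Khmr", "LO": "Laoo", "TE": "Telu", "TH": "Thai",
}

# Match ".NAME" only when NAME is the whole class name (case-sensitive).
_OLD_CLASS = re.compile(
    r"\.(%s)(?![\w-])" % "|".join(sorted(RENAMES, key=len, reverse=True))
)


def update_stylesheet(css):
    """Replace old-style class selectors in css text with the new names."""
    return _OLD_CLASS.sub(lambda m: "." + RENAMES[m.group(1)], css)
```

For example, ".AR { font-size: 120% }" becomes ".Arab { font-size: 120% }", and selectors already using the new names are left untouched.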

Mystery editor

I recently received a message on my talk page from someone who claims that he created a number of Min Nan entries before being blocked. The account is User:Sven70. Can anyone help shed light on the issue? Specifically, which Min Nan entries should I be looking at? Thanks. -- A-cai 23:14, 15 January 2009 (UTC)Reply

Take a look at Special:Contributions/Sven70. The edits were all reverted, and no L2 was ever introduced. -Atelaes λάλει ἐμοί 23:24, 15 January 2009 (UTC)Reply
Looking at his deleted contributions is far more instructive. When blocked, he continued to make garbled edits and to post hate comments under a variety of IP addresses. He is also permanently blocked on Wikipedia (within three days of his welcome). --EncycloPetey 02:00, 16 January 2009 (UTC)Reply
Looks as though he's acquired speech recognition software, and has made great progress towards using it. --EncycloPetey 17:18, 19 January 2009 (UTC)Reply

enPR renamed unilaterally on English Wikipedia

I have posted to WT:ANI regarding admin abuse over the unilateral renaming of the enPR material on Wikipedia as "non-controversial". --EncycloPetey 09:42, 16 January 2009 (UTC)Reply

The incident was closed without action. So, Wikipedia no longer has a page about enPR. --EncycloPetey 20:43, 17 January 2009 (UTC)Reply

jocular or humorous?

I only recently noticed that {{jocular}} had been redirected to {{humorous}}, although this actually occurred several months ago. This bothers me somewhat. I understand that "humorous" is a more common word, but it differs in its primary meaning, being oriented to the reader/listener rather than the author/speaker. In this regard, I'm not sure we should be tagging usages as "humorous" at all, since few things are more variable from one person or context to another than whether something is perceived to be funny. ... If I use a jocular insult that results in my being stabbed to death, it is fair to say that my use of the term was not at all humorous, but it was jocular all the same. ... If "jocular" is too obscure, I wonder if we could perhaps use a closer synonym, maybe {{joking}}. -- Visviva 10:54, 16 January 2009 (UTC)Reply

I beg to differ: given that _all_ tags are (theoretically) oriented toward the speaker rather than the listener, there is no ambiguity, and there is no case to be made at all that a word can have one, but not the other tag. Furthermore, both words explicitly use the other in their definitions, which one would tend to consider confirms the synonymy. Circeus 19:50, 16 January 2009 (UTC)Reply
I disagree. Many of the context tags provide information about how a word is used or intended, while others provide information about how it is received. A tag that says (Australia) indicates that a person using the term is likely to be Australian, and says nothing about the person reading or hearing the word. A tag that says (proscribed) indicates something of how the word will be received, and advises a potential user.
In the case of jocular vs. humorous, I agree with Visviva that there is a difference in the shades of meaning between the two terms. The term jocular describes intent of the speaker; a person hearing the term/sense may not even be aware of the humor involved. The term humorous implies a reaction on the part of the person hearing/reading, and not necessarily in accordance with the intent of the writer/speaker. I find many things humorous that were not at all jocular. A student who asks for the answer to a question, immediately after that answer was given aloud to the class, may have his question considered humorous, but the question was not jocular. Or consider, I find the terms irregardless and nosegay humorous, but neither is necessarily jocular. --EncycloPetey 20:19, 16 January 2009 (UTC)Reply
My point is that nobody would expect "humorous" to mean "will be funny regardless of what was meant" (which would almost certainly be the case of most "jocular" words anyway, cf. wuv). Circeus 05:40, 17 January 2009 (UTC)Reply
But you do understand my point that "humorous" does not say anything about the intent of the speaker/writer. Things can be humorous solely on the part of the hearer/listener, and what we're trying to communicate with the context tag is something of the intent in using the word. --EncycloPetey 20:33, 17 January 2009 (UTC)Reply
I agree strongly with this and think you stated it well! Imagine if we put sarcastic on every word that is liable to be used sarcastically. The whole point of humour and sarcasm is that they are deliberately "twisted" or unexpected usages requiring some thought by the listener/reader; they are creative rather than dictionary-prescribed. Equinox 20:37, 17 January 2009 (UTC)Reply
I guess this varies from person to person. I normally think of "humorous" as having only the objective sense, so that overrides what I would otherwise expect in a context label. This gives me an idea, though -- could we just change the label on {{humorous}} to say used humorously? That might still be a little obscure for folks like me, but I think we could figure it out. -- Visviva 02:19, 17 January 2009 (UTC)Reply
That might work. I'd like to hear opinions from a few more people though, in case there's an issue I'm not aware of. --EncycloPetey 03:29, 20 January 2009 (UTC)Reply

as soon as possible

I would like to create the entry "as soon as possible", which is now a redirect, but I sense it is disputable as being sum of parts. Even if it is sum of parts, the entry seems valuable to me. It strikes me as a set phrase, that is, a common expression whose wording is not subject to variation; the property of this being a set phrase is witnessed by the existence of the internet initialism ASAP. Unfortunately, WT:CFI does not have a provision allowing for set phrases. The phrase "as far as one knows" seems to fall into the same bucket of set phrases that are sum of parts.

When I, a non-native, read the set phrase in a given text, I can understand it. But when at earlier times I went in the other direction, from the meaning to the phrase, I knew how to say this idiomatically or in a standard way only because English textbooks explicitly documented the phrase. That is, without the documentation or before I had learned the phrase actively enough, I would have ended up saying things like "Please call me back at the earliest time you can" or "Please call me back as soon as you can".

What do you think of me creating the entry, then? Does its creation require a modification of WT:CFI? Is there a discussion of the topic of SoP set phrases that I have missed? --Dan Polansky 09:57, 17 January 2009 (UTC)Reply

I agree that as soon as possible is a set phrase and warrants an entry, though I don't agree with all of your comments about it. (Acronyms don't always indicate a pre-existing set phrase; and "as soon as you can" is perfectly ordinary English, as is "at your earliest convenience".) —RuakhTALK 17:06, 17 January 2009 (UTC)Reply
It seems like there's a good case for it as a "Phrasebook" entry; many other languages also have formulaic ways of expressing the same basic idea. -- Visviva 17:10, 17 January 2009 (UTC)Reply
And of course there's "Quick as you like!" - meaning "Now!". Pingku 17:53, 17 January 2009 (UTC)Reply
All right, created. I agree that the existence of an acronym only suggests but does not prove that a phrase is a set phrase. I have noticed there is the {{set phrase}} template, and used it in the entry. --Dan Polansky 06:51, 18 January 2009 (UTC)Reply

Multiple Alternative Spellings

If a word has multiple alternative spellings, should the words be placed in one row (e.g. sadhe) or listed vertically (e.g. kris)? I believe there is a sentiment that space above the definition is considered "prime real estate", so one row is better? --AZard 16:28, 17 January 2009 (UTC)Reply

I'm an advocate of above-the-fold screen-space conservation. There is less justification, IMHO, for taking up lots of space for Alternative spellings (and forms) than for Pronunciation and Etymology. I'd favor some space-reduction approach for both of them, too, but there is definitely a lack of consensus on how and possibly opposition to the very idea of it. For Alternative forms and spellings, there seems much less opposition. DCDuring TALK 16:43, 17 January 2009 (UTC)Reply
Personally I list them vertically, unless some spellings are groupable for whatever reason (e.g., they differ only in capitalization or hyphenation or the like). Consistent horizontal listing seems weird to me, because the bullet-point suggests a vertical list. —RuakhTALK 17:01, 17 January 2009 (UTC)Reply
In the absence of specific policy guidance, I have been listing them vertically. There are cases where a form is followed by a parenthetical qualifier such as (UK), and this seems more likely to be visible if the items are arranged vertically rather than horizontally. However, despite my being an opponent of over-compacting the Etymology and Pronunciation sections, I can see real merit in having a way to collapse the Alternative spellings/forms section when there are more than a couple of items listed. Earlier periods of English employed myriad different spellings of some words, and in many cases these other spellings are now obsolete. I somewhat favor the idea of a collapsible box for this section (as is done for Related terms), to be used when more than X terms are present, and think X=2 or 3 would be a sensible cutoff. --EncycloPetey 20:42, 17 January 2009 (UTC)Reply
Bullets are the de facto standard, although as Ruakh says, if a group share a certain characteristic (e.g. are all archaic), they could probably be put on one line.
IMO we should reconsider the placement of ===Alternative spellings===; there are powerful structural reasons for the placement of ===Etymology=== and ===Pronunciation===, but those reasons don't seem to apply to alternative spellings. The default placement often results in ontologically incorrect arrangements, where a spelling that actually pertains only to one sense, POS, or ety is placed so as to appear to apply to the whole language section. -- Visviva 04:45, 18 January 2009 (UTC)Reply
It is permitted, in situations where the spellings do not apply to all parts of speech, to place the section at L4 under only the relevant POS. It is permitted because we do it, and there is no guiding policy at all. The ELE makes no explicit recommendations about this section at all. Its preferred location must be inferred from the example, which is the only place in that document that it is mentioned. Even Wiktionary:Alternative spellings does not mention that this is a possible section for a page, but rather discusses only the items that may be regarded as alternative spellings. --EncycloPetey 05:07, 18 January 2009 (UTC)

I agree with EncycloPetey that once the number of alternative spellings exceeds two or three, we should house them in a rel-table. However, for both functional and æsthetic reasons, they ought to be listed vertically, not horizontally. See slave#Alternative forms and enmity#Alternative forms, in which cases no other præsentation is practical.  (u):Raifʻhār (t):Doremítzwr﴿ 19:02, 18 January 2009 (UTC)

The use of {{rel-top}} conflicts with the floating right table of contents under my circumstances. The difficulty is that, if the window is narrow enough, all content is pushed below the right-hand-side table of contents. That would seem to argue against any features that did not have nice word wrap and any fixed-width tables or similar features. To see the problem (if you use right-hand toc), go to slave and narrow your window. Also expand the show/hide, even if you do not get the white space. DCDuring TALK 16:32, 19 January 2009 (UTC)
I don’t use a right-hand ToC, and don’t know how to use one, so I don’t understand the problem.  (u):Raifʻhār (t):Doremítzwr﴿ 16:54, 19 January 2009 (UTC)Reply
What is "floating right table of contents"? --AZard 16:50, 19 January 2009 (UTC)Reply
Most users will see a table of contents in the upper left of a page that is long enough to generate one. The TOC pushes all text on the page to begin after the TOC. Some users dislike this default setup, and have arranged for a customization that puts the TOC in the upper right corner of the page and has the text begin in the upper left, alongside the TOC instead of below it. This is what we mean by a "floating right table of contents". DCDuring is pointing out that having collapsible tables near the outset of the page interferes with this customization, which is why some (like myself) have never opted to use the right-floating TOC. It can interact badly with images, collapsible tables and other items. Nevertheless, some people prefer this customization and have grown attached to it. --EncycloPetey 17:17, 19 January 2009 (UTC)Reply
By default, the table of contents floats to the left, but it can be changed to float to the right, freeing up a great deal of space. See WT:PREFS. Hopefully this or a no-TOC view will become standard in the future. -- Visviva 17:11, 19 January 2009 (UTC)Reply
I agree with EP that a vertical listing is preferable, as this allows for notes on them (in my experience, alt spellings are very rarely arbitrary). I also agree that a collapsible box is a good idea. I sort of agree with Visviva on reconsidering the placement, but with some caveats. Under our current nesting scheme, if an alt spelling applies to multiple POS's, an L4 alt spelling header is a poor choice, as it would force the duplication of information. A below the fold L3 might be acceptable, but I almost feel as though it would be a bit nonstandard (although I will admit that when an inflection applies to multiple POS's or etymologies, I have been placing trailing L3 inflection lines, in lieu of repeating the information). However, when an alt spelling applies to only one POS (including when there is only one POS), I would be ok with an L4 header being the standard. -Atelaes λάλει ἐμοί 02:45, 19 January 2009 (UTC)
Summary: There seems to be consensus that a vertical list of more than 2 or 3 alternative spellings is an inefficient use of above-the-fold space. There seem to be two potential solutions: 1) collapsible box - vertical list (at the sacrifice of right-handed TOC) or 2) below-the-fold (Level 3 header for most entries. I found an existing example: tire-pressure. Level 4 for specific POS or etymology.) Solution #2 affects all entries with alternative spellings, not just those entries with multiple alternative spellings (and would require a vote to modify the ELE). Did I miss anything important? --AZard 21:38, 19 January 2009 (UTC)
If a user is already looking at the entry for the primary spelling of a word, why would a user care about alternative spellings (or variant spellings)? If we feel users place a high value on alternative spellings, then the collapsible box makes more sense. If it's more like trivia, then below-the-fold makes sense (L3, maybe right above Anagrams? L4, maybe right above Synonyms?) I've looked at a few dozen definitions with alternative spellings; I'm leaning towards trivia. What is your opinion? Valuable or trivia? Any other ideas on helping us pick one over the other? --AZard 21:38, 19 January 2009 (UTC)Reply
According to the Nielsen Norman Group ("NNG") there is an argument that, from the perspective of a user who has come to a main entry from an alternative spelling, it is useful to have something on the page that corresponds to what the user had possibly entered, like the small-font line for redirects. This argues against concealing alternative spellings under show/hide bars or having them below the fold. OTOH, most users know about the "Back" button, as NNG also points out. Our users are probably mostly browser-smart, so I wouldn't mind seeing the alternative spellings just above Anagrams. DCDuring TALK 22:52, 19 January 2009 (UTC)
In English, the alternatives are usually (1) archaic spellings, (2) slight variations, or (3) regional differences. I don't think anyone is terribly concerned about how placement would affect the first two situations, so let's ask ourselves what it would mean if color/colour had the alternative forms explanation moved to the bottom of the page. I don't have an answer to that; I'm suggesting a narrower focus on a situation where the potential for real impact exists.
DCD, your suggestion of "just above anagrams" assumes that the alternative spellings apply equally to all parts of speech. Whatever we decide must provide for situations where the alternatives apply to just one part of speech or to many parts of speech. If we lump them all in a single end section, that presents problems in cases where the alternatives apply only to senses under a single etymology or single part of speech.
We also need to consider what this would do to CJKV languages, where (as I understand it) the alternative forms can be very, very important to the user. I'd like to hear from people who work in those languages before we carry a particular line of thinking too far. --EncycloPetey 03:26, 20 January 2009 (UTC)
FWIW, I care about the first situation as well as, probably, the second situation.  (u):Raifʻhār (t):Doremítzwr﴿ 18:27, 22 January 2009 (UTC)Reply
The suggestion of Azard which you attribute to me makes no assumptions whatsoever. OTOH, I made some. I would like to reanalyse. The Wiktionary-usage scenarios that I believe ought to be foremost in our mind are of occasional mostly non-contributing users (mostly unregistered), either English-language-mostly or English-language-learners. I assume that a user comes to a given entry seeking the meaning of something read or heard.
If coming from something read, then the user has a particular spelling in mind. When the user enters the spelling and comes to an alternative spelling entry, the user's search might be over upon seeing the main word for which the spelling was the alternative. (It would be useful if the alternative spelling entry itself contained any context information about its use, notwithstanding the duplication-of-information problem. Bot updates or transclusion would be nice for this purpose.) Otherwise, the user has the main entry itself as a resource, including the "also" line at the very top. If the user knows the word under one of its alternative spellings we save the user time by having the alternative spellings on top. That would suggest that all and only spellings currently used by a significant(?) fraction of ordinary users need be listed at the top of the page. It would also suggest that they not be concealed under a show/hide bar. Space conservation would argue for horizontal arrangement. In this scenario, some users would probably be saved a modest amount of time from having context information adjoining the spellings. Because most context information is brief or even terse (eg, US, UK, Australia), I would argue that even three alternatives would often fit on one line.
If coming from something heard, then there is an intermediate step possible if a user's naive phonetic or semi-phonetic spelling does not correspond to a lemma or one of the alternative spellings. There is also potentially the confirmatory value of the Pronunciation section material the user finds accessible.
What this leads me to is the desirability of keeping all the useful confirmatory material (the top "see" line, current alternative spellings, pronunciation, and toc (the toc possibly abbreviated)) visible at the top by default. -- IOW, our current position! -- (Whether the assumption of rapid and broad accessibility of the Pronunciation information is warranted seems beyond the reach of facts at our disposal.)
Historical information, such as older alternative spellings, does not seem to be justified in taking large amounts of space by this argument. Information of principally scholarly interest should certainly be included, but not take up above-the-fold space by default. Whether a space-conserving show/hide bar for older alternative spellings should appear beneath the alternative spellings header, under Etymology, or at the bottom (because of incompatibility with right-hand table of contents) I don't know. DCDuring TALK 18:22, 22 January 2009 (UTC)Reply
IMO, the best solution would be to have any list of alternative spellings exceeding two in number tucked away in a rel-table and the ToC collapsed by default for unregistered users.  (u):Raifʻhār (t):Doremítzwr﴿ 19:40, 22 January 2009 (UTC)Reply
Why? DCDuring TALK 23:17, 22 January 2009 (UTC)Reply
Certainly don't hide them if they can be fit on one or two lines. Seems a shame to grab a heading above the fold, and then give the reader nothing, when a dozen or more terms separated by commas could be offered in the same space. Collapsible boxes on the page should be a last resort, especially above the article core. Michael Z. 2009-01-23 00:17 z
I don't understand how alternative spellings are confirmatory. Because we don't use redirects, someone searching for an alternative spelling will get their confirmation on the alternative-spelling page. If they click on the link for the main entry, they will presumably be expecting information on that spelling, not the one they entered originally. -- Visviva 00:13, 23 January 2009 (UTC)Reply

Certainly exhaustive lists of historical forms don't need to be visible by default. These seem to me to belong in the “Etymology” section, rather than “Alternate forms” which comes first for practical reasons. Even if some historical forms aren't ancestors of the modern form, they belong in the etymology for comparison.

We have missed the simplest of all alternatives: plain lists without bullets. This may not always be the most appropriate form, but even complex lists with groups or comments can be listed in a sentence or paragraph. Some of the above-linked examples can be remade as below. (What are α-form and β-form?) Michael Z. 2009-01-22 22:21 z

Alternate forms

sade, sadi, tsade, tsadi

Alternate forms

crease, creese, keris

Alternate forms

[skl]-initial α-forms of the 14th century include sclaue; 15th c. sclaue, sclave; 16th c. sclaue, sklaw, sklaue, sklave. [sl]-initial β-forms of the 16th c. include slaif, slaue, and the modern form slave, 17th c. slaue and slave, whenceforth the modern spelling predominated.[1]

  1. ^ “slave, n.1 (and a.)” listed in the Oxford English Dictionary [2nd Ed.; 1989]

Category:Filmology

This is a rather subjective thing, but I hate the word ‘filmology’. Apart from anything else, it's not in any dictionary I own. At the least, it is a specifically American term and I wonder if there isn't anything more inclusive. What about just ‘Film’? The other problem I have (because I work in television) is that a lot of these terms pertain to TV as well, but I can't think of a Category name that incorporates all that. Any thoughts? Ƿidsiþ 20:22, 17 January 2009 (UTC)Reply

Though perhaps too broad, there's always "Visual media". That incorporates both film and TV without adding in too much other stuff. —Leftmostcat 20:27, 17 January 2009 (UTC)Reply
I had the same initial reaction when the category first appeared, but have come to accept it as a necessary evil. If there is another appropriate term, I don't know what it is. "Visual media" is too broad, since it includes several additional artforms. "Film" is more restrictive and is ambiguous since that word has a physical properties sense of a "membrane". "Cinematography" is restricted to only certain aspects of filmmaking. So, the best I could suggest is "Filmmaking", but I'm not sure that term is quite right either. --EncycloPetey 20:30, 17 January 2009 (UTC)
I would go with Category:Film myself, or Category:Cinema if that is too ambiguous. I don't think it is ambiguous, though; IMO any science/tech category would have to be at "Film processing" or similar, not just "Film". "Filmology," which I understand as the academic study of film, might be appropriate for words like auteuristic, but not for words like gaffer. (I think our current definition of filmology is substantially incorrect, see e.g. [1].) If neither "film" nor "cinema" are acceptable, as a last resort we could perhaps use "motion picture industry"...-- Visviva 04:36, 18 January 2009 (UTC)Reply
I could go with Cinema. There's some ambiguity, but the translations in most major languages that are cognates to cinema refer to the industry of making movies. I'd prefer it over Film, whose cognates in most languages would refer only to a particular print of a film or the physical medium on which it was produced. --EncycloPetey 04:43, 18 January 2009 (UTC)
Thanks to the inscrutable magic of {{context}}, {{film|and|TV}} renders correctly: Template:film. May not be the most elegant solution, but it gets the message across, and looks a little better than plain {{film|TV}}. -- Visviva 04:40, 18 January 2009 (UTC)Reply

I'm exploring Wiktionary. I found a significant error in the Wiktionary logo: "a multilingual free encyclopedia". Wiktionary is not an encyclopedia but a dictionary. Who can change that? Best regards.--Kwj2772 07:17, 19 January 2009 (UTC)

That is not an error. Notice the text is in light gray because it is the definition for the preceding term in the list (not visible). The definition of Wiktionary is in black. --EncycloPetey 07:21, 19 January 2009 (UTC)
This is the first time that I realize that that logo is supposed to depict a boldened list entry... __meco 13:57, 19 January 2009 (UTC)Reply

I still think we should find a logo that matches the logos of the other projects by being shades of blue and roundish. There were a bunch of good ones proposed using speech bubbles, yet we continue to use the boring text-only version because of some protest of how the vote was organized. --Arctic.gnome 18:43, 23 January 2009 (UTC)Reply

Linking of language name in translation section

Why should some language names be linked in the translations section[2]? __meco 13:28, 19 January 2009 (UTC)Reply

See WT:TOP40; the most common and familiar language names, including the ones clearly associated with a country, are not linked, and the others are. (There used to be exactly 40 in the first table.) But not something anyone needs to think about too much, as AF will make the adjustments. Robert Ullmann 14:12, 19 January 2009 (UTC)Reply

Category for IPA entries

Could we have a hidden category displaying all word entries that have an IPA entry, for each language? That would assist me immensely in getting the hang of the IPA thing and surely lead to a lot more word entries getting an IPA entry. __meco 14:02, 19 January 2009 (UTC)Reply

I don't understand. How would that help? Each language uses a different collection of symbols, and a language like English does not follow predictable patterns in pronunciation as related to its spelling. What language(s) are you seeking to be able to code IPA for? --EncycloPetey 17:10, 19 January 2009 (UTC)Reply
You can use CatScan to search for entries in your desired language's category (with some depth to allow for different parts of speech) by use of the template IPA.—msh210 17:13, 19 January 2009 (UTC)Reply

Links to Indices

User:Panda10 raised a good point on my talk page. It would be nice to link entries back to the language indices, where such exist. It would also be nice to support "previous alphabetical" and "next alphabetical" links on language page entries - though we should treat these ideas as separate for now. Are other people of this opinion? Conrad.Irwin 23:38, 19 January 2009 (UTC)

Yes and no. I agree that a link to the index is a superb idea, and I agree that a link to previous or next is a superb idea iff there's no link to the whole index; and otherwise it's a good idea anyway. I do not, however, agree that the ideas should be kept separate for now.—msh210 21:11, 20 January 2009 (UTC)Reply

In terms of implementation, it would probably be best to include a small link under the language name (much as the prominent interwiki link preference does). This could be added by javascript, or by inserting a template onto each page under the correct language heading. Other possibilities include adding it in a ===See also=== section, and, though I hate them, a right floating box. Conrad.Irwin 23:38, 19 January 2009 (UTC)Reply

A tiny link right under the language heading sounds good to me.—msh210 21:11, 20 January 2009 (UTC)Reply
If we choose to implement this, then I agree with msh210. --EncycloPetey 22:12, 20 January 2009 (UTC)Reply
Does {{seeindex}} meet with general approval? It would take two edits per new lemma form creation to maintain all these links, which is not a huge number, but also not an insignificant number. If we were to link only to the Index, it would only need adding to each page once, so only reduces the workload by a third. It could also be done in javascript, but that is a less "nice" solution. Conrad.Irwin 01:08, 21 January 2009 (UTC)Reply
I think {{seeindex}}, with the links to previous and next entries, would be fantastic in terms of helping us and our users see the lay of the land. The mutual invisibility of entries has been a problem for a while. On the other hand I am troubled by the thought of basically tripling* the number of revisions entering the database, and the number of write operations. I don't have the skills or knowledge to evaluate how many resources this would actually consume, but it seems like the cost over time (in server or other resources) could be non-trivial, particularly as the rate of new-entry creation continues -- it is hoped -- to increase. -- Visviva 02:32, 21 January 2009 (UTC) *OK, given the form-ofs and whatnot, not actually tripling, but multiplying by some non-trivial factor.Reply
I think you are overestimating, we only need to edit on "creation of lemmas". Much less than an interwiki bot which must edit on the creation of any page on any wiki. Conrad.Irwin 14:09, 21 January 2009 (UTC)Reply
I really like the way the links generated by {{seeindex}} look. How will this template handle the red links that have just been introduced in the index? Will it always point to the next blue link? Can you explain the maintenance? What are the two edits for each new lemma? --Panda10 03:14, 21 January 2009 (UTC)Reply
This would be another discussion entirely, but I don't think it's a great idea to put redlinks in Index: pages. For one thing it violates the principle of least astonishment: indices normally list the content actually in a book, not the content that should be in the book. :-) IMO it would be better to use Wiktionary:Requested entries or -- for particularly important groups of words -- a specialized hotlist. -- Visviva 04:42, 21 January 2009 (UTC)
The index was relatively invisible to the general user community and was mainly used for maintenance. So in this respect, the red links were very helpful to me. If we decide to bring the index forward and make it visible, then yes, it would look better with blue links only. As you said, Visviva, a separate category would work fine listing the red links with a pointer back to the page where they are mentioned. --Panda10 13:03, 21 January 2009 (UTC)Reply
The words in the index are still all in Wiktionary (with the exception of the italic words in Index:Ancient Greek which all come from a list that was originally stored in that index at the request of the user who asked for that index) it's just that they don't all have their own entry pages. It doesn't bother me particularly whether people want these words listed in the indices, or whether they'd prefer I generated some Requests pages from them. The forward and backward links would have to link to the Translations section of the entry that includes the word; if they were going to include the red-links. This might be a bit confusing - so it's probably better if they ignore the red-links. Conrad.Irwin 14:09, 21 January 2009 (UTC)Reply
Are the indices being automatically updated now? Or would that be part of this change? -- Visviva 02:32, 21 January 2009 (UTC)Reply
I run scripts to update some indices (those that people have asked me for) as and when, I currently think about every fortnight or so is often enough. See User:Conrad.Bot for more gory details. The process evolved more than I programmed it, and that is really beginning to show; but until I find something that I can't just "hack on" to it, I'll leave it as it is. Conrad.Irwin 14:09, 21 January 2009 (UTC)Reply

Surely there's an easier way to link to indices without individually editing each page? Maybe the links could be added to other common templates that are already on the page. Nadando 02:38, 21 January 2009 (UTC)Reply

There is a possible template-based non-JS solution... If the index (or perhaps a list generated from the latest dump) were chunked into sizes small enough that the parser wouldn't choke on them -- maybe 50-100 words per chunk -- a metatemplate could be generated for each chunk which would contain the code for each word. So the code for {{metaindex-hungarian-absz}} or whatever would be something like:
{{#switch:{{PAGENAME}}|.....|absztraktság={{seeindex2|lang=hu|letter=a|1=absztrakt|2=abszurdum}}|....}}
These metatemplates could then be updated by bot on a daily/weekly/whateverly basis, without needing to touch the individual entries. (Of course, periodically the list would need to be rechunked, but if we're clever enough about the original setup that could probably be done with minimal pain.)
There may be reasons why this isn't a good idea, but I thought I'd throw it out there. -- Visviva 04:36, 21 January 2009 (UTC)Reply
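For what it's worth, the chunking idea above could be sketched roughly as follows, here in JavaScript as a bot-side generator. Only {{seeindex2}} and its lang, letter, 1 and 2 parameters come from the example above; the function name buildMetatemplates, the chunk size, and the empty value used at the ends of the list are assumptions made for illustration. It also assumes no page title contains "|" or "=", which would break the #switch syntax.

```javascript
// Hypothetical sketch: split a sorted index into chunks and emit one
// #switch metatemplate body per chunk. Each case maps a page name to a
// {{seeindex2}} call carrying its previous (1=) and next (2=) entries.
function buildMetatemplates(sortedWords, chunkSize, lang, letter) {
  const templates = [];
  for (let start = 0; start < sortedWords.length; start += chunkSize) {
    const end = Math.min(start + chunkSize, sortedWords.length);
    const cases = [];
    for (let i = start; i < end; i++) {
      // Neighbours are taken from the whole index, not just this chunk,
      // so links still work across chunk boundaries.
      const prev = i > 0 ? sortedWords[i - 1] : "";
      const next = i + 1 < sortedWords.length ? sortedWords[i + 1] : "";
      cases.push(
        `${sortedWords[i]}={{seeindex2|lang=${lang}|letter=${letter}|1=${prev}|2=${next}}}`
      );
    }
    templates.push(`{{#switch:{{PAGENAME}}|${cases.join("|")}}}`);
  }
  return templates;
}
```

A bot would then save each returned string to its own metatemplate page; rechunking amounts to calling the function again with the updated word list.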
It's an interesting idea certainly, but I'm not sure it's actually any nicer than editing each page. Every edit to the "hundred-word" page would cause all hundred pages that include it to be re-parsed; and as you say, if re-chunking is necessary, then we still have the same problem. I suppose the "nicest" solution would be to have actual MediaWiki support for these indices and the links thereto; but I can't see that happening for several months even if we had a specification and the time to code it up. Conrad.Irwin 14:09, 21 January 2009 (UTC)Reply
Is there a way to create a template that doesn't contain the prev/next word, but dynamically would figure out what the prev/next is when the entry page is displayed? --Panda10 13:03, 21 January 2009 (UTC)Reply
My other thought was to put the list onto the toolserver, and use Javascript to do a lookup; this has the advantage that the page doesn't ever need to be edited, but the disadvantage that it won't make "real" links; maybe a half and half solution is better; whereby the pages contain a link to the index, and then javascript can be used to generate forward and back links? Though, as I said before, editing two pages on the creation of one lemma-form entry is still very doable, particularly when you compare it to the much larger task done by the interwiki bots editing all pages on all Wiktionaries whenever any page is created. Conrad.Irwin 14:09, 21 January 2009 (UTC)
I like JavaScript best, on consideration (and I can see that my idea above would not really have improved things). The lack of edits is an important factor -- after all, just setting up a template-based system would require hundreds of thousands of edits, to say nothing of ongoing updates. But in addition, JS would allow all sorts of arbitrary customizations. If there are situations where it would be nice to, say, allow entry-to-entry browsing through all of the verbs in a language, or if there are people who would like to browse through a reverse index, etc., it would just be a matter of posting the requisite files and flipping a setting (either in the default skin or in PREFs/user JS). Could we host the JS-index files locally, or is toolserver better for this sort of thing? -- Visviva 03:45, 24 January 2009 (UTC)Reply
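As a rough illustration of the JavaScript lookup discussed above: given a sorted list of index entries (however it ends up being hosted), the previous and next entries for a page can be found with a binary search. The names findNeighbours and defaultCompare are hypothetical, and the sketch assumes the list was sorted with the same comparison function that is passed in. A real index would probably need a language-aware collation rather than the plain code-unit ordering used here.

```javascript
// Plain code-unit ordering; an assumption, not a proper collation.
function defaultCompare(a, b) {
  return a < b ? -1 : a > b ? 1 : 0;
}

// Hypothetical sketch: return the index entries immediately before and
// after pageName. Works even when pageName is not itself in the index.
function findNeighbours(sortedIndex, pageName, compare = defaultCompare) {
  let lo = 0;
  let hi = sortedIndex.length;
  // Binary search for the first position whose entry is >= pageName.
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (compare(sortedIndex[mid], pageName) < 0) {
      lo = mid + 1;
    } else {
      hi = mid;
    }
  }
  const found =
    lo < sortedIndex.length && compare(sortedIndex[lo], pageName) === 0;
  const prev = lo > 0 ? sortedIndex[lo - 1] : null;
  // Skip over the page itself when it is present in the index.
  const nextPos = found ? lo + 1 : lo;
  const next = nextPos < sortedIndex.length ? sortedIndex[nextPos] : null;
  return { prev, next };
}
```

Leaving redlinked words out of the list before the lookup would give the "ignore the red-links" behaviour mentioned earlier, and passing a different list (all verbs, a reverse index) would give the other browsing orders without touching any entry.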

The following is copied from user talk:msh210:

This is a relational operator, and is more like a verb than a preposition. We usually mark these as "Symbol" rather than trying to fit them to a part of speech. --EncycloPetey 23:20, 19 January 2009 (UTC)Reply

It is both a verb and a preposition, and there are loads of examples for each. Why would one mark it "symbol" if it fits a POS perfectly?—msh210 23:21, 19 January 2009 (UTC)Reply
Please provide an example of prepositional usage. I can't think of one, which means it doesn't fit a POS perfectly. --EncycloPetey 23:23, 19 January 2009 (UTC)Reply
Added.—msh210 23:25, 19 January 2009 (UTC)Reply
That's not a preposition, that's [negating adverb] + [comparative adjective] + [preposition]. --EncycloPetey 23:26, 19 January 2009 (UTC)Reply
Huh? In English we'd read it as a phrase, yes, but if there were a word for it in English (as there may be in some languages) it would be a preposition. I mean, suppose "notlessthan" were a word. What POS would it have in "for all 'x' notlessthan 3"? Preposition, of course, like "over" in "for all 'x' over 3". So that's what ≮ is then.—msh210 23:31, 19 January 2009 (UTC)
As part of a phrase, yes, not as a part of speech. Since this is a Translingual entry, it would have to translate as a preposition into every language where it is used, but it doesn't even translate that way in English. Would you call "not older than", "not wider than", "not later than", etc. prepositions? No. They're all examples of a particular phrasal construction in English, and not one of those would merit an entry. They're all sum of parts, combining several parts of speech. We don't get to invent hypothetical words to justify a part of speech label. --EncycloPetey 23:36, 19 January 2009 (UTC)Reply
I think the tradition here is to mark things as adverbs if they act as adverbs, as nouns if they act as nouns, as prepositions if they act as prepositions. Thus, for example, carry the message to Garcia is called a verb even though it is a long thing with a noun, a proper noun, and a verb (inter alia) in it; over the top is called an adjective even though it is actually a prepositional phrase (preposition plus the complement (or whatever it's called) of that preposition); and next to is called a preposition even though it's an adverb+preposition. Same here: ≮ is used as a preposition, so it's a preposition, no matter how it splits up. (And note that it doesn't even split up!! Only its English gloss does. I'm saying that even if it would, it'd still be considered a preposition, and a fortiori in the case at hand.)—msh210 23:41, 19 January 2009 (UTC)
That is indeed the tradition for English phrases, but not for any symbolic Translingual entries. Also consider: Abbreviations, Contractions, Initialisms, etc. are categorized as such, and not as any particular POS. And you haven't demonstrated a function as a preposition; I still disagree on that point. You certainly haven't demonstrated it for German, Dutch, French, Italian, Japanese, and all other languages that make use of this symbol. --EncycloPetey 23:44, 19 January 2009 (UTC)
Would you like to have some beer with me and with a few of our mutual friends?—msh210 23:49, 19 January 2009 (UTC)Reply
Additional opinions and suds could clarify the matter. --EncycloPetey 23:52, 19 January 2009 (UTC)Reply

The preceding is copied from user talk:msh210. Please continue discussion here.

On whether to classify the ≮ symbol as a symbol or to assign it a part of speech: As a symbol. Reasoning: The symbol "+" is now classified as a symbol, although it could be classified as a preposition, following the example of plus. A similar consideration holds for other symbols in Category:Translingual symbols, including ⇐, ⇒, ⇔, ∀, and ½ — a meaningful part of speech can be determined for their rendering in words, but they are still classified as symbols. --Dan Polansky 08:46, 20 January 2009 (UTC)Reply
I view such entries as stubs waiting to be fixed: the symbol header is fine if nothing else fits, but should be replaced if something does.—msh210 17:15, 20 January 2009 (UTC)Reply
The current common practice at Wiktionary is to use the symbol header, as witnessed by the entries at Category:Translingual symbols. What you are proposing is a change in the common practice.
On the topic of whether the current practice should be changed, I do not see how changing the heading from one "Symbol" heading to several PoS headings is going to help anything. It is unclear to me how to determine the SoP (AKA lexical category) of such phrases as "greater than" or "sooner than"; neither strikes me as a prepositional phrase. It seems to me that the SoP of such phrases might as well be undefined. I am no export on the determination of SoP, but it seems not every subsequence of words in a sentence can be assigned a SoP; the string "Socrates is" comes to mind. --Dan Polansky 19:04, 20 January 2009 (UTC)Reply
You mean POS ("part of speech", i.e. lexical category), not SoP ("sum of parts", i.e. not idiomatic). —RuakhTALK 19:28, 20 January 2009 (UTC)Reply
Oops, there I go; right. And, quoting myself, "I am not export". Sigh. --Dan Polansky 21:02, 20 January 2009 (UTC)Reply
I agree that "symbol" is better, since it seems to be a unit of meaning rather than of syntax. In English it can be a verb or a preposition; it can also be a noun (as in "≮ and < are complementary relations", which resembles mention, but does seem to be use when you really think about it: it's talking about the relation, not the symbol denoting it), and possibly other things as well. (And I'm not sure the preposition-like use is really a preposition; it seems more like an adjective that takes a directly construed complement. This is rare in English — most of our complement-taking adjectives use a preposition, like "full of ___", "concerned with ___", etc. — but there are examples, such as worth, that seem apposite. And since ≮ is also a verb, we could actually view its preposition-like uses as a participle, "not being less than ____".) And of course, other languages may give it completely different POSes. (BTW, this is neither here nor there, but I think next to actually is a preposition, at least in some cases; when we P-strand, we say "the statue next to which I was sitting", not *"the statue to which I was standing next".) —RuakhTALK 13:36, 20 January 2009 (UTC)
Not sure I follow. How is this different from set, which also is a noun, verb, etc., and which also has the characteristic that other languages give a different POS to some senses than we do (e.g., one adjective sense has ingesteld listed as a translation, but we list that as a verb)?—msh210 17:15, 20 January 2009 (UTC)Reply
One way it is different from set is that set is an English word used in English in those ways. To treat a Translingual symbol entry similarly would be a nightmare, as we would have to justify any part of speech for each language in which the symbol is used. Sometimes (as you note) the translation of a word into another language results in a change in the part of speech. For example, although English considers language names to be proper nouns, Slovene regards them strictly as adjectives. In English, we use a noun for the concept of a year, but in Navajo they use a verb. An idea does not always have a universal part of speech assigned to it; the POS depends upon the norms of the particular language in which the idea is expressed. So, ≮ cannot be assigned a part of speech solely based on an analysis of its translation into English. --EncycloPetey 18:52, 20 January 2009 (UTC)
What EP said. —RuakhTALK 19:28, 20 January 2009 (UTC)Reply
By the same logic (i.e., that it might have a different POS in some language), every ==Translingual== word should have no POS listed. (===Cardinal number=== might be an exception, since it's not really a POS.) What do we do, then, with all the taxonomic names we currently list as translingual proper nouns (or, sometimes, nouns), e.g., Lemmus lemmus?—msh210 20:21, 20 January 2009 (UTC)Reply
Taxonomic names have internationally accepted guiding documents that stipulate things like: "The name of a genus is a noun in the nominative singular, or a word treated as such, and is written with an initial capital letter." (ICBN 20.1, 2006 ed.) The orthography and part of speech have been set by international agreement. --EncycloPetey 20:51, 20 January 2009 (UTC)Reply
So lemme get this straight (and this is a question for y'all, of course, not just EP). Any ==Translingual== entry except taxonomic names and ===Cardinal number===s is automatically a ===Symbol===?—msh210 21:04, 20 January 2009 (UTC)Reply
I don't see that anyone has argued for that. We have Letters and Abbreviations, for example, and a very few Translingual phrases and adverbs (all from Latin, and I think the POS-neutral "Phrase" is superior), but not much else that isn't a Symbol, numeric symbol, or taxonomic name. There are many items that are indeed given as "Symbol", such as chemical formulae (H2SO3), mathematical symbols, and paragraph and section markings, among others. I classify as "Symbol" those items that are not expressed with letters (or their equivalent) and are not an attempt to write a word in any traditional sense (although it might be possible to express the same idea with one or more words). Items that are Symbols typically have a written form that is abstract, geometrical, or... symbolic. It's a very fuzzy concept, the symbol, but in Indo-European languages it's a bit easier to distinguish a symbol from other categories of items, because our writing uses either letters or an abugida alphabet. Our definition of symbol seems to reflect that viewpoint. --EncycloPetey 21:21, 20 January 2009 (UTC)
Re: "By the same logic […], every ==Translingual== word should have no POS listed.": Well, one small difference is that even in English, it's not obvious what POSes something like ≮ has — it can be used in a variety of POSes, and will be read differently to fit its use in a sentence. (This occurs even within a POS: it may be "is not less than" or "are not less than", for example.) But yes, that's a good point. It might be worth considering what kind of words and such are genuinely translingual, and coming up with appropriate parts of speech, such as "Taxonomic name" or just "Name". —RuakhTALK 21:22, 20 January 2009 (UTC)
Note: name is synonymous with proper noun when discussing a particular thing or entity. --EncycloPetey 21:28, 20 January 2009 (UTC)
Not necessarily. In English the names for particular things or entities are often proper nouns, and often non-count nouns. A proper noun in one language is not necessarily a proper noun in another (as in your Slovenian example above; those are still the names of languages, but grammatically they function as adjectives, right?). —RuakhTALK 21:49, 20 January 2009 (UTC)Reply
Granted, but then they aren't names in Slovene, are they? They're descriptions. Slovene refers to languages by describing them, rather than naming them. If a word is truly the name of a particular thing, then it is a proper noun. So introducing a new POS header of "Name" is superfluous. And in the case of taxonomic names, the part of speech is specified (as noun) in the international documents that govern their formation, acceptance, and use. --EncycloPetey 22:09, 20 January 2009 (UTC)Reply

enPR

Two years ago, we had this discussion about renaming the "AHD" pronunciation scheme, which led to a series of votes, culminating in this vote where we renamed it to "enPR". The main impeti for this, as far as I can discern, were that:

  • This is our own system, not the same as the one used by the AHD.
  • "AHD" is the name of another dictionary, so it seems weird to name our pronunciation system after them.

Now a Wikipedian, who shall remain nameless, has commented at Wiktionary talk:English Phonemic Representation and raised the possibility that this is in fact the same system as used by AHD. (He's also accused us of plagiarism, and purports to have notified the publishers of the AHD.) As far as I can tell, he's mistaken — I do see some differences, such as the second vowel of city (which we give with "i" and the AHD gives with "ē") — but the differences seem to be minor. By and large, our system seems to be very close to the AHD's. I think a lot of the similarities are standard — for example, there's a long tradition of ē for /iː/ — but I'm not sure if all are.

Is this something we should look into? If the system really is the same as the AHD's, should we consider either changing it, or returning to the name "AHD"?

RuakhTALK 03:03, 20 January 2009 (UTC)Reply

We should update the table, as Hippietrail has begun doing. Most of the enPR symbols in the table were added last summer by two users, neither of whom had been with Wiktionary for very long at the time (one appears to have edited the chart among his first edits on Wiktionary, and the other had been here six months at the time). The update will require that people who use enPR notation insert the symbols they've been using, or that someone use a bot to extract a symbol list from all the calls to {{enPR}}. Additionally, there is some discussion on the table's talk page about problems with symbols listed in the table. Two of these issues have sat unresolved for some time, and a third one was started recently. There is a chart on Wikipedia comparing several phonemic systems that can be used to judge what other dictionaries have used, and how much difference exists between such publications. --EncycloPetey 03:13, 20 January 2009 (UTC)Reply
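As a rough illustration of the bot-extraction idea mentioned above, the following Python sketch tallies the characters that appear inside {{enPR|...}} calls in a chunk of wikitext. The function name `enpr_symbols`, the sample text, and the assumption that calls contain no nested templates are all hypothetical; fetching page text from a dump or the API is not shown, and this is only one possible approach.

```python
import re
import unicodedata
from collections import Counter

# Matches {{enPR|...}} calls. Assumption: no nested templates inside the call.
ENPR_CALL = re.compile(r"\{\{enPR\|([^{}]*)\}\}")


def enpr_symbols(wikitext):
    """Count each character used inside {{enPR|...}} transcriptions."""
    counts = Counter()
    # Normalize so that precomposed and decomposed accents are counted alike.
    text = unicodedata.normalize("NFC", wikitext)
    for match in ENPR_CALL.finditer(text):
        for param in match.group(1).split("|"):
            # Skip named parameters (e.g. a hypothetical lang=en).
            if "=" in param:
                continue
            counts.update(ch for ch in param if not ch.isspace())
    return counts


if __name__ == "__main__":
    sample = "* {{enPR|kăt}}, {{IPA|/kæt/}}\n* {{enPR|sĭt'ē}}"
    # Print the symbols in use, most frequent first.
    for symbol, n in enpr_symbols(sample).most_common():
        print(symbol, n)
```

Run over the full set of entries, a tally like this would show which symbols editors actually use, which could then be compared against the table.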
It begs the question, why do we use it at all when we have IPA? (I can read neither, so it doesn't really bother me). If AHD assert their right to the alphabet they use, then we can't use it under any name - as forcing us to always display their name with it is not compatible with the GFDL. I strongly suspect that they won't, in which case it matters little what we call it, though maybe we should change the wording from "designed to be similar to" to "strongly based on" to reflect the only slight deviation. If the purpose of enPR was to be understandable to those who already know the system used by the AHD, should we not remove the differences that do exist, lest we cause more confusion? Conrad.Irwin 19:43, 20 January 2009 (UTC)Reply
Our system is not identical to AHD's. Our system differs from that of the AHD at least as much as their system differs from those of other major dictionaries. There are a number of significant differences, and a few more potential differences have been under consideration for some time. We allow IPA, enPR, and SAMPA for Wiktionary entries, each for their own reasons. The purpose of enPR was to make pronunciations more accessible to Americans, where IPA is still a rarity and where dictionaries such as Webster's, AHD, and Random House have used an alternative system for decades. These systems all differ from each other, so making our system identical to any one of them does not actually improve anything. We already discussed the name and differences in enPR and had a vote on the name. Please see those old discussions for more. --EncycloPetey 20:01, 20 January 2009 (UTC)Reply

I promised Ruakh that I would desist from the previous discussion he mentioned, as I agree it had become exceedingly childish, but I hope he won't mind me summarizing my opinion here. I'll try to leave childish argumentation behind.

EP has made repeated claims that EnPR is not the same as AHD. However, he has refused to back up that claim, despite a week of discussion as to what these alleged differences might be. (That's the last mention I'll make of EP.) As far as the symbols themselves are concerned, they are identical but for two minor differences:

  1. EnPR uses aʹ and a' for primary and secondary stress, whereas AHD uses a bold aʹ and a roman aʹ. Note however that when EnPR was copied over to Wikipedia, it was with standard AHD stress marks, not the Wiktionary ones, and the difference was so minor it does not appear to have even been noticed.
  2. EnPR uses a period rather than a hyphen for syllable breaks. However, this appears to be a misrepresentation by the key, as the Wiktionary entries I've seen all use EnPR with the AHD hyphen.

EnPR has been represented as a compromise between AHD, Random House, and MW. However, if you look at where those systems differ, in every instance EnPR follows AHD. (See the comparative table at Wikipedia:Pronunciation respelling for English.)

Ruakh suggested above that "city" has a final <i> in EnPR rather than final <ē>. However, EnPR is not used for city. It is used for January, and there the final vowel is <ē>, as it is in the AHD. There may be slight differences elsewhere, such as whether the vowel of sing should be that of sin or of seen, but such differences (assuming they exist) would be in the application of the system, not in the transcription system itself, which would remain that of AHD. Besides, since these minor rules are not spelled out in the key, there is nothing to maintain their stability, and when in doubt people would be likely to follow the example of the AHD itself. And any such differences would have to be transparent despite not being covered by the key, and so effectively trivial. Trivial differences do not make a convincing case for the transcriptions being different. Furthermore, the AHD often gives more than one transcription to cover dialectal variation, which would subsume the kinds of details we're grasping for here.

So it would appear that EnPR and AHD are effectively identical, so close IMO they're equivalent to misspelling a word or two in the lyrics of a song and claiming that therefore they aren't a copyright violation. So yes, I do think this is blatant plagiarism.

Now, AFAIK you cannot copyright a transcription system, so we have every legal right to use the AHD transcription. And plagiarism isn't a crime, just highly unprofessional. The concern expressed earlier was that if we call it "AHD", that might be trademark infringement. This is the current situation at English Wikipedia, and I've directed the attention of the permissions dept. of Houghton-Mifflin to both Wikipedia and Wiktionary and asked them to comment on what we're doing. Of course, it's very possible they could care less.

Regardless of legality and the opinion of AHD, I believe the only honest thing to do is to either give the AHD full credit for their transcription system, or to develop one that is truly our own. I personally don't care which, but do have a few suggestions if we decide to go the latter route. We wouldn't have to change much. The macrons and breves on the vowels are learned by every schoolchild, so are universal among such systems, and they make up the vast bulk of the special symbols. All we would need to do is fiddle with the 'other' symbols:

  • Among the consonants, these are basically <KH> and italic <th>, as all the other digraphs are close to universal. The latter is ripe for change, as its formatting is lost when cut and pasted, confounding it with plain <th> (a complaint I've seen elsewhere). Most Americanist dictionaries distinguish the two TH's with formatting as the AHD does, or use a symbol like <th̸> that will get messed up by font rendering on many readers' browsers, and so are not ideal for computer transcriptions. The exceptions are COD, Cham, and AB, which all use <dh>.
  • Among the vowels, IMO the most intuitive change would be to make the rhotic vowels phonemic. That is, to use the same vowel symbol regardless of whether it has a following R. AFAIK, no Americanist dictionary does this consistently. The changes from AHD would be <ār> for <âr> and <ēr> for <îr>, as they're transcribed in the COD. (The other EnPR rhotic vowels are all phonemic.)
  • It might be worthwhile making secondary stress more distinctive, since very often it marks phonemic lack of stress (so-called "tertiary stress").
  • Finally, though this isn't currently listed in the key, it might be worth considering northern European <ö> to go along with <ü> for the rounded front vowels.

Changes like these would make EnPR as distinct from the published dictionaries as they are from each other, but yet IMO would still be readily accessible to someone like me who's been educated in US schools. Kwamikagami 08:53, 21 January 2009 (UTC)Reply

a comment on the dissimilarity: it really is practically the same as AHD (pace "Our system differs from that of the AHD at least as much as their system differs from those of other major dictionaries"). If you look at the other systems like Random House & Webster, this will be obvious (and I have as I made the comparison chart in wikipedia).
(and a by-the-way question: I'm not clear on what the significance of me being one of the new wiktionary summer editors is.) Ishwar 15:04, 21 January 2009 (UTC)Reply
it's easy to make the enPR chart different from AHD. Just do it. & then you can claim that it's significantly different but based on it and other dictionaries. Ishwar 15:07, 21 January 2009 (UTC)Reply
The AHD's system has some trivial variations, which are seen in the paper version, on the nine-year-old Bartleby.com website (which uses inline images to represent some characters), and in the more recent Dictionary.com version, which uses Unicode characters. Our “EnPR” system is practically identical to AHD's, even duplicating the graphical nuance of Bartleby's obsolete image technique. (Specifically, we all use various methods to represent the paper dictionary's bold and roman stress marks, sĭ-lābĭ-fī′, and the small caps in loch, KH, and bon, N, sometimes appear raised.) Substituting middle dots for hyphens for syllabification doesn't make this a novel system—the two are commonly used as equivalents in typography.
Making incremental changes to this for the sake of originality doesn't seem productive. Who's going to change the 3,300 pronunciations already out there in our dictionary?
If we were to abandon it and use another, then I suggest we pick an existing one. An independent standard has the advantage that it is already finished, and there is no reason for us to mess with it. As amateur volunteers, we are justified in compiling information from published sources, but not so much in noodling around with our own original transcription systems, so let's use one developed by lexicographers. If we can find one which is in the public domain, then this type of controversy won't arise.
Candidates include a chart from the 1913 Webster's Dictionary, a pre-1923 Oxford English Dictionary fascicle, and the 1911 Concise Oxford Dictionary. I suppose Webster's would be ideal, since it is the Americans who demand this. See w:Pronunciation respelling for English for a summary, but it would be best to find an original PD copy to start from. Michael Z. 2009-01-21 18:05 z
Michael, "Oxford English Dictionary" and "Concise Oxford Dictionary" are still trademarked, so they'd present the same problem as "American Heritage Dictionary". The transcription systems are not protected, AFAIK, just the names.
There's something to be said for choosing a system that retains both the macrons and the breves. That way all symbols are unambiguous even to someone accustomed to a different dictionary. Among the dictionaries in the comparison chart, only the AHD and old COD editions do this, though I don't know if the COD system dates back to 1911. (The old OED system would be unfamiliar to almost everyone.) Actually, the old COD is really nice apart from using italicized vowels instead of a schwa. That could be a typographic concession on our part. But it wouldn't resolve our worries about trademark infringement. Kwamikagami 19:17, 21 January 2009 (UTC)Reply

Please compare the AHD table of values with the ones we list to see the differences. There are several AHD symbols that we do not have at all. The Wikipedia article on AHD has them, but we do not: e.g. œ and ü. I am surprised that no one has been able to spot these differences. We also include symbols they do not, e.g. i, although Hippietrail has only recently added this into the table. The need for a symbol was discussed some months ago (I haven't been able to locate the discussion yet), in which British and American speakers came to the realization that there was a consistent difference between UK and US pronunciation of "i/y" in certain situations. At that time, we agreed about how to represent this difference in IPA, but I need to find the conversation in order to determine whether enPR was part of that discussion.

If you look at the discussion associated with our system, there have been suggestions made about changing other symbols as well, as the symbols in the table for these sounds were added without discussion last summer, and possibly without comparing our current usage at the time. Some of the changes proposed by Kwamikagami are in line with the proposals already made. If we can agree on an option for these cases, then we can proceed. The discussions that noted the problems seem to have stalled months ago, so it is good to see them active again. --EncycloPetey 19:38, 21 January 2009 (UTC)Reply

Thank you for answering this point! As for œ and ü (yes, I noticed them, and that you included them in the Wikipedia version of EnPR), IMO the omission of symbols for non-English sounds is a trivial difference, and we're still dealing with the AHD. And the i may or may not be EnPR ... So the essential difference between EnPR and the AHD would seem to be that the EnPR is not stable.
Michael raised the serious concern that modifying a system while it is in use makes it impractical to maintain the coherence of the dictionary. IMO we should either stick to the AHD system we have (I mean, who cares about GA / RP differences like i? Americans are the only ones who are going to use this transcription anyway, and we have the IPA for the Brits) and give it proper credit, or we should settle on a distinctive Wiktionary variant and stick with it. If the latter, as Michael asked, who's going to revise all the existing AHD transcriptions, which are already somewhat incoherent from several minor changes which have never been followed through on? Kwamikagami 20:06, 21 January 2009 (UTC)
I agree that omitting the 4 foreign sounds from AHD doesn't make our system novel. It hasn't differed significantly from the AHD's transcription since August 2004.[3] Although it appears to have changed in the details rather regularly, which doesn't do anything for its usefulness as a “standard”. If any of the symbols have changed at all, then our transcriptions in entries are unreliable, serving only as a placebo to mollify the IPA-haters. Michael Z. 2009-01-21 20:18 z
Re: "[…] our transcriptions in entries are [struck: unreliable, serving] incredibly useful, if only as a placebo to mollify the IPA-haters": FTFY. :-)   —RuakhTALK 23:33, 21 January 2009 (UTC)
If that makes me smile is it an indication that I am lacking integrity? Cheers. Michael Z. 2009-01-22 15:54 z
  • I was the one who introduced both the IPA and AHD pronunciation systems to Wiktionary. I seeded them and helped them grow in the early days years ago but no longer have much to do with them since the community is larger now and there are many things for me to do.
  • At the time I had the mistaken belief that there was an "American dictionary pronunciation system" that all American dictionaries used. I suppose I expected there was a system used by linguists before IPA and that the same system was also used in dictionaries.
  • When I made the first version of the American system I did in fact have only the AHD system at hand and never made any secret about that fact. Only later did I come to realize that other American dictionaries had different systems with mainly the macrons in common.
  • In the early days I did not label either the IPA or American style pronunciations. Soon came font wrappers for each so they were displayed correctly and likewise when SAMPA was added. At some point a template for IPA began to be used which had a label. Soon people also wanted to label the American system so a name for it was needed. It may have been at this time that I searched for the name for the system only to find that it was AHD's proprietary system and not a general system used by American linguists and dictionaries.
  • The American style pronunciations then were labelled AHD and this continued for some time. In several discussions I reiterated the story of how I came up with them, that I wanted an American style system because I knew that many people were not comfortable with IPA. It was never my intention that our American system mimic one proprietary system. I disliked the italics and couldn't find definitive Unicode characters for primary and secondary stress.
  • At some point after this I stopped caring much about the pronunciations on Wiktionary at all. There were lots of arguments about how we should use IPA as well for a while and I ended up losing interest and left it up to the community. It was some time after this that the vote appeared on changing the name to enPR.
  • Initially there were more differences between our IPA and non-IPA systems as I was using it. But as others started to do pronunciations they didn't keep those differences. One thing I was trying to do was unify British and American pronunciations wherever possible as the IPA is a phonetic system and I had hopes that our other system could be more phonemic. The lost differences were various optional sounds in parentheses: I used (r) for postvocalic "r" pronounced only by rhotic dialects and also for "connecting r" as used word-finally in non-rhotic accents. I also used (ə) after any "l", "m", "n" which could be considered syllabic to allow both possibilities in a single transcription, and (j) or (y) to allow both the British and American pronunciations of "news", "emu", etc.
  • The pronunciation chart was not made or maintained by me. I seem to recall we had several different ones on several different pages at various points. I had a table of my own where I was comparing how IPA was used in different ways in various dictionaries.
  • The /i/ segment used for final "-y" and possibly other unstressed "i" sounds has been used here by me for years rather than months in both IPA and American systems. It has been discussed more than once. Oxford dictionaries began using this symbol and I have seen it in at least some other dictionaries.
  • I recommend coming up with a new system of our own which keeps the schwa, the macrons and breves including the two-letter versions, and discusses possibilities for all other phonemes. I hate the italics in "th" and would like to suggest the Icelandic / Old English letters ð and þ instead.
  • If there are any questions I've doubtless left out some things here. — hippietrail 01:25, 23 January 2009 (UTC)Reply
Your IPA notes are very valuable info. In addition to the English chart for readers, we need English transcription notes for editors with this kind of information. I guess this could be added to Wiktionary:Pronunciation#Phonetics and phonology. Michael Z. 2009-01-23 18:45 z
It's been a week, and I haven't even had acknowledgement from AHD. I guess they don't care one way or another. I still think it's best for us to either be upfront and call our system AHD, or to modify it significantly (and of course update all the articles which use it!).
It may, of course, not be a good idea to make our own system. But here's the kind of thing that would strike me as fair for a non-credited transcription:
This has the additional advantages of displaying properly with very limited font support, and of being copy&paste-friendly. —Kwamikagami 11:22, 25 January 2009 (UTC)Reply
I'm not sure about a couple of the vowels, but this looks like an overall improvement to me.
I think the only way to reliably update existing pronunciations would be to give this a new name, and enter it with a new template, and let it happen on its own time. Michael Z. 2009-01-26 05:59 z
You're right, of course. That's the way to go. (Which vowels?)
But back to my original point: should we rename the EnPR system, and template, AHD, and stick to it in detail? Kwamikagami 22:58, 26 January 2009 (UTC)Reply

Dictionary perpetuation of nonce word

I looked about for a policy element covering this and I can't figure it. What exactly is the policy for a word that is in virtually all dictionaries, but only because, well, it's in all dictionaries? The word in question is the French abdicataire, which a cursory search tends to confirm the note helpfully given in the fr:wikt entry "A nonce word of w:Chateaubriand". The word is otherwise perfectly usable: it sounds perfectly legit (e.g. not particularly contrived) and follows a fairly productive pattern. It's just frozen in dictionaries (and I have to wonder why it hasn't been cut before!). Should we go with our CFI or bow to dictionary "consensus" with a note like at fr:? Circeus 03:28, 22 January 2009 (UTC)

*sigh* A better directed search revealed a few uses (in Belgian legal language, also about Bhutan and Sardinia). The point still stands regarding other similar words. Circeus 03:31, 22 January 2009 (UTC)Reply
Here are some that we have decided to keep in the past- dord, zzxjoanw. Nadando 14:47, 22 January 2009 (UTC)Reply
In principle, these brazenly fail to meet CFI. In practice, they are hard to get rid of once created; the community is reluctant to delete them for various ill-considered reasons. Some sort of consistent placeholder template (similar to {{only in}} but allowing for some additional data) would perhaps be ideal. -- Visviva 00:36, 23 January 2009 (UTC)Reply
I corrected your link so that people can understand what you are talking about. I have never seen this word but it is true most educated French speakers would be able to infer the meaning from the construction. If you have some examples of use by other authors, please add them to the French Wiktionary. Koxinga 14:56, 22 January 2009 (UTC)
It depends on the nonce. In this case, it might meet CFI even without the Belgian legal uses and so on: I think Mémoires d'Outre-Tombe could be considered a "well-known work". —RuakhTALK 15:18, 22 January 2009 (UTC)Reply

Ottoman Turkish

There was a discussion on the ibrik article's talk page about whether we should use the Turkish template along with Ottoman Turkish, since Ottoman Turkish is actually within the Turkish language. The result seems appropriate, but since the topic is general and not specific to the word ibrik, we decided to bring it up here as well for discussion. The following is a copy of the recent discussion from that talk page:

I noticed that you changed the language from Turkish to Ottoman Turkish in the etymological sections. Of course, most of the Turkish loans to Western languages took place during the Ottoman period. But I should state that Ottoman Turkish is a period of the Turkish language, not a separate one. That is to say, Turkish is not a language that started in 1923; what started in 1923 was only the modern Turkish period. And if you have noticed, major online English dictionaries prefer the term Turkish rather than Ottoman Turkish, even when the word was loaned into English during the Ottoman period.
Here is a Britannica article mentioning the 4 periods of the Turkish language, namely the Old Anatolian and Ottoman Turkish, Middle Ottoman Turkish, Newer Ottoman Turkish and Modern Turkish periods: the article
Therefore, I propose that in the etymological sections we write Turkish first, and Ottoman Turkish beside it if the word was loaned in the Ottoman period. Thus, a reader may see a complete list of the words of Turkish origin in the related category; and if s/he wants, s/he may also see the list of the words that were loaned in the Ottoman period in the Ottoman Turkish category. --Chapultepec 15:11, 24 January 2009 (UTC)
I agree, Ottoman Turkish is a period of the Turkish language. However, Old French is a period of the French language, Middle English is a period of the English language, and yet they are all listed here in the etymologies. Yes, major sources prefer Turkish, because that is how it has been done historically. The language was not known as "Ottoman Turkish" to foreigners until the development of Modern Turkish. Most sources also write words in the Roman script (and sometimes in their reformed, modern pronunciation); however, Ottoman was written in Arabic script. What you can do is list Modern Turkish words that have descended from Ottoman Turkish under the Ottoman Turkish headwords as Descendants. --Dijan 18:21, 24 January 2009 (UTC)
Thank you for the comments. But, like French and English, Turkish language has earlier periods as well. Such as Seljuk era Turkish (or Turkic), Middle Turkic, Old Turkic etc. I do not even count them. What I try to explain is that the Ottoman era is a very recent era, and when a word is mentioned that it is of Turkish origin, this does not comprise only the modern Turkish period. Let me give an example with a link to ibrik article in Turkish Language Association's online dictionary. As we can see, its etymology is given as Arabic, and Ottoman Turkish is not mentioned as predecessor at all: TDK - ibrik
As for the major online dictionaries, as we all know they are modern dictionaries and are all up-to-date. But they still use the term Turkish for the etymologies in the existence of modern Turkish for almost 90 years.
And as for the script change, alphabet changes do not necessarily imply a change in the languages. Let's take some east European and ex-USSR languages for example, several of them went through alphabet changes in 1990s. But this did not make them different languages.
Although it does not correctly reflect what I try to explain here, another solution is also possible; we can enter the Ottoman Turkish template and term firstly, and the Turkish template and term beside it, in parentheses. (of course if the word belongs to the Ottoman era) --Chapultepec 19:14, 24 January 2009 (UTC)Reply
I understand your point. I am not against Modern Turkish listings at all, however it would be greatly appreciated if you bring it up with the rest of the community in the Beer parlour (Community portal) for discussion. --Dijan 07:39, 25 January 2009 (UTC)Reply
I am pleased to see that you are not against it, thank you. But, I feel the necessity of stating it once again :), the term Turkish language comprises not only modern Turkish period, but also the Ottoman Turkish era. Until 1928 it used the Arabic alphabet, and thenceforward the Latin alphabet is being used. --Chapultepec 18:58, 25 January 2009 (UTC)Reply

--Chapultepec 19:25, 25 January 2009 (UTC)Reply

I'm not sure I understand why we would need to say both "Turkish" and "Ottoman Turkish". How is this different from what we do for Latin, which is just one language, but for which we distinguish several periods in etymologies? --EncycloPetey 14:41, 26 January 2009 (UTC)Reply
As I stated above, like Latin, Turkish language has earlier periods as well, such as Anatolian Seljuk Turkish, Seljuk era Turkic, Middle Turkic, Old Turkic etc. I do not even count them at all. But the Ottoman era is a very recent era, and the major difference is the script change, nearly 81 years ago. As we can all remember, several countries in east Europe and ex-USSR went through alphabet changes in 1990s as well. So, I can say that it is a display in both scripts. --Chapultepec 15:53, 26 January 2009 (UTC)Reply
The issue ultimately boils down to how we divide languages here on Wiktionary. On Wiktionary, Latin is a single language (all of it), where English, Middle English, and Old English are separate languages, like Greek and Ancient Greek are separate languages. The logic behind this is an adherence to ISO 639 codes, which have different codes for all three Englishes ({{en}}, {{enm}}, {{ang}}), but only one for Latin ({{la}}). Turkish and Ottoman Turkish have different codes ({{tr}}, {{ota}}). Thus, we absolutely cannot simply say "from Turkish" when something is "from Ottoman Turkish," as, on Wiktionary, Turkish specifically means modern Turkish. Now, it is true that our etymologies are often not consistent in this area, as words from Old French are often cited as "from French" and words from Ancient Greek are often cited as from "Greek." However, this is something which is slowly being fixed, not promoted. We do occasionally break from 639 standards. For example, we treat all of Hebrew as "Hebrew", even though there is a separate code for Ancient Hebrew, and we lump all the Nahuatls together. However, these are both the result of formal proposals. If you would like to treat Ottoman Turkish and modern Turkish all as "Turkish," then you would need to propose a comprehensive proposal, which includes not only etymology format, but also entry and translation format, among other things. -Atelaes λάλει ἐμοί 19:51, 26 January 2009 (UTC)Reply
Yes, I have checked ISO 639 codes now, there is a different code for Ottoman Turkish, I have no objections for that. But this is simply because of the alphabet change, not due to language change. Turkish language also has earlier periods like Middle Turkic, Old Turkic etc corresponding to that of Greek or English, namely Middle English and Old English. And, for instance, we know that major online English dictionaries generally prefer to use the term Turkish in etymological sections even if the word was loaned during the Ottoman era. Following is a citation from Ethnologue website:
"ISO distinguishes this code from [ota] on the basis of time, [tur] applying to the Turkish language since 1928, and [ota] applying to the Turkish language prior to 1928. The year 1928 corresponds to the year in which writing reform occurred, changing from Arabic to Latin script. Thus, these two codes are distinguishing between the Arabic- and Latin-based writing systems rather than between languages. This goes against the normal practice for ISO 639-x, as described in clause 4.1.3. Thus, we deem that this language is also covered by the ISO code [ota]."
Here is the link for the above citation. What I try to do is find a simple solution rather than merge the two templates. Therefore I suggested to append the Turkish template where applicable. --Chapultepec 22:27, 26 January 2009 (UTC)Reply
Hmm.....that was an interesting cite. Thank you. However, the fact remains that we sort of need to do this all or nothing. If it is better that we treat everything as a single Turkish language, then we also need to deprecate {{ota}}, reformat everything in Category:Ottoman Turkish language, and set up a policy about how this all works. If we do it half-assed (i.e. just do it in etymologies, but nowhere else) we just end up with a mess, a very confusing mess for our readers. — This unsigned comment was added by Atelaes (talk • contribs) at 22:37, 26 January 2009 (UTC).
If you say we can end up with a mess, then we should merge the templates. But this generalization should be under the name "Turkish", since there are newer Turkish loanwords that do not belong to the Ottoman era. For instance, when the user reads the word sultan comes from Turkish instead of Ottoman Turkish, it's not a problem. But if he reads that the word doner kebab comes from Ottoman Turkish, he will be messed, since that word is from the modern period. Maybe we can embed the term in Arabic script beside the one in Latin alphabet for etymologies, such as from Turkish sultan سلطان < from Arabic ..., but this can be technically problematic since we will be expected to do the same for declensions etc. Deprecation of [ota] template is also possible, maybe other users can reveal their ideas as well. --Chapultepec 23:10, 26 January 2009 (UTC)Reply
From what you say, and what Ethnologue says, I agree that it makes sense to merge Ottoman Turkish and {{ota}} into Turkish and {{tr}}. I somehow had the impression that Atatürk also pushed through linguistic reforms, but I guess you can't change a language overnight in the way that you can a writing system. —RuakhTALK 02:35, 27 January 2009 (UTC)

Yes, he pushed through some linguistic reforms to rid the language from the yoke of Arabic and Persian. He changed some words with its Turkish equivalents, if available with existing Turkish words, if not with newly derived ones. But most of the old words continued to take place in the vocabulary. Of course it was not possible to change it overnight. So we gained some new words, but it is too normal thinking that tens of thousands of new words English acquired within the last century for example. As for merging {{ota}} into {{tr}}, yes that is what I think too if it is generally agreed. But the question here should be how it is gonna be achieved. --Chapultepec 03:49, 27 January 2009 (UTC)Reply
If there are different ISO codes, then the policy here postulates that we have different languages. Similarity in the vocabulary is obviously no argument, otherwise we would have merged Bosnian, Croatian and Serbian into SH a long time ago. There is no language on this Earth which has remained unchanged for 710 years (Osman in 1299 coming to power) - even Icelandic has some changes in comparison to Old Norse and there are several new words for recently emerged novelties (albeit not borrowed from English). Just as [non] and [is] are different languages, from what I read here (many new words, different ISO-codes, Seljuk Turkish, ergo the following is evidently Ottoman Turkish) I came to the conclusion that these two ([ota] and [tr]) are quite dissimilar as well. To put it mildly, I oppose the merger. Bogorm 07:39, 27 January 2009 (UTC)
(refutation of the merger appeal) Here is how БСЭ (BSE) judges on this matter: (translated roughly) The literary [Turkish] language began to emerge in the middle of the 19th century, abolishing the Ottoman literary language which was fraught with Arabic and Persian loanwords - whence two conclusions: 1) the vocabulary is far more different than claimed above, and 2) There was no (Modern) Turkish language before 1850, just Ottoman Turkish (cf. Middle French, MHG). Bogorm 07:53, 27 January 2009 (UTC)
But ISO 639 starts Ottoman Turkish from 1500, not from 1299. Thinking that modern English also started roughly in 1550-1600, that starting date seems normal. As for the literary language, this is a different thing. Ottoman literary language was a variety of the Turkish language used by the higher class for literary and administrative purposes only (and full of Arabic and Persian words), but it was not the mainstream Turkish language. The related article of Larousse Encyclopedia in Turkish also approves this.[1] Therefore Ethnologue deems no difference between the two languages, and sees it only a script change in the year 1928.[2] And let us not forget, similarly Katharevousa was used for formal and official purposes in Greece until 1976 in the existence of modern Greek since at least the 15th century. --Chapultepec 14:39, 27 January 2009 (UTC)Reply
As I stated above a couple of times, Turkish language has earlier periods as well, such as Old Anatolian Turkish, Seljuk Turkish, Middle Turkic, Old Turkic etc which correspond to that of other languages mentioned above. I have a Turkish English dictionary dating from 1856 in PDF. I look through the dictionary, but find almost no difference except for the script change.[3] For the ones who may be interested, here is the link. And here is the link for the online contemporary Turkish dictionary by the Turkish Language Association, anyone interested can compare them. As for the merger, I thought it, because it was written above that we could end up with a mess, if there will not be any problems I am ok for the easy solution too, simply appending the template. And as for the claim that there was no Turkish language before 1850, this is also in contradiction with what we discuss here. We discuss about the [ota] and [tr] codes, and their split date is 1928. --Chapultepec 15:39, 27 January 2009 (UTC)Reply
Here are two citations from "An Analysis of ISO 639" by SIL International:
"[tur] “Turkish” and [ota] “Turkish, Ottoman (1500–1928)” both map to the language Turkish [TRK]".[4]
"The name “Turkish, Ottoman (1500–1928)” suggests to us that the code element [ota] was created to distinguish Turkish literature written in Arabic script from more recent Turkish literature written in Latin script. In other words, the primary distinction is based on script rather than linguistic differences. The alternative is that this represents a linguistic distinction based on time, but there is no other precedent for such distinction in the period since 1500, and a claim of a purely linguistic distinction with such a recent boundary is suspect. Also, the year 1928 corresponds with the year in which orthographic reform for Turkish took place".[4] --Chapultepec 12:22, 28 January 2009 (UTC)Reply
1. "Osmanlıca." Büyük Larousse Ansiklopedisi (Turkish edition). Vol. 15. Gelişim Yayınları, 1986.
2. ISO 639 Code: tr in Ethnologue.
3. And except for the new words naturally, but the same goes for the English part too, i.e. you cannot find words like "computer", "database", "laser" etc.
4. SIL International. "An Analysis of ISO 639". pages 17, 18 --Chapultepec 00:06, 28 January 2009 (UTC)Reply
In the light of information and sources given above, and if there is no objection, I would like to apply the initial suggestion, namely appending the Turkish template where applicable in the etymological sections. --Chapultepec 23:27, 31 January 2009 (UTC)Reply

template:typesetting and other context labels

If I enter {{typesetting}}, the template changes that to a different concept: “metal typesetting”. Entering {{context|typesetting}} does the same. (If you happen to view the template's page, it helpfully adds “For more general terms, use {{typography}}”.)

It's not only annoying to type what you're thinking and have a template second-guess you: you might not even notice that your text is changed (as I didn't notice here for about six months).

Shouldn't context templates have identical names and text, to avoid such mistakes? Michael Z. 2009-01-26 18:15 z

Yes and no. Template:legal can't be called law any longer (as that's a language code), but still needs to display law in entries for backward compatibility. I'm not saying that's the usual situation, but it does come up. As far as typesetting, I, for one, would have no problem changing its text if someone first goes through each of its calls and verifies that typesetting is a good text to use there (or adds metal|_|). (And I'm not volunteering.) But although ideally each context template should display its name, I think each template would need to be discussed separately before changing it, rather than just saying "every context template should display its name".—msh210 19:01, 26 January 2009 (UTC)Reply
Well, there's only 5 entries using {{typesetting}}, so even I'd volunteer to do that:) --Bequw¢τ 05:10, 27 January 2009 (UTC)Reply

I'm glad to sort this out, but the categories need some reworking too. Metal typesetting is a (mostly historical) subset of typography, but the two are diversely categorized. The former adds category:Printing, which is in category:Technology and category:Publishing, the latter adds category:Typography, in category:Language and category:Communication.

I think they should both just add Typography, which should also be made a child of Printing. Any objections?

I think we also need a new category—there is category:Web design, but nothing in category:Media which encompasses publishing on the Web, and related electronic or digital media. Or is category:Publishing meant to include this too? Michael Z. 2009-01-28 16:57 z

Okay, I've gone through and changed a couple of senses from metal typesetting to typography, because they are terms used in digital typesetting too. I don't know whether the Italian carattere is still in use or belongs only in metal type.
To solve the template naming problem, I'd like to move {{typesetting}} to {{metal typesetting}}, and add a convenience redirect from {{metal type}}. Another possibility is to make it more specific as {{movable type}}, the main technology used for five centuries, until the invention of hot metal type in the 1880s. {{typesetting}} should either be deleted, or redirect to the slightly more general {{typography}}. Michael Z. 2009-01-28 17:21 z

DACCO

I've come across an online open source Catalan-English-Catalan dictionary called DACCO (Diccionari Anglès-Català de Codi Obert). It uses the Creative Commons Attribution-Share Alike 2.5 License. Before I start importing it to the Wiktionary (and creating a reference template {{R:DACCO}} for the entries), I thought I'd make certain that doing so would be acceptable. I think so, but I'll be the first to admit that trying to figure out how various open source licenses interact is beyond my ken. Carolina wren 23:02, 28 January 2009 (UTC)Reply

I can't seem to find the discussion at present, but my understanding is that the GFDL and CC-BY-SA are not compatible, i.e. content released under one cannot be distributed under the other outside of fair use, because the GFDL is more restrictive than CC-BY-SA, while CC-BY-SA in turn prohibits redistribution under terms more restrictive than its own. Or something like that; you may want to check the exact wording of the respective licenses. This situation is noxious and absurd, since for almost all human purposes the two licenses are identical, but there it is.
Should you want to pursue this question further, it might be worth posting at commons:Village pump, as that's where you will find the largest concentration of copyright-savvy Wikimedians. -- Visviva 06:08, 8 February 2009 (UTC)Reply
If we ignore technicalities, we could just ask DACCO if they would let us import their data? It should be a fairly easy task to run with a bot. Would people here be interested in this happening?. (If we add DACCO as a reference in the entry, and link to them in the edit summary, I can't see that the people there would mind - oh wouldn't it be nice if licenses were better written :). Conrad.Irwin 01:20, 9 February 2009 (UTC)Reply
Right now, since Wiktionary apparently has R: templates to dictionaries still under copyright without any sort of copyleft scheme attached to them, I'm of the opinion that so long as I don't slavishly copy their defs, using them as a credited reference with an appropriate R:DACCO template should suffice. If it doesn't then there are quite a few other entries with problems. DACCO has enough quirks and differences in formats that I wouldn't want to bot import them anyway. Carolina wren 02:26, 9 February 2009 (UTC)Reply

Foreign language sections in dictionary entries

I feel it inappropriate that full foreign language definitions are added to the bottom of Wiktionary entries. There are separate dictionaries for separate languages and, at most, there should be a link to the appropriate page, any more is wasteful repetition. Consider the page for "program": it has entries for Czech, Slovak, Norwegian, Hungarian... what is the point of having these here, complete with meanings copied from the English section above, when they already exist in their respective dictionaries? The Norwegian section even has its own verb conjugation table! M0thr4 09:44, 29 January 2009 (UTC)Reply

Every Wiktionary has its own entry for every word in every language. Compare our Swedish section on program and its Swedish counterpart. The grammatical tags on the Swedish Wiktionary are in Swedish, whereas ours are in English. As someone who does not know Swedish (and certainly not Swedish grammatical terminology), our entry is far more useful for me. This applies to a great many things which our entries often do not yet have, but will eventually, such as usage notes, regional/dialect information, etc. The foreign language sections you're seeing on program are short and simple, and will be expanded in the future with lots of information about the words written in English. -Atelaes λάλει ἐμοί 10:13, 29 January 2009 (UTC)
As Atelaes says, each Wiktionary has explanations in a different language. Compare the English Wiktionary entry for flōs with its counterpart on the Latin Wiktionary at la:flos. Which version of the entry do you find easier to read? --EncycloPetey 23:13, 31 January 2009 (UTC)

ISO 639-5

ISO 639-5 was released in May 2008 and it assigns 3-letter codes to language families (e.g. "Turkic languages" is trk). There are currently around 114 codes and they use the same "pool" as ISO 639-2 & 3 codes. 639-5 is disjoint from 639-3 (the standard for individual languages) but is a superset of the "collective" codes from ISO 639-2 (which codes both "collective" and individual languages). I think Wiktionary should employ ISO 639-5 codes for specific purposes here, such as in standardizing etymologies. These codes would allow us to standardize many of the entries in Wiktionary:Languages without ISO codes (see how). As these codes aren't valid for L2 entries we should prefix them. That way they could be restricted from being subst'd or used with {{infl}}, {{term}}. Certain templates, such as {{etyl}} and possibly {{proto}}, could be coded to look for language family codes at the specific prefix. Atelaes suggested macro: as a prefix, though that could be confusing as certain codes in ISO 639-3 are termed 'macrolanguages'. The title of 639-5 is Alpha-3 code for language families and groups so maybe group: or family: but maybe someone has a better idea. As there are some existing ISO 639-2 "collective" codes (and therefore 639-5 codes) currently in use they would be prefixed as well. Thoughts on this plan? --Bequw¢τ 06:44, 30 January 2009 (UTC)

So far some of the -3 family codes (e.g. {{sla}}, {{bat}}, {{dra}} etc.) are used with {etyl}, so they should first be relocated to fam:xxx (or whatever the prefix be) before this gains official blessing. I like this idea of usage of secondary namespace for families, as it contains more direct metadata providing the separation between individual and groupings of languages, and would prob. simplify maintenance. I'm not sure how this is supposed to work with {proto} as that template explicitly takes the name of the family as the first positional parameter (unless it gets rewritten to support both e.g. {{proto|Indo-European|...}} and {{proto|ine|...}} types of invocations, much like some of the templates now are accepting both ISO code and full language name). --Ivan Štambuk 23:32, 31 January 2009 (UTC)
{{etyl}} can now be passed language codes that exist in the etyl: prefix ([fam] is an ISO code so also wouldn't have made a good prefix). Right now we just have ISO 639-5 codes there (see cat). Note for those wanting to create these language code templates: as these templates have limited use (they aren't {{subst:}}-able) they have a different, more useful, format than the normal language codes. --Bequw¢τ 19:50, 14 February 2009 (UTC)
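
For anyone trying to picture the lookup described above, here is a minimal sketch in Python rather than template code. It only assumes what is said in this thread: family codes live under an etyl: prefix, etymology templates check that prefix, and other templates do not. The dictionary contents, the display names, and the function names (etyl_name, infl_name, proto_family) are hypothetical and are not the actual template implementation.

```python
# Hypothetical model of template titles -> displayed names.
# Plain titles stand for individual-language code templates;
# titles under the "etyl:" prefix stand for language-family codes.
FAMILY_PREFIX = "etyl:"

TEMPLATES = {
    "en": "English",
    "tr": "Turkish",
    "ota": "Ottoman Turkish",
    "etyl:trk": "Turkic",
    "etyl:sla": "Slavic",
    "etyl:ine": "Indo-European",
}


def etyl_name(code):
    """Etymology lookup: try the family prefix first, then plain codes."""
    for title in (FAMILY_PREFIX + code, code):
        if title in TEMPLATES:
            return TEMPLATES[title]
    raise KeyError(code)


def infl_name(code):
    """Entry-level lookup: family codes are deliberately not reachable."""
    if code.startswith(FAMILY_PREFIX):
        raise KeyError(code)
    return TEMPLATES[code]


def proto_family(arg):
    """Accept either a family code ("ine") or a full family name."""
    return TEMPLATES.get(FAMILY_PREFIX + arg, arg)


print(etyl_name("trk"))      # family code, found under the prefix
print(etyl_name("ota"))      # ordinary language code still works
print(proto_family("ine"))   # code resolved to a name
```

The point of the separate prefix in this sketch is that infl_name("trk") fails, so a family code cannot end up in an L2 header or an inflection line by accident.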

Hungarian form of template - new approach

I am trying to simplify the way noun forms are entered. The current approach requires the editor to know what ending belongs to what case, the abbreviated case name, and the order of parameters. It's easy to make mistakes. In the new approach, the template will figure out the case name, the editor would have to enter only the ending. I am also thinking about leaving out singular and plural. Examples for kert (garden):

  • kertben (in the garden) - inessive singular
    Current method: {{hu-inflection of|kert|ine|s}}, output: inessive singular of kert
    Proposed method: {{hu-infl|kert|ben}}, output: inessive of kert
    A supplemental grammar tag template would contain the case information: ben = inessive.
  • kertekben (in the gardens) - inessive plural
    Current method: {{hu-inflection of|kert|ine|p}}, output: inessive plural of kert
    Proposed method: {{hu-infl|kertek|ben}}, output: inessive of kertek
  • kertemben (in my garden) - possessive inessive singular
    Current method: {{hu-inflection of|kertem|ine|s}}, output: inessive singular of kertem
    Proposed method: {{hu-infl|kertem|ben}}, output: inessive of kertem

Do you think this is simpler? Are there any risks to this approach? Any feedback is appreciated. --Panda10 22:57, 31 January 2009 (UTC)Reply

This would work as long as each possible ending is unique to just one inflectional form across all parts of speech that will use the template, and provided that every possible ending is included in the template switch. I don't know whether this approach would increase server demand or not. --EncycloPetey 23:07, 31 January 2009 (UTC)Reply
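
The uniqueness condition above can be made concrete with a small Python sketch. It is not the template itself: the names hu_infl, ENDING_TO_CASE, and ambiguous_endings are hypothetical, the case table is a tiny illustrative subset (only ben = inessive comes from the proposal), and the single verb ending is included only to show how a clash would be detected.

```python
# Illustrative subset of noun case endings (assumption: not a complete table).
ENDING_TO_CASE = {
    "ben": "inessive", "ban": "inessive",
    "ből": "elative", "ból": "elative",
    "be": "illative", "ba": "illative",
    "nek": "dative", "nak": "dative",
}

# Illustrative verb ending that shares its shape with a noun ending.
SAMPLE_VERB_ENDINGS = {
    "nak": "third-person plural present indefinite",
}


def hu_infl(stem, ending):
    """Mimic the proposed {{hu-infl|stem|ending}} output text."""
    if ending not in ENDING_TO_CASE:
        raise ValueError(f"unknown ending: {ending!r}")
    return f"{ENDING_TO_CASE[ending]} of {stem}"


def ambiguous_endings(tables):
    """Return endings that name more than one form across the given tables."""
    seen = {}
    for table in tables.values():
        for ending, form in table.items():
            seen.setdefault(ending, set()).add(form)
    return {e: sorted(forms) for e, forms in seen.items() if len(forms) > 1}


print(hu_infl("kert", "ben"))
print(hu_infl("kertek", "ben"))
print(ambiguous_endings({"noun": ENDING_TO_CASE, "verb": SAMPLE_VERB_ENDINGS}))
```

Within nouns the two vowel-harmony variants of an ending map to the same case, so they cause no conflict; the check only reports an ending when it names different forms, which is the situation where a single shared template would need an extra parameter.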

February 2009

Wiktionary:Votes/2009-02/Amending ELE Order of Headings

I've just opened a new vote on this. It's a trivial change to ELE: just reorganizing a few headings and clarifying the suggested position of trivia sections (which we already describe). Just alerting folks via the BP. JesseW 20:40, 6 February 2009 (UTC)

language agnostic?

Shouldn't the context categories be language agnostic, and not favour English, instead having a category:en:contextname or category:English contextname subcategory instead of putting everything into the head category? 76.66.196.229 14:49, 7 February 2009 (UTC)Reply

No, because this is the English Wiktionary, where we describe all the words in all the world's languages, in English. It is correct and appropriate to have our context categories be in the language we are using to define words in. 70.213.100.180 (really, w:User:JesseW/not logged in) 01:31, 8 February 2009 (UTC)Reply
I always thought that English Wiktionary meant all explanatory text, etc was in English. I've noticed that some context categories have "English ..." subcategories. Since this dictionary has multiple languages, it'd be easy to mix other languages into the head category, so that non-English words end up in the English category unless they are specifically categorized. 76.66.196.229 05:09, 8 February 2009 (UTC)
I raised the same point a couple of months ago (IIRC), with some support. However, the general desire then was to retain the status quo, and have the ‛bot account AutoFormat correct any miscategorisations. TBH, I didn’t find the counterarguments particularly convincing, and I don’t know how much reprogramming has been done to AF since then to support the status quo. I don’t think that it’s seen as much of an issue ATM.  (u):Raifʻhār (t):Doremítzwr﴿ 12:19, 8 February 2009 (UTC)Reply
I agree with the OP that the fact that this is the English Wiktionary shouldn't entail that English-language entries get special consideration. It's always annoyed me, for example, that where English words are homographs of words in other languages, the English word appears first on the page, instead of in alphabetical order by language. But that's the status quo, and inertia and fear of change will no doubt keep it that way. Angr 13:04, 9 February 2009 (UTC)Reply
The status quo need not necessarily remain as it is. The last time this sort of “language agnosis” was proposed, it was shot down; however, if there is greater desire for it this time, things could change, and we could begin a vote to achieve those changes. That said, FWIW, I’m in favour of retaining the English-first structure of our entries, because those are the sections that contain proper definitions (rather than just straight translations), extensive translations, &c., and are usually simply more detailed.  (u):Raifʻhār (t):Doremítzwr﴿ 16:08, 9 February 2009 (UTC)Reply
If English really were the thing, then shouldn't the parts-of-speech categories also make English prime, instead of category:English verbs, it should just be category:Verbs? I think that the context categories should be treated like the POS categories. 76.66.196.229 09:44, 11 February 2009 (UTC)Reply
Doing so would blur a very important distinction. The topical categories sort meanings according to the definitions. The POS categories sort labels according to their usage. This is a critical distinction, and is the same reason we define dog as "an animal, member of the genus Canis..." and not as "a noun..." or "a word used to represent...". The lexical meaning and the grammatical function are not the same thing. --EncycloPetey 08:27, 14 February 2009 (UTC)

AWB request

Hi, I am trying to fix the family of German declension templates, of the form Template:de-decl-noun*. This will involve batch-moving the entries of some preëxisting categories, which is work ideally suited for Wiktionary:AutoWikiBrowser. I request that an admin enable my account for use with AWB. Thank you, OldakQuill 13:42, 8 February 2009 (UTC)

Unknown etymologies

Although it's probably been answered already somewhere on this wikt, I'm going to ask it anyways. Should there be (or, is there) a special template/category for words/phrases whose origins are unknown? If there isn't, I suggest there be one made. This would allow etymologists who work here to have at their disposal a list of all the words featured here which have been designated as having an unknown origin. Also, those wiktionarians who happen to already know the origin of a given word can add it if they see it in the list. The system here is simple yet effective. Here is basically what it would look like:

Etymology
Origin unknown. (This would be represented by the template {{etyu}} or {{etyl-u}} or whatever is best.)

This template would also add the pagename to the category of words with unknown etymologies:

Category:Entries with unknown etymologies

or something to that effect.

Again, if such a scheme already exists, please let me know. I'm too lazy—er, busy—to search all over en.wiktionary.org to find out. So, whaddaya think?—Strabismus 04:40, 9 February 2009 (UTC)Reply

{{rfe}} - to request an etymology, {{unk.}} for really "unknown" etymologies. Both do the sorting automagically. --Ivan Štambuk 04:43, 9 February 2009 (UTC)Reply
Ok. I'll check it out. Thanks.—Strabismus 21:30, 10 February 2009 (UTC)Reply

Completely Obvious Things

Title redacted: Original title was: == Completely Obvious Things that should have been Implemented from the Get-go and how Simple it will be to Add their Functionality if Someone Pulls their Thumb out ==. -- Visviva 12:14, 9 February 2009 (UTC)Reply

Or, how to piss everyone off with a feature-request, by authors unknown.

  • Search: It should be possible to search for sub-verbal particles of words and according to language, type or any other templated taxon. Example search query: "{{Finnish adverbs}} -vasti" would return (amongst others) http://en.wiktionary.org/wiki/toivottavasti
There already exists a special page to find words beginning with a certain stem, which one assumes incorporates a custom sql query. This could be facilely extended to allow for generic word components.
  • Interlingua: when checking the suitability of a potential word translation, it is vital to be able to easily check the reciprocal definitions in other language wiktionaries. Currently, one has to look up a definition, then click the interlingua link to the desired language, then click to the various offered translations, then click back to the initial language in the interlingua box.
This is 2009 people... Creating a graph of article links is potty-training these days as would be incorporating the reciprocal definitions via a "showbox" (a la word declensions) into the interlingua link.

It would be really nice if someone with the administrative privileges to implement these features would give it a go. If how to do either of these things remains unclear, please feel free to ask for further elaboration or clarification.

217.112.245.49 11:50, 9 February 2009 (UTC)Reply

These are perfectly reasonable requests, but you're talking about software issues, which none of us here have anything to do with. Please try Bugzilla. -- Visviva 12:14, 9 February 2009 (UTC)
These are wiktionary issues, not mediawiki issues. Implementation of the features suggested is the onus of wiktionary administrators not the developers of its software base. The mediawiki devs have other things on their minds than improving wiktionary; wiktionary admins, one would like to think, less so. 217.112.245.49 19:50, 9 February 2009 (UTC) (OP)
Wiktionary is hosted by the same people who host Wikipedia, the server admins are the same people. They concentrate almost solely on Wikipedia (grumble) but they will implement changes for Wiktionary (if someone else programs them, they can be bothered, and they don't think it'll hurt performance too much). The software is not well designed for a dictionary, it is designed for an encyclopedia, this brings issues - but it doesn't seem to be too detrimental (far worse would be software that had been designed for a dictionary badly). Feel free to contribute patches or extensions to MediaWiki - it is then possible that they will enable them for us - but not without an unnecessary amount of red tape. </rant>
Search by category would be a wonderful feature, but it requires adding all category information to the search indices (it's not "not doable", but it would require a lot of thought and clever hackery of the search libraries). Likewise, "searching for substrings" is just not feasible if you have to scan 1.2 million records - you need to add it in some normalized form to the indices (patches welcome :p). [Note that "starting with" or "ending with" are easy, as a prefix search is cheap and you can add the reversed string to the indices.]
I'm not sure what you mean by "reciprocal definitions" but that sounds much easier - you just want the definition of a word in that word's language to appear on the page for that word in the English Wiktionary? We can probably hack a good enough solution to this in Javascript reasonably easily (yes, we could spend much longer writing a PHP solution but it's complicated by the fact that not all Wiktionaries are in the same database clusters), but it'd be nice to know exactly what you'd want.
Some of us have given thought to how to build software that can actually store dictionary information in a structured way, which would allow for much more amazingly cool features, but it's an exceedingly hard design problem as you no sooner make an assumption about words than some other clever-bugger finds you a couple of thousand places you can't make that assumption. Conrad.Irwin 00:16, 10 February 2009 (UTC)
You should also note that Interlingua is a language (albeit an artificial one). So, I had to read your request several times to figure out that you probably weren't asking about that language. --EncycloPetey 08:21, 14 February 2009 (UTC)
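
The point made above about "starting with" and "ending with" searches can be illustrated with a small sketch. It is not how the site's search works; it only assumes a sorted list of page titles held in memory, plus a second sorted list of the same titles reversed, so that a suffix query becomes a prefix query on the reversed list. The function names and the sample titles are made up for illustration.

```python
from bisect import bisect_left


def build_indices(titles):
    """Return (forward, backward): sorted titles and sorted reversed titles."""
    forward = sorted(titles)
    backward = sorted(t[::-1] for t in titles)
    return forward, backward


def prefix_range(sorted_keys, prefix):
    """All keys starting with `prefix`, found with two binary searches."""
    lo = bisect_left(sorted_keys, prefix)
    # U+10FFFF is the highest code point, so this bounds every key that
    # continues the prefix with ordinary characters.
    hi = bisect_left(sorted_keys, prefix + "\U0010ffff")
    return sorted_keys[lo:hi]


def starts_with(forward, prefix):
    return prefix_range(forward, prefix)


def ends_with(backward, suffix):
    # A suffix search is a prefix search on the reversed strings.
    hits = prefix_range(backward, suffix[::-1])
    return sorted(h[::-1] for h in hits)


if __name__ == "__main__":
    forward, backward = build_indices(
        ["toivottavasti", "nopeasti", "vasta", "toivo", "hitaasti"]
    )
    print(starts_with(forward, "toivo"))
    print(ends_with(backward, "vasti"))
```

Both lookups cost two binary searches rather than a scan of every record, which is why they are cheap while an arbitrary substring search is not.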

New word

Could the word xzf become a word? We have decided that it means the "sound of a tennis ball machine ejecting its yellow round items of torment"--God'sGirl94 15:02, 9 February 2009 (UTC)

No, see WT:CFI. --Ivan Štambuk 15:21, 9 February 2009 (UTC)

Entries with "Shorthand" sections

I came across some entries (modified a few years ago, and, so far as I see, all beginning with "a") where someone added a ===Shorthand=== header and shorthand notation. Some examples: abhorrence, abjuration, abdicated, abated, abhorred, abase, abed, abbreviated, abasedly. Maybe I just haven't been paying enough attention, but do we do this? If this notation is to be added anywhere, shouldn't it be in the "Translations" section? Is any action called for here? -- WikiPedant 17:35, 9 February 2009 (UTC)

It seems like a good thing to have in a dictionary unless the shorthand of a word is easy and algorithmic to figure out if you know the word. (That does not seem to be the case for Gregg, for example.) Although it's technically allowed for under ELE ("...other trivia and observations may be added, either under the heading "Trivia" or some other suitably explanatory heading...."), it would probably be wise to allow for it explicitly, or vote it out of Wiktionary. On another note: the entries you mention have shorthand represented by letters, whereas I would think that images of actual shorthand would be far better.—msh210 21:26, 9 February 2009 (UTC)
I guess this is in the same category as alternative scripts and other representations, probably including American Sign Language, Braille, semaphore, Morse code, etc. Shouldn't this be a link to another entry? Michael Z. 2009-02-10 05:23 z
A side point: Not American Sign Language, which, unlike Braille and Morse code, is a separate language rather than a representation in signs of English. Main point: Since shorthand is not currently in Unicode (but see the thread [4] and specifically the message [5]), we have no way of representing the shorthand representation of a word except s.v. the English entry. Even if shorthand were in Unicode, whether a shorthand representation of a word would deserve an entry would need to be decided. Note that we don't include Braille representations of words, even though we do have individual Braille letters. That's probably as it should be.—msh210 18:21, 12 February 2009 (UTC)

Leading bullets in R: templates

Can we please get rid of all the leading bullets in our various R: templates? Templates with initial bullets break if used in <ref>…</ref> tags. They are easy enough to præface with a bullet, if necessary, in the entries themselves, by the mere addition of *. AutoFormat could probably be programmed to add a bullet before any R: template included in a References section which is not called by <references/>. I would be most grateful to whoever sorts this out. Thanks in advance.  (u):Raifʻhār (t):Doremítzwr﴿ 17:55, 9 February 2009 (UTC)

Support.RuakhTALK 01:42, 10 February 2009 (UTC)
Support. This sounds like a good idea to me too. But it is essential to perform both parts of the task described by Doremítzwr. The bullets can't (a) be removed from the templates without also (b) dispatching AutoFormat to insert bullets in the entries. -- WikiPedant 05:12, 10 February 2009 (UTC)
Not sure AF would be fast enough for this job; removing the bullets would instantly break a large number of entries, since anything with two or more "R:" templates in sequence would be garbled.
I do have one of my trademark baling-wire-and-duct-tape solutions for this, which I implemented for example in {{R:Dictionary.com}}. If by some combination of human and bot activity we first change all instances of "[newline]{{R:Dictionary.com}}" to "[newline]*{{R:Dictionary.com|bullet=}}", we can then remove the "{{{bullet|*}}}" code from the template and AF could then gradually remove the "bullet=" junk DNA from the entries in the normal course of its work. This is very inelegant, but it would allow a smooth transition with no breakage. -- Visviva 06:10, 10 February 2009 (UTC)
It is much easier than that! If there is both a bullet in the wikitext and in the template, it still renders just fine. (It looks like two lines, one containing just "*" which is elided.) AF has been routinely adding bullets before a number of templates, which can then at some point have the "internal" bullet removed. One doesn't need a "bullet=" parameter. If I add all R: names to the templates that AF adds * before (using a wildcard, anything starting with R:) it will add them fairly rapidly, and then we can just strip the * out of the templates. I've changed AF, just give it a while. (day or two) Robert Ullmann 17:21, 10 February 2009 (UTC)
Like this, note the references are rendered properly in abigail. It is screening for them, so will find all in the current XML in one pass. Robert Ullmann 17:27, 10 February 2009 (UTC)
Thanks very much for sorting this out, Robert. I am most pleased to see this problem fixed so quickly.  (u):Raifʻhār (t):Doremítzwr﴿ 17:18, 11 February 2009 (UTC)
Yay! -- Visviva 02:23, 11 February 2009 (UTC)
Do note that your "baling-wire" method would be fine if we needed to, in this case it was just a bit simpler. AF has now completed a pass with the 27 Jan XML dump, and again with the 10 Feb dump. There are still a few out there because it doesn't pick up in its screening any entry more than once in 35 days, but they will get caught; as will any new uses of R: templates without *'s. What all this means is that it is now "safe" to remove the internal * from R: templates as desired. Robert Ullmann 11:16, 12 February 2009 (UTC)
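
For anyone wanting to try the same clean-up on a local copy of the wikitext, the step described above (put a * before any R: template that starts a line without one) might look roughly like the sketch below. This is not AutoFormat's code; the function name is invented, and the template names other than {{R:Dictionary.com}} are placeholders used only for the demonstration.

```python
import re

# A line that begins directly with an R: template, i.e. has no leading bullet.
_BARE_R_TEMPLATE = re.compile(r"^(\{\{R:)", re.MULTILINE)


def add_missing_bullets(wikitext):
    """Prefix '* ' to every line that starts with a bare {{R:...}} template.

    Lines that already start with '*' and templates wrapped in <ref>...</ref>
    do not begin with '{{R:', so they are left untouched.
    """
    return _BARE_R_TEMPLATE.sub(r"* \1", wikitext)


if __name__ == "__main__":
    sample = "\n".join([
        "====References====",
        "{{R:Dictionary.com}}",
        "* {{R:Example A}}",            # placeholder template name
        "<ref>{{R:Example B}}</ref>",   # placeholder template name
    ])
    print(add_missing_bullets(sample))
```

Running it twice gives the same result as running it once, which matters if the bot revisits an entry before the internal bullet has been removed from the template.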

Entry layout explained confusion

There is an inconsistency between the order of sections in the index and their order in the Additional Headings and Order of headings sections.

In the index, derived and related terms are shown after translation, in the Additional Headings section they are shown before. In Order of headings, Coordinate terms and Descendants are also shown before Translations.

This is confusing. - dougher 21:16, 9 February 2009 (UTC)

As the "Order of headings" was established by vote, while the "Additional headings" are present only by inertia, there should be no problem with bringing the second into line with the first. In fact it should not even require an additional vote IMO. I will wait for counterarguments before proceeding. -- Visviva 05:19, 10 February 2009 (UTC)

ELE clarifications

The WT:ELE currently has a section labeled "Order of Headings" within the "Headings after the definitions" section. It is missing some of the headings described below it, and some that are listed are described in a different order from how they are listed. It is also unclear how widely the "Order of Headings" applies: does it just apply to the L3 headings that appear after the definitions, or does it apply to any instances of the listed headings, wherever they appear, or does it apply in some other fashion? There is also simmering disagreement over the very presence of some of the headings currently described in the ELE.

I don't want to get into the discussion over the presence of any headings currently mentioned in the ELE -- that's a complex argument, and separate from clarifying what currently is there. I hope that re-ordering the heading descriptions to match that of the "Order of Headings" list is uncontroversial. I think that the ELE currently is quite unclear in the "Headings after the definitions" section -- many of the descriptions of headings say their contents should normally not be in those headings, or that the headings should be within the definitions, rather than below them, or in other combinations. JesseW 00:01, 10 February 2009 (UTC)

I agree that this is/should be non-controversial. But I do think that the ELE should address the issue which led to the unfortunate collapse of the previous vote, namely ontologies (that "Related terms", "Derived terms", and "See also" may be placed at various levels, usually L3-L5, depending on whether they refer to a particular POS/Etymology/Pronunciation or to the language section as a whole, while "Anagrams" is always at L3). This present discussion would be a good place to hash out the wording of that paragraph, which IMO should go either just above the "order of headings" section or somewhere higher in the tree. We could then also update the example (or add another example) accordingly, thus covering all the points of the previous vote and a little more, while (hopefully) staying clear of controversy. Not sure what the best wording would be ... -- Visviva 05:30, 10 February 2009 (UTC)

Well, here's some proposed wording, to go directly above the Order of Headings:

Many of the headings listed below may be specific to
a particular Part of Speech, Etymology, Pronunciation,
or language, and therefore may appear multiple times.  
This Order of Headings only applies to the order 
within a particular set, i.e. Related terms should 
always go before Derived terms, but Verb-specific
Derived terms can go before non-specific Related
terms.

This doesn't address which headings can be repeated, but it's a partial solution. Visviva, thanks for your response. JesseW 19:50, 10 February 2009 (UTC)

But that contains an error. Derived terms should precede Related terms. We agreed on that in a vote setting the order of L4 headings. --EncycloPetey 08:15, 14 February 2009 (UTC)

Formatting of intentional errors

I'm thinking of including in a usage note for an entry I'll be writing soon an example of what not to do (and what might be done by someone who tries a literal translation from English into Catalan of numbers such as fourteen hundred ninety-two). Is there any template or CSS class that would be used to give a consistent format to examples of erroneous text such as catorze-cents noranta-dos? If not, would it be a good idea to have, and what should the format be? (Probably not what I gave in this example.) Carolina wren 23:14, 10 February 2009 (UTC)

Oftentimes an asterisk is included in running text before such words or phrases. For example, "The past tense of run is ran, not *runed". --Bequw¢τ 05:04, 11 February 2009 (UTC)
I thought asterisk was a fairly standard marker to indicate a reconstructed word, or does that convention apply only to Proto Indo-European? Carolina wren 05:25, 11 February 2009 (UTC)
It's used both to indicate ungrammatical constructions and to indicate reconstructed forms. It's multipurpose. :-)   See [[*]]. —RuakhTALK 13:45, 11 February 2009 (UTC)
Is there another generally accepted way of marking "ungrammatical" constructions that would prevent confusion with unattested? I suppose that it may be true that the contexts of use may prevent there being many real opportunities for confusion. Are there contexts where one would need to distinguish? I think so. For example, *"most consanguinal" could be read as an assertion about attestation or ungrammaticality. In a discussion of what should appear on an inflection line, both uses are plausible. Also, an unattested form might be deemed a source of an alleged ungrammaticality. OTOH, the wonderful piece by Arnold Zwicky, Mistakes, seems to have no distinct marking at all for mistakes in its 47 pages. DCDuring TALK 16:24, 11 February 2009 (UTC)
Some linguists distinguish between syntactic ungrammaticality, which is marked with an asterisk, and pragmatic or semantic unfelicitousness (we need this word!), which is marked with a hash sign (#). Unfortunately, this convention is not widely honored. Nevertheless, I would plead for something more intuitive and less technical here, though I have no idea what. Why don’t you simply create a template {{disprefered}} or {{ungrammatical}}, give it an initial formatting, and maybe later people will come up with something better. H. (talk) 12:41, 12 February 2009 (UTC)

Archiving/cleaning up of talk pages

I am regularly annoyed with the unusable stuff one finds on talk pages. There seems to be some sort of convention not to remove anything from talk pages, because they are supposed to reflect discussion, but sometimes the stuff is really old or no longer usable, or has long been resolved. Can’t we introduce a system whereby, after a certain amount of time, such information can be removed, and maybe even the page deleted? One concrete example I just saw is Talk:gullible. While originally it was a proper discussion, this is just no longer relevant, and there are some stupid remarks in there. What do people think? H. (talk) 12:34, 12 February 2009 (UTC)

Not having an outlet for such stuff might be worse in its consequences. DCDuring TALK 12:39, 12 February 2009 (UTC)
I would agree that things like Talk:gullible can be speedily deleted (though I won't delete it now, lest I derail this discussion). "No usable content given", I would call it... On the other hand, IMO any substantive conversations about entry content should be kept indefinitely, even after they have been resolved. -- Visviva 13:56, 12 February 2009 (UTC)

Ethical question

Hello there. For twelve years I've been running a web dictionary of British slang, called 'The Septic's Companion'. It's something I've pottered around on in my spare time - the content was all written by me, but it's evolved over the years based on large amounts of feedback from visitors. I think it's reasonably accurate now, but obviously it's not the OED. I recently self-published the thing as a book, which is advertised on the site. This effectively makes the site a commercial one (not one that makes me any money, really, but I guess it's the thought behind it that counts). I'm about to get to the point any minute now...

I noticed that lots of the words in my dictionary were also here, so the other day I added a link to the word on my site as an external link to a few pages here. Once I'd done that I got a very polite message from Ivan Štambuk suggesting that if all I was intending doing was adding these links, then I wasn't really doing anything particularly useful, and may be considered as spamming promotional material. I do see his point, and I'm not going to do that any more. I have an idea of what I might do instead, but I'd be very interested in other points of view as to whether this is ethically appropriate or not. I'm thinking I will go through my own dictionary word by word, looking up each word in here. For the ones where I have more information (or corroboration of information that's already here) I would like to add my web site as a numbered "Reference" link on those items. Of the 700 words in my dictionary, around 400 have British slang meanings which aren't already covered here, so there's quite a lot of new content. What's the general feeling on that? Obviously it involves me blatantly sticking my web site links on Wiktionary, but it's only in places where I'm corroborating or adding content. The best way to do this might be to add my site reference using an automatic URL generator, much like you already have for Webster - Ivan has reservations about this because of the "some guy doing this in his spare time" nature of my site, the fact that Wiktionary usually references published utilisations of words rather than definitions, and the fact that I'm trying to sell a book. He said he'd be willing to float the idea here to see what the general feeling is - I jumped the gun on that but I'll let him chime in if he thinks I'm misrepresenting him.

Anyway, very interested in your thoughts. In support of my case, I noticed today that someone had actually already linked to my site from another word here (pillock), although using an older URL of mine (english2american.com). --Pugwash 22:16, 12 February 2009 (UTC)

When you are adding your site as a reference, I assume that is because you are adding the missing content from your site to here? If you have improved an entry using the information you have researched for your book, then by all means link to the website (alternatively just cite the book). On the other hand, adding only links just lowers our signal to noise ratio, please don't do it. Conrad.Irwin 00:27, 13 February 2009 (UTC)
Ooh, I notice that you're giving away free copies of your book to websites that link to you. Can I put in an order for 1077 free copies? :p (151,000 would be fairer, but the inactive users aren't going to notice). Conrad.Irwin 00:42, 13 February 2009 (UTC)
Yep, I'd only be adding a reference if I actually took some info from my site and put it up here. The, umm, higher signal-to-noise ratio version was what I was originally intending doing before Ivan slapped my wrists about it (I've already done a couple of those that I'll revert). I'd rather link to the web site than the book - the web site is a bit more up-to-date and slightly easier for the average user to verify. As for the free copies, I hadn't exactly established what I'd do for sites with community ownership. However, once I've established that by some means, "no" will be the ultimate answer. :) --Pugwash 01:37, 13 February 2009 (UTC)

Web Services

Any chance someone would provide a Web Service interface to access definitions? I'm willing to do some work there if that helps.

Have you seen the mw:API.php? I think that's what you are asking for, no? 75.212.71.60 07:48, 14 February 2009 (UTC)
We have nothing yet that lets you look at Wiktionary data - though there are some quick heuristics that allow you to parse most entries (you can get the text from the API or the XML dumps). Definition lines always start with a #, translations are in the approximate form *Language name: {{t|language code|translation}} or *Language name: [[translation]]. User:Polyglot has a parser that will get almost all translations from any Wiktionary entry. We've been thinking about making a web API out of it, but it needs a bit of time. What information are you interested in? Conrad.Irwin 12:14, 14 February 2009 (UTC)
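
As a rough illustration of the heuristics just described, here is a minimal sketch that pulls definition lines and translations out of an entry's wikitext once you already have it as a string (fetching it from the API or a dump is not shown). It is not User:Polyglot's parser. The function name and the sample entry are made up, and skipping lines that begin with "#:" or "#*" is an added assumption that such lines hold examples or quotations rather than definitions.

```python
import re

# "*Language name: ..." translation lines, per the approximate form above.
TRANS_LINE = re.compile(r"^\*\s*([^:]+):\s*(.*)$")
# {{t|language code|translation}}
T_TEMPLATE = re.compile(r"\{\{t\|([^|}]+)\|([^|}]+)")
# [[translation]] (any text after a | inside the link is ignored)
WIKILINK = re.compile(r"\[\[([^\]|]+)")


def parse_entry(wikitext):
    """Return (definitions, translations) using line-based heuristics only."""
    definitions = []
    translations = {}
    for line in wikitext.splitlines():
        if line.startswith("#"):
            # Assumption: "#:" and "#*" lines are examples/quotations.
            if not line.startswith(("#:", "#*")):
                definitions.append(line.lstrip("#").strip())
            continue
        match = TRANS_LINE.match(line)
        if match:
            language, rest = match.group(1).strip(), match.group(2)
            words = [word for _code, word in T_TEMPLATE.findall(rest)]
            words += WIKILINK.findall(rest)
            if words:
                translations.setdefault(language, []).extend(words)
    return definitions, translations


if __name__ == "__main__":
    sample = "\n".join([
        "# A domestic feline.",
        "#: ''The cat sat on the mat.''",
        "====Translations====",
        "* Finnish: {{t|fi|kissa}}",
        "*French: [[chat]]",
    ])
    print(parse_entry(sample))
```

Being line-based, this will miss anything that spans several lines or sits inside collapsible translation tables with extra markup, so treat the output as "most entries", not all.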