Wiktionary:Beer parlour/2011/September

This is an archive page that has been kept for historical purposes. The conversations on this page are no longer live.


adding-translation script

Discussion on de.Wikt: de:Wiktionary:Teestube#.C3.9Cbersetzung-Hinzuf.C3.BCgen-Skript.

Hello English Wiktionary-Users,

in the German Wiktionary we would like to add the function that allows people to add translations without manually editing the code section. Could anyone explain how to do it? That would be great. Thanks in advance! Kampy 08:11, 11 September 2011 (UTC)[reply]

The German Wiktionary seems to structure translation sections completely differently from the English Wiktionary, with translations showing which senses they correspond to by having numbers next to the translations, rather than putting the translations for each sense in a separate box, so simply copying the code wouldn't really work. --Yair rand 20:17, 11 September 2011 (UTC)
The code would have to be modified, yes, but that shouldn't be too complex a task. One option: the de.Wikt programmers could code another box between the ISO box and the translation box, which would take the sense number(s) as input. Is all of the code that operates the function contained in User:Conrad.Irwin/editor.js? - -sche (discuss) 05:48, 12 September 2011 (UTC)[reply]
No, it also uses the newNode function in MediaWiki:Common.js#Dom_creation. Another issue is that the script seems to make use of the translation table glosses, which dewikt doesn't have, for locating tables. (Not completely sure about that.) --Yair rand 06:08, 12 September 2011 (UTC)[reply]
I think if we (on de.Wikt) changed
(values.qual? '{'+'{qualifier|' + values.qual + '}} ' : '') +
to
(values.qual? '[' + values.qual + '] ' : '') +
, we could use the code as-is (apart from the problem of glosses, which we could add), with users adding the sense-numbers (1, 1–2) in the "qualifier" field. - -sche (discuss) 01:06, 13 September 2011 (UTC)
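The substitution proposed above can be sketched in isolation. This is only an illustration: the two ternary expressions are taken from the discussion, while the wrapper functions `formatQualifierEn` and `formatQualifierDe` and the shape of the `values` object are assumptions, not part of User:Conrad.Irwin/editor.js.

```javascript
// Hypothetical wrappers; only the ternary expressions come from the thread.
function formatQualifierEn(values) {
  // en.Wikt style: wrap the qualifier in the {{qualifier}} template.
  // The '{' + '{' split is kept exactly as in the quoted line.
  return values.qual ? '{' + '{qualifier|' + values.qual + '}} ' : '';
}

function formatQualifierDe(values) {
  // Proposed de.Wikt style: the "qualifier" field holds sense numbers,
  // which are emitted in square brackets.
  return values.qual ? '[' + values.qual + '] ' : '';
}

console.log(JSON.stringify(formatQualifierEn({ qual: 'informal' })));
console.log(JSON.stringify(formatQualifierDe({ qual: '1' })));
```

Both functions return an empty string when no qualifier is given, so the rest of the generated wikitext line would be unaffected in that case.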
About the numbers, I think it shouldn't be too much of a problem. We don't use headlines repeating the definition; instead we use those numbers. So there will only be one box at all times. Anything added to this box just needs an additional input box for the number it relates to. Can anyone code this? Kampy 00:05, 14 September 2011 (UTC)
There will be more than one box (and there will be no numbers) once the translations are (per the vote) separated by sense, though...
I have copied the code to my de:Benutzer:-sche/common.js, and have copied the en.Wikt and de.Wikt translation tables into subpages of my userspace for testing (de:Benutzer:-sche/sw4), but even with classes and gloss-support added to the German translation tables (de:Benutzer:-sche/sw1c), I haven't got it to work yet. - -sche (discuss) 01:01, 14 September 2011 (UTC)[reply]
Oh, I didn't know a vote was going on. I agree that the English version is more practical. I will support a change. Kampy 10:57, 14 September 2011 (UTC)[reply]
Is it possible that the code in my de.Wikt .js isn't considering itself enabled, Yair rand? - -sche (discuss) 01:08, 14 September 2011 (UTC)[reply]
[[de:Benutzer:-sche/common.js]] has some syntax errors that will cause browsers to stop processing it. [[de:Benutzer:Ruakh/common.js]] fixes the most severe errors — you can take it as a starting point for further debugging — but it still doesn't create the form for adding translations, so there's still something wrong. :-/   —RuakhTALK 02:30, 14 September 2011 (UTC)[reply]
Thank you for catching that! I'm guessing that (among other things) I should add \ to all of the other instances of sche/, like you did to sche/sw1c (ie sche/sw2b → sche\/sw2b etc), yes? Or not to all of them? - -sche (discuss) 03:05, 14 September 2011 (UTC)
Doesn't make a difference, it's only necessary inside the regexps, not simple strings (/.../, not "..."). --Yair rand 03:22, 14 September 2011 (UTC)[reply]
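The distinction between regex literals and strings can be checked with a short, self-contained sketch. The variable names are made up for illustration; the page name is the test page mentioned in the thread.

```javascript
// Inside a regex literal, "/" would end the pattern, so it is escaped as "\/".
const asRegex = /Benutzer:-sche\/sw1c/;

// Inside a string literal, "/" has no special meaning, and "\/" is read
// as a plain "/", so both spellings produce the same string.
const asString = "Benutzer:-sche/sw1c";
const escapedString = "Benutzer:-sche\/sw1c";

console.log(asRegex.test(asString));
console.log(asString === escapedString);
```

So adding the backslash inside ordinary strings is harmless but unnecessary, which matches the reply above.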
It seems that makes the script work! The "±" sign displays atop the gloss, but that's a relatively minor problem. - -sche (discuss) 03:09, 14 September 2011 (UTC)[reply]
That's because the dewikt tables don't have the show/hide button as the first node in the NavHead, and the script places the "±" after the first node. Can be fixed by replacing insertDiv.insertBefore(edit_button, insertDiv.firstChild.nextSibling); with insertDiv.insertBefore(edit_button, insertDiv.firstChild);, so that it's placed before the first node.--Yair rand 03:22, 14 September 2011 (UTC)[reply]
Other issues: The dewikt language templates leave parserfunction residue when substed. This could be fixed by modifying the language templates to have {{{|safesubst:}}} before the #if: ({{ {{{|safesubst:}}}#if:{{{nolink|}}}|Französisch|[[Französisch]]}}). Also, the use of {{t}} needs to be replaced with whatever template dewikt uses. --Yair rand 03:32, 14 September 2011 (UTC)[reply]
Thanks; that change puts the "±" in the right place! :) I'm working on replacing {{t}} with the de.Wikt counterpart {{Ü}}. I am also considering that certain functions, like "Page name:", may not be applicable to de.Wikt. (In fact, I replaced the code to input qualifiers with code to input sense numbers; it may be that I should undo that and instead use the AFAICT-unneeded-on-de.Wikt pagename-with-diacritics code as the vessel for adding sense numbers.) (No, that wouldn't work at all.) - -sche (discuss) 04:01, 14 September 2011 (UTC)[reply]
Re safesubst: actually, the necessary change isn't to the templates (although having templates that subst safely is probably a good idea); the necessary change is to the code: we don't use "Französisch" in translation tables on de.Wikt, we use {{fr}}. - -sche (discuss) 04:35, 14 September 2011 (UTC)
I've changed the code so that it does not subst language codes. However, changing {{t}} to {{Ü}} caused the function to display "Could not find translation entry for 'pt:worde'. Please reformat" when I tried to add worde (with ISO code pt given) to a section containing other translations. However, it added correctly to an otherwise empty section. I thought residual "{{t"s or a "{{Üxx" I added might be confusing the script's sorting mechanism, but it was also confused by this version of the page. (That version also shows that I/we/de.Wikt-programmers need to change how/where gender information is added.) - -sche (discuss) 05:03, 14 September 2011 (UTC)[reply]
The function getEditFunction might be the problem. It's built to look through the translation table wikitext for the translation to insert the new translation before (I think), but it's searching by first looking for * [[langname]]:, then for * langname:, and then for {{subst:langcode}}:, in case it's a newly added translation, but dewikt doesn't format translations like any of these. --Yair rand 21:21, 14 September 2011 (UTC)[reply]
You're right; removing subst: (so that it only looks for the language code) makes it work. Now to remove cruft... - -sche (discuss) 21:55, 14 September 2011 (UTC)[reply]
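The lookup order described above, with the de.Wikt variant added, might look roughly like the following. This is a sketch under stated assumptions, not the actual getEditFunction: the function name `findLanguageLine`, its parameters, and the sample table lines are hypothetical.

```javascript
// Hypothetical sketch of the search order; not the code from editor.js.
// Returns the index of the first line matching one of the known formats,
// or -1 if the language is not present in any of them.
function findLanguageLine(wikitext, langName, langCode) {
  const candidates = [
    '* [[' + langName + ']]:',      // linked language name
    '* ' + langName + ':',          // plain language name
    '{{subst:' + langCode + '}}:',  // newly added, not yet substituted
    '{{' + langCode + '}}:'         // assumed de.Wikt form, without "subst:"
  ];
  const lines = wikitext.split('\n');
  for (const candidate of candidates) {
    const index = lines.findIndex((line) => line.includes(candidate));
    if (index !== -1) return index;
  }
  return -1;
}

// Assumed sample of a de.Wikt-style table; the exact layout is illustrative.
const deTable = ['*{{en}}: [1] {{Ü|en|cat}}', '*{{fr}}: [1] {{Ü|fr|chat}}'].join('\n');
console.log(findLanguageLine(deTable, 'Französisch', 'fr'));
```

With only the first three formats, the de.Wikt sample would never match, which is consistent with the "Could not find translation entry" message reported earlier.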
I have adapted the code to work with de.Wikt's Ü-templates. It even nests nb and nn correctly, when neither no nor nb nor nn is already in the table. However, de:Benutzer:Yoursmile gave me feedback that the adder appears but doesn't work (Could not find translation table for 'fr:reg'. Glosses should be unique) when other scripts are around, e.g. de:Benutzer:Yair rand/TabbedLanguages.js. I thought that might be because DOM-node code is redundantly in both codes, but when I separated the DOM-node and translations codes, and imported both, I found that the trans-adder no longer appeared. (If it had appeared, I would have imported only the trans-code and TabbedLanguages, to see if the absence of redundant DOM-code allowed the two to work together.) Any idea why splitting the two seemingly discrete scripts causes them to cease functioning (ie causes the trans-adder to cease appearing)? Any idea why the trans-adder appears but does not work when TabbedLanguages are around?
Separate issue: any idea what I did wrong when I tried to remove the "script" bit? That edit caused the adder to cease appearing. That edit also removed a bit of "gender"-code, but that wasn't problematic; I successfully removed it later. I rendered the "script" bit harmless (we don't use script templates on de.Wikt) by causing it to input nothing and removing the interface, but that leaves a lot of cruft. - -sche (discuss) 23:56, 17 September 2011 (UTC)[reply]
The top four lines of that edit are actually removing part of something completely unrelated to the "script" bit, but that part isn't what actually broke it. The edit contained an extra comma (ota:{,wsc:"ota-Arab"}) which caused a syntax error. --Yair rand 07:32, 19 September 2011 (UTC)[reply]
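The effect of the stray comma can be reproduced without loading the whole script. The helper `parsesAsObjectLiteral` below is a hypothetical name introduced for this sketch; it only asks the JavaScript engine whether a piece of source text parses.

```javascript
// Hypothetical helper: report whether a source fragment parses as an
// object literal. new Function only compiles the text; nothing is run.
function parsesAsObjectLiteral(source) {
  try {
    new Function('return (' + source + ');');
    return true;
  } catch (error) {
    if (error instanceof SyntaxError) return false;
    throw error;
  }
}

// The fragment quoted above, with and without the extra comma.
console.log(parsesAsObjectLiteral('{ota: {,wsc: "ota-Arab"}}'));
console.log(parsesAsObjectLiteral('{ota: {wsc: "ota-Arab"}}'));
```

Because a syntax error stops the browser from processing the entire file, one misplaced comma is enough to make the translation adder disappear completely.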
Thank you! I have got the script to work with Tabbed Languages; with your syntax-fix, now the unneeded "script" part has been successfully removed. I tested Tabbed Languages and the Trans-Adder together in the main namespace on de:Katze. I moved the code to de:Benutzer:-sche/uebersetzung.js, if anyone wants to see for themselves (remember that at the moment it is still oriented to {{Benutzer:-sche/sw1c}} and therefore only works on test pages or modified pages). The only issue I note now is that it wouldn't add more than one translation without me refreshing or navigating away from and back to the page; I wondered if I just didn't wait long enough (de:Katze had a lot of translations for it to sort through), but it displayed the same behaviour on my simple test page. (A minor problem I remind myself to fix is the unneeded space between * {{langcode}}.) de.Wikt will have to adopt glosses for this to work. - -sche (discuss) 09:27, 19 September 2011 (UTC)[reply]
Of note: the code works differently in different browsers and in the main vs the user namespace. (Those interested can temporarily restore this version of Katze and try using the code on it.) - -sche (discuss) 22:54, 19 September 2011 (UTC)[reply]


Idiomatic translations

I've been wondering for a while how to add translations that are not strictly idiomatic in English or in the target language, but for which the translation itself is idiomatic and not obvious. An example I came across was 'I have a nosebleed', which is translated more or less word for word into Dutch as 'Ik heb een bloedneus', but in Catalan it is translated as 'Em sagna el nas' - literally 'The (my) nose bleeds to me'. Any translations given for nosebleed are only useful for Dutch, but they would not cover the Catalan case at all. The literal translation of 'nosebleed' in Catalan is 'hemorragia nasal', which is not helpful in this case, and is not even idiomatic itself so it can't be included. Cases like this are quite common between languages, and it seems like a rather big gap in Wiktionary to leave it out... —CodeCat 13:04, 1 September 2011 (UTC)[reply]

People keep talking about a phrasebook, maybe this would be a good use for it. Fugyoo 14:02, 1 September 2011 (UTC)[reply]
But there should be something directly under [[nosebleed]] too. In a paper English-Catalan dictionary you would expect to see something like "nosebleed - hemorragia nasal. I have a ~ : Em sagna el nas". So why not do something like that here: when an English term is best translated with a phrase in the target language, we give the phrase in addition to the straightforward noun=noun translation. —Angr 15:40, 1 September 2011 (UTC)[reply]
I would agree with this way but there is some overlap with entries that are idiomatic, which have their own entries. We might end up with a situation where the entry give contains translations for 'give up', while give up has its own translations as well. We would need to be careful that translations are not duplicated like this. —CodeCat 15:52, 1 September 2011 (UTC)[reply]
Not addressing your question, which is general, but, rather, only the specific example: Do other symptoms not translate into Catalan similarly? Is "I have a headache" in Catalan not literally "The head hurts to me"? If so (and, not knowing any Catalan, I have no idea whether it's so), then I don't think we should include such translations in any entry at all: they belong in a grammar, perhaps, but are not relevant to any one word of the language.​—msh210 (talk) 15:47, 1 September 2011 (UTC)[reply]
We already have some grammar in Wiktionary's entries, and I don't think it's much of a problem if we include things like this. They are very useful to someone who wants to say 'I have a nosebleed' in Catalan and looks at the translation table, and then notices immediately that what he wants to say is said differently. It's very user friendly that way. —CodeCat 15:52, 1 September 2011 (UTC)[reply]
On a tangential note, google:"I have a nosebleed" seems much less common than google:"my nose bleeds" and google:"my nose is bleeding". "my nose bleeds" seems like a fairly good candidate for a phrasebook entry, one that can be linked to from "nosebleed". --Dan Polansky 16:05, 1 September 2011 (UTC)[reply]
'My nose bleeds' seems very awkward to me. It sounds like you are saying it bleeds habitually rather than that it is bleeding right now. —CodeCat 16:06, 1 September 2011 (UTC)[reply]
Where I live in Northern England, people would say "my nose is bleeding" or possibly "I have a nosebleed" but never "my nose bleeds". I imagine most of the ghits for "my nose bleeds" would be part of phrases such as "my nose bleeds when..." or such like. BigDom 16:10, 1 September 2011 (UTC)[reply]
google:"My head hurts" and google:"my stomach hurts" also seem very common in spite of not using the present continuous tense. Also check the two phrases in Google books to see how very common they are also there. --Dan Polansky 16:12, 1 September 2011 (UTC)
This American agrees with Codecat and BigDom: my nose bleeds sounds like it does so habitually, not now. My X hurts OTOH means now. Go figure.​—msh210 (talk) 16:28, 1 September 2011 (UTC)[reply]
I agree. But to me it seems similar to the Americanism "Do you have" instead of "Have you got" (I was once asked "Do you have children?" I replied "Not very often." and she was very confused!) SemperBlotto 06:59, 2 September 2011 (UTC)[reply]
@ CodeCat, we often just split the link, [[hemorragia]] [[nasal]]. Mglovesfun (talk) 07:08, 2 September 2011 (UTC)[reply]

More generally, the translation table should include help when needed. Lmaltier 17:03, 2 September 2011 (UTC)[reply]

Question

I was directed here from a discussion section. So what is the "acceptability" of signatures? An editor since 8.28.2011. 06:36, 3 September 2011 (UTC)[reply]

Any signature is probably going to be acceptable unless someone takes exception to it. If somebody has a problem with it, they will explain and then you will know how to improve its acceptability. Or you can ignore the complaint and advice and choose instead to burn your bridges with that editor. If you burn too many bridges, you may find it difficult or impossible to function effectively here. —Stephen (Talk) 06:44, 3 September 2011 (UTC)[reply]
I have no clue how that is related to my comment? (Never mind.) Okay... what exactly are you trying to say? Oh, I see! I'm stupid when I'm tired. Thanks! An editor since 8.28.2011. 06:48, 3 September 2011 (UTC)[reply]
FWIW I find colorful signatures annoying... but I'd rather be annoyed than limit others' freedom to edit their own signature, unless the signature is really really silly. Mglovesfun (talk) 09:55, 3 September 2011 (UTC)
Thank you! An editor since 8.28.2011. 17:08, 3 September 2011 (UTC)[reply]
However: my signature is uni-colored. An editor since 8.28.2011. 17:11, 3 September 2011 (UTC)[reply]
Colorful doesn't necessarily imply more than one color. --Mglovesfun (talk) 14:18, 7 September 2011 (UTC)[reply]
Differently-coloured signatures might cause problems for people using different skins (colour schemes), perhaps because of poor eyesight. Fugyoo 14:26, 7 September 2011 (UTC)[reply]

Just want to ask for a few volunteers to fix these entries, using templates such as {{en-noun}}, {{en-verb}} or just {{infl}}. No obligation of course, but even fixing one entry at this late stage is a help. Thank you, Mglovesfun (talk) 09:54, 3 September 2011 (UTC)[reply]

community's opinion on bot format

I received this message on my talk page: Hi there. It is a bit late now, but I have been meaning to ask you for some time if the form P.officer was a mistake. Also, in your subpages, we like to use {{conjugation of}} these days rather than {{form of}} e.g. {{conjugation of|pellettizzare||2|s|past historic|lang=it}} (Italian example).

The thing that I'm asking about is when it says "we like to use {{conjugation of}}...rather than {{form of}}". Is it important which template to use, if both produce identical results? To me, it looks unnecessary to change to {{conjugation of}}, if not a waste of time, but I'm eager to hear the voices of other users. --Pofficer 17:53, 4 September 2011 (UTC)[reply]

If they produce the same thing, I don't see the point in switching.​—msh210 (talk) 20:22, 4 September 2011 (UTC)[reply]
Conjugation of is more uniform: there are many minor variations on how to write "first-person singular present indicative" using form of, while conjugation of only allows one of these. Mglovesfun (talk) 21:34, 4 September 2011 (UTC)
I agree. Hence my "If..." clause.​—msh210 (talk) 21:37, 4 September 2011 (UTC)[reply]
OK, I shall continue the bot. If there are any problems, don't hesitate to leave me a message and I'll put a clamp on the bot. --Pofficer 09:56, 5 September 2011 (UTC)[reply]

I got a message just now about a bot flag, the "small formality of requesting permission to run as a bot, and then getting a sysop to set the bot-flag on your user id". Can I request permission to run P.officer (talkcontribs) as a bot? Instead, perhaps, I could change the name of the bot to Officebot (talkcontribs) as it could avoid confusion. --Pofficer 10:20, 5 September 2011 (UTC)[reply]

  • We do have a couple of bots without -bot or -Bot in the name but, if you don't really mind, I'll change it to PofficerBot before setting the bot flag (It seems to be functioning OK). SemperBlotto 10:31, 5 September 2011 (UTC) p.s. You would need to edit your user-config.py file to reflect the name change.[reply]

Calling all 日本語能力のある方...

Following comments in various other threads, it appears that the WT:AJA page needs some work. The issues I'm immediately aware of:

  • Quasi-adjectives (な adjectives): WT:AJA insists on including the な in the headword, which does not appear to be the current consensus.
  • の adjectives: WT:AJA does not include any clear guidelines for these. (Relatedly, {{ja-adj}} doesn't include any way of handling these either.)
  • Suru compound verbs: WT:AJA calls for using the {{ja-suru}} template. However, する is a standalone verb, so including the する conjugation on each and every compound verb page seems excessive.
  • {{ja-kanjitab}}: WT:AJA describes including this under an === Etymology === section if there is one, but including under the main == Japanese == section produces largely identical results, unless there are multiple etymology sections, in which case repeating the kanjitab seems excessive.
  • The Transliteration subpage could also use some work, particularly with regard to spacing and what constitutes a single word in Japanese (i.e., particles should be separate, suru should be separate, etc. etc.).
  • 連体詞: WT:AJA states that this should be given a POS of "prefix", but that is really not what these words are -- a prefix is part of a word, whereas 連体詞 are clearly standalone words. They are less prefixes and more like true adjectives, in that they must precede a noun.
  • Single-kanji entries: WT:AJA has no clear instructions on how to specify okurigana in kun'yomi listings, nor any clear instructions on how to format these to link to verb forms. For instance, shows one way of clarifying okurigana and linking to kanji+okurigana entries, but is a bit visually messy; ja:食#日本語 looks a bit cleaner with the use of hyphens to show the break between the kanji and the okurigana, and this roughly matches the format I've most often seen in dead-tree dictionaries, but the entry doesn't link to any kanji+okurigana entries, just to the hiragana entries; and doesn't show okurigana or link to any kanji+okurigana entries.

This post is really just meant to get the ball rolling. Many of these changes listed above are a departure from what WT:AJA currently says, so I'm hoping to spark a bit of discussion before making any edits. -- TIA, Eiríkr Útlendi | Tala við mig 17:41, 6 September 2011 (UTC)[reply]

Please keep discussion in the fora here in English where possible. For the record, 日本語能力のある方 seems to mean "those skilled in Japanese" or similar (based only on Google Translate, not that I know any Japanese, myself).​—msh210 (talk) 18:25, 6 September 2011 (UTC)[reply]
Also, you might want to continue this discussion at Wiktionary talk:About Japanese, since it may wind up taking up a lot of screen space and is specific to Japanese (and indeed the AJA page!).​—msh210 (talk) 18:28, 6 September 2011 (UTC)[reply]
Fair enough. I've tried posting there a few times and got the overwhelming impression of crickets chirping, which led me to try posting on a more-trafficked page. I'll copy this thread over to there shortly. -- Eiríkr Útlendi | Tala við mig 19:16, 6 September 2011 (UTC)[reply]
I think it is a good idea to have this post here, directing everyone to the Wiktionary talk:About Japanese page (where the discussion can take place). If no-one adds to the discussion, contact other active editors of Japanese directly on their talk pages. If there are none, or you have done that and they have not replied, then you (as the only active editor of the language) should make whatever changes you deem necessary. - -sche (discuss) 20:06, 6 September 2011 (UTC)[reply]
I agree: keep this here, but continue discussion there. (That's what I meant in the first place: sorry I wasn't clear.)​—msh210 (talk) 23:41, 6 September 2011 (UTC)[reply]
Sure, no worries.  :) I copied my initial post over to Wiktionary_talk:About_Japanese#Work_Needed. I hope to get into the nitty gritty over there. -- Cheers, Eiríkr Útlendi | Tala við mig 23:45, 6 September 2011 (UTC)[reply]

I've created a list of the 1000 most common species epithets

Hi Latin lovers and barflies,

User:Pengo/Latin/Top_1000

Based on the Encyclopedia of Life database, I've compiled a list of the most common species epithets. I'm hoping this will help those who want to create new Latin/Translingual entries.

There's more details on the page. --Pengo 14:14, 7 September 2011 (UTC)[reply]

Here's the top 5 words that are missing Latin/Translingual entries:

--Pengo 03:22, 8 September 2011 (UTC)[reply]

In many cases it is just the inflected form that is missing, eg, nana, nanus (a dwarf), but in some cases lemmata are missing, even classical ones, eg, variegatus, variego. DCDuring TALK 14:21, 8 September 2011 (UTC)[reply]
Looks like it would help if I grouped words with the same stem. I'm going to attempt to make another list that does that (at least crudely).
I, myself, don't know an inflection from a declension, so until I learn some Latin grammar and work out all the templates and formatting here, this list is really for you and other editors. So let me know if there's anything else that would be useful. --Pengo 02:53, 9 September 2011 (UTC)[reply]
Grouping by stems is less helpful IMO for speeding entry creation than grouping by inflectional ending and suffix. Ie, the forms ending in "ata" have a very similar Latin section structure. That structure will have links to the participle lemma ending in "atus", which will have links to the lemma verb. Some of those links may be red. The entries for the red link lemmas should probably be added by an editor familiar with Latin with access to multiple Latin references, including some for Medieval Latin. Purely New Latin terms are much less interesting to most Latinists, however important they may be to taxonomists and to Wiktionary. DCDuring TALK 12:25, 9 September 2011 (UTC)[reply]
Thanks for the feedback. Working on it. Will add some extra features too. --Pengo 04:56, 10 September 2011 (UTC)[reply]

As of recently, we have an abuse filter. It allows us to create rules against which edits (and moves and other things) are filtered; if an edit matches such a rule, it can — at our option for each rule — tag the edit with a little note in special:recentchanges, not allow the edit to go through until the editor first sees a warning that the edit might not be wise (which warning can be customized for each filter rule), block the edit altogether, or remove the editor's "autoconfirmed" flag. (Or combinations of those.) It can also do these things only after the editor in question makes too many rule-matching edits in a short period of time (which rate, too, is customizable per filter rule). For more on the abuse filter, see the MediaWiki extension page and/or the Wikipedia abuse filter page (except that they call it the "edit filter").

I've set up some rules that I thought would be helpful.

One of them actually blocks an edit from going through: this filter checks that the user is not an autopatroller, admin, or bot; that the edit is in the main (entry) namespace; that the entry had a level-three header before the edit; that the entry had no level-three header after the edit; and that the entry (after the edit) doesn't have a speedy-deletion template or {{only in}}. It blocks that edit from going through. That filter has (in its current incarnation) caught scores of edits, with no false positives (i.e., it did not block any edit that we wouldn't have manually rolled back had it gone through).
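The conditions of that blocking rule can be restated as a plain predicate. The abuse filter has its own rule language, so the JavaScript below is only a sketch of the logic; the function name, the shape of the `edit` object, the group names, and the use of {{delete}} as the speedy-deletion template are all assumptions.

```javascript
// Matches a level-three header line such as "===Noun===",
// but not "==English==" or "====Synonyms====".
const LEVEL_THREE_HEADER = /^===[^=].*?===\s*$/m;

// Hypothetical predicate; "edit" is an assumed plain object, not a MediaWiki type.
function shouldBlockEdit(edit) {
  const exemptGroups = ['autopatroller', 'sysop', 'bot']; // assumed group names
  const privileged = edit.userGroups.some((group) => exemptGroups.includes(group));
  if (privileged || edit.namespace !== 0) return false;   // 0 = main namespace

  const hadHeader = LEVEL_THREE_HEADER.test(edit.oldText);
  const hasHeader = LEVEL_THREE_HEADER.test(edit.newText);
  // {{delete}} stands in for "a speedy-deletion template" (assumption).
  const exempted = /\{\{\s*(delete|only in)\s*[|}]/i.test(edit.newText);
  return hadHeader && !hasHeader && !exempted;
}

console.log(shouldBlockEdit({
  userGroups: ['user'],
  namespace: 0,
  oldText: '==English==\n===Noun===\n# A thing.',
  newText: 'lol'
}));
```

The sample call models an unprivileged user replacing a formatted entry with junk, which is the kind of edit the rule is described as catching.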

No other rule currently does more than tag an entry on special:recentchanges. I propose, though, that three do.

One of them is a copy of a filter at enWP. These filters look for an edit that adds a single bad word and nothing else. (Approximately. The actual workings of the filter are hidden on enWP, so I've hidden our copy also. Admins and "edit filter managers" over there can see their copy, and our admins can see ours.) On enWP, it prevents the edit from going through, and has done so for months. (I don't, however, know how fastidious they are in looking for false positives.) Here, it does nothing; so far we've had only a handful of matches, with no false positives. I propose it prevent edits from going through here also. I also ask admins to edit it to enwikt purposes (testing it well of course, especially if it disallows the edit from going through).

Update: Now we've had a false positive.​—msh210 (talk) 15:27, 8 September 2011 (UTC)[reply]

Another rule I think should do more than tag is one that checks whether a new (main namespace) entry is created by a non-autopatroller (non-admin, non-bot), lacks a level-three header, and either {has both a capital letter and a space in its title} or {has a right-parenthesis ) at the end of its title}. It's only had a handful of hits, with no false positives. Again, please improve it; and I think perhaps it should also block edits from going through.

The third rule I think should prevent edits from going through currently also just tags. It checks whether an entry is not new, is being edited by a non-autopatroller (non-bot, non-admin), and has its after-this-edit text the same (but for capitalization and other normalizations) as its pagetitle.
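The comparison in the third rule can be illustrated with a small sketch. Which normalizations the real filter applies is not stated here, so the ones below (trimming, lowercasing, collapsing whitespace and underscores) are assumptions, as are the function names.

```javascript
// Assumed normalizations; the actual filter may use different ones.
function normalize(text) {
  return text.trim().toLowerCase().replace(/[\s_]+/g, ' ');
}

// Hypothetical check: does the page text after the edit just repeat the title?
function contentMatchesTitle(title, newText) {
  return normalize(title) === normalize(newText);
}

console.log(contentMatchesTitle('Ice cream', 'ice  cream\n'));
console.log(contentMatchesTitle('Ice cream', '==English==\n===Noun===\n# A dessert.'));
```

A real rule would also need the user-group and namespace conditions described in the paragraph above; they are left out here to keep the comparison itself visible.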

Thoughts?

(Of course, edits to improve the other filters are sought, too. And new ones.)​—msh210 (talk) 20:05, 7 September 2011 (UTC)[reply]

This is a really neat tool and I applaud your initiative in creating a few filters to start out. - [The]DaveRoss 20:08, 7 September 2011 (UTC)[reply]
Yeah, it's an excellent thing to have. I think I saw a rule to block edits that create a page whose content is identical to its title, which (for some reason) is a very common useless edit. Equinox 22:58, 7 September 2011 (UTC)[reply]
We have it for existing pages: it checks whether the page content was reduced to its pagetitle. We could easily have it for new pages also (even by editing the existing filter rule).​—msh210 (talk) 15:19, 8 September 2011 (UTC)[reply]
I've updated that rule. We can watch and see if it picks up false positives.​—msh210 (talk) 15:30, 8 September 2011 (UTC)[reply]
I think we could disallow creating pages in the main namespace if the first character is a letter. All existing pages begin with either a header or with a template like {{also}} or {{wikipedia}}. —CodeCat 23:26, 7 September 2011 (UTC)[reply]
(You mean if the first char is alphanumeric?) Most people don't come here knowing the formatting rules, so if we did do that, we would need extra-prominent links to those and to places they might want, like WT:REE. Equinox 23:30, 7 September 2011 (UTC)[reply]
I think we should tag those but not disallow 'em. There might be some usable content. Mglovesfun (talk) 11:45, 8 September 2011 (UTC)[reply]
Yeah most people don't know ELE, so we shouldn't disallow them, but I think it might be wise to give the editors a notice before allowing them to save, which notice can outline the format, or something. (And tag the edit.)​—msh210 (talk) 15:19, 8 September 2011 (UTC)[reply]
I've created a filter along these lines. It checks whether the first character is anything but { or =. It does nothing for now (so we can check for false positives), but can warn the user.​—msh210 (talk) 18:07, 18 September 2011 (UTC)[reply]
Awesome! —RuakhTALK 00:29, 8 September 2011 (UTC)[reply]
Could we write a filter that shows editors a warning before allowing them to put their edit through, if their edit introduces <ref> (and does not introduce <references/>) to an entry that does not contain <references/>? I sometimes forget, on both en. and de.Wikt , to add <references/> when adding <ref>s. The warning would remind the editors to add the <references/> tag. - -sche (discuss) 18:50, 8 September 2011 (UTC)[reply]
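The condition requested above can be sketched as a comparison of the text before and after the edit. This is plain JavaScript illustrating the logic, not abuse-filter syntax; the function names are hypothetical, and counting tags with regular expressions is a simplification.

```javascript
// Count opening <ref> tags; "<references" is not counted because the
// character after "<ref" must be whitespace or ">".
function countRefTags(text) {
  return (text.match(/<ref[\s>]/gi) || []).length;
}

function hasReferencesTag(text) {
  return /<references\s*\/?>/i.test(text);
}

// Hypothetical check: warn when the edit adds a <ref> and the resulting
// page still has no <references/> tag.
function needsReferencesWarning(oldText, newText) {
  const addsRef = countRefTags(newText) > countRefTags(oldText);
  return addsRef && !hasReferencesTag(newText);
}

console.log(needsReferencesWarning('# A cat.', '# A cat.<ref>Some source</ref>'));
console.log(needsReferencesWarning('# A cat.', '# A cat.<ref>Some source</ref>\n<references/>'));
```

Checking the text after the edit covers both parts of the request at once: if the page already had the tag, or the edit adds it, no warning is needed.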
I've created it but have not yet tested it (or checked how expensive it is).​—msh210 (talk) 21:34, 8 September 2011 (UTC)[reply]
It works well, as far as I can tell, and has caught a couple of users. I think we need to update the location of the message, though (either move MediaWiki:Abusefilter-warning/ref-no-references back to MediaWiki:Abusefilter-warning/ref-no-reference or change the link, whichever is easier; at the moment it displays a default message rather than the nicer and more informative custom one). - -sche (discuss) 05:53, 10 September 2011 (UTC)[reply]
I've fixed it, I think (not just now).​—msh210 (talk) 17:30, 11 September 2011 (UTC)[reply]

So (to repeat myself) we have a filter rule that catches edits that result in a page whose content matches its title (in the main namespace, and except for whitelisted folks, admins, and bots). Any objection to having that rule block the edit from going through? As of now we've had only about ten hits, but no false positives, and I can't think how there would be any.​—msh210 (talk) 17:30, 11 September 2011 (UTC)[reply]

Done.​—msh210 (talk) 15:18, 13 September 2011 (UTC)[reply]
A lot of anon users seem to create pages that just contain one or more instance of the text "[[File:Example.jpg]]" (perhaps they are accidentally clicking the delayed-loading JavaScript toolbar?). A filter for this might be worthwhile. Equinox 13:02, 17 September 2011 (UTC)[reply]
Alternatively, we could push to have the toolbar fixed. ;-)   Personally I have it turned off, because it's just too annoying to try to click in the textarea and suddenly have inserted something random. —RuakhTALK 14:49, 17 September 2011 (UTC)[reply]
I would like there to be a fixed-sized empty space on the page until the toolbar loads and replaces it. Whom do we nag? Equinox 14:54, 17 September 2011 (UTC)[reply]

Lemma entries for Japanese na type adjectives (形容動詞)

I've noticed that the policy for な-type adjectives or keiyodoshi is to include the な as a part of the entry. This is not, as far as I know, standard practice in any Japanese dictionary or even the Japanese Wiktionary.

For example, both 元気 and 元気な are treated as lemma entries. I believe users would be better served to have the 元気な entry read: "Attributive (連体形) form of 元気", and have both noun and adjective lemma entries listed on 元気.

The -な suffix is merely a conjugation of form and should be treated as such. The most egregious example, and the one that brought this issue to my attention, is たくさんな. There is a page for the kanji version of this word, 沢山, but there isn't even a link to it from たくさんな; instead there is a broken link to 沢山な. But all of this is beside the point; the real issue is that たくさんな is a much less often used form than either たくさんの or even just たくさん. All of these forms would be better served by the lemma entry たくさん, which I would be happy to write tonight after work, but that doesn't solve the system-wide problem of な-type adjectives being written with the な as part of the lemma.

The only policy on this I can find, Wiktionary:About Japanese#Quasi-adjectives_.28.E5.BD.A2.E5.AE.B9.E5.8B.95.E8.A9.9E.29, is not very clear on the issue. I propose that it be changed to include the ideas I've put forth, but I'm not sure exactly how to do so. Entries would still need to acknowledge that these are な-type adjectives, but this could easily be done in a header or something, right?

Also, perhaps a bot of some sort to change all of the entries made in the way I clearly find so offensive. *^_^*

MichaelLau 19:04, 9 September 2011 (UTC)[reply]

Hello Michael, thanks for chiming in --
Those of us dealing with Japanese here on the English Wiktionary have been chewing on some of these issues recently, cf. WT:BP#Preferred forms for Japanese lemmata, WT:BP#WT:About_Japanese, and a number of posts starting at Wiktionary_talk:About_Japanese#Lemma forms for keiyōdōshi and continuing further down that page. The emerging consensus is largely in line with what you describe. I'd really appreciate it if you could have a look at the other posts I've linked to here to get up to speed with what has already been discussed of late, and then it'd be great if you'd add to the discussion over at Wiktionary_talk:About_Japanese#Work_Needed. -- Cheers, Eiríkr Útlendi | Tala við mig 20:00, 9 September 2011 (UTC)[reply]

I'd like to update the {{ja-kanji}} template to handle shinjitai / kyūjitai, much as the Japanese POS templates already do (see {{ja-noun}}, {{ja-adj}}, {{ja-verb}}, etc.).

Some kanji don't get used as words on their own, and thus the individual kanji entry won't have anywhere graceful to put shinjitai / kyūjitai information. It would seem most appropriate for that information to go in the {{ja-kanji}} template itself, rather than (or possibly as well as - removing would take work) in the POS templates.

Are there any admins who could either implement this change, or change the protection level of {{ja-kanji}} to allow me to do so? -- Eiríkr Útlendi | Tala við mig 20:40, 9 September 2011 (UTC)[reply]

Looked at this again and realized I can indeed edit the template, so I did. I'll update the template documentation later to account for the new args. -- Eiríkr Útlendi | Tala við mig 21:51, 20 September 2011 (UTC)[reply]

Classical/Literary Chinese entries

Is there a correct way to add a definition of a Classical or Literary Chinese word? I've seen information about noting an etymology, but I'm not talking about an etymology for a modern word, I'm talking about defining a word as used in Classical Chinese texts. Such an entry might have the same meaning in modern Chinese, or might have a different meaning, or might not be used at all any more. I've looked for a list of "official" wiktionary languages, and found the "random entry" list. It has Old Chinese, Middle Chinese, and Late Middle Chinese, which are names of reconstructed languages (mostly phonology) from different periods. Those were spoken languages, and Classical Chinese was the most common written language used during all of those periods. How about Early Vernacular Chinese, for example words used in the novels 红楼梦 or 金瓶梅? There are entire dictionaries devoted to this language, but is the distinction appropriate on wiktionary? If so, can I just enter Early Vernacular Chinese as the language? Craig Baker 06:26, 11 September 2011 (UTC)[reply]

Such distinctions can be a bit arbitrary, I edit Old French, Middle French and French so I'm familiar with the issue. Important note one, please don't remove Mandarin headers. Mandarin is standard here, it's also a widely accepted language name. Like you say, depending on date it could be Old Chinese, Middle Chinese, and Late Middle Chinese. There's no reason not to create an ad hoc code for Classical Chinese if editors want it. But only if editors want it. For example we have 'ad hoc' codes {{roa-jer}} for Jèrriais and {{roa-leo}} for Leonese. Mglovesfun (talk) 13:38, 11 September 2011 (UTC)[reply]
There is already a code {{lzh}} for Literary Chinese. —CodeCat 13:53, 11 September 2011 (UTC)[reply]
Right then, in which case definitely don't replace Mandarin with Literary Chinese, as Mandarin is a language. We don't replace English with Middle English; we include both when the word/term is used in both languages. Mglovesfun (talk) 14:00, 11 September 2011 (UTC)[reply]
Please see here for your references. Engirst 15:16, 11 September 2011 (UTC)[reply]
Regarding "replacing" Mandarin, what about in entries I've added where the words are not found in Mandarin? Or, where I have no evidence that the word is found in Mandarin? Is there an expectation that when I add a Literary Chinese definition, I will also research whether the word is found in Mandarin? Craig Baker 15:52, 11 September 2011 (UTC)[reply]
Could we enter Classical Chinese like this? Engirst 16:25, 11 September 2011 (UTC)[reply]
The transliteration in this entry is based on Mandarin, the way Classical Chinese is taught in China. There really can't be another way, as they teach the words, grammar, sentence structure but not the pronunciation. So, in short it's not 100% accurate. --Anatoli 00:09, 12 September 2011 (UTC)[reply]
Speedy deletion is only for patently wrong entries. Unlike Wikipedia, deleting one language section of an entry with more than one language section would be considered a speedy deletion, about equivalent to blanking a whole Wikipedia entry. You should likely be going to WT:RFV with these, though unless there's a pretty robust answer to the question 'what's the difference between Classical Chinese and Mandarin?' then a lot of these debates will be a waste of time. Mglovesfun (talk) 19:41, 11 September 2011 (UTC)[reply]
From what I understand from Wikipedia, Literary Chinese is an obsolete writing standard based on the Middle Chinese spoken language that was used up till the early 20th century. It would be comparable to Ottoman Turkish, but being in use a lot longer. —CodeCat 21:45, 11 September 2011 (UTC)[reply]
Mglovesfun, just to be clear, my change of two entries to "Classical Chinese" which you reverted were new entries which I added earlier that day, and initially categorized them as Mandarin because I didn't know that Literary Chinese was an option. I otherwise wouldn't have considered changing the language of an existing entry, which is what I assume you mean by "speedy deletion". What I'm more curious about is new entries for which I can provide Classical Chinese definitions, but don't have any information about Mandarin or other modern varieties. Craig Baker 03:26, 12 September 2011 (UTC)[reply]
I don't think we have contributors in Classical Chinese and in my opinion, we don't need to split Mandarin and Classical Chinese if a specific pronunciation for a specific period is not chosen. Also, the way Classical Chinese is used in Modern Mandarin, Cantonese, etc., the words can be classified as simply Mandarin, Cantonese, etc. with some {{qualifier}}. The reason is that they are borrowed into modern Chinese varieties and adjusted to the appropriate pronunciation, and are used in quotes quite often. The few words that are NEVER or SELDOM used in modern languages, like classical pronouns and prepositions, have a modern usage anyway, e.g. (), (), (zhī), etc., and a modern pronunciation. Numerous Mandarin chengyu are an example of how Classical Chinese is used in modern Mandarin. To understand their meaning, some knowledge of Classical Chinese grammar and vocabulary is required, but I don't think their components should have a separate entry as Classical Chinese. In any case, hanzi are such a complicated component, hard to classify as a part of speech; they often convey a meaning and only in combination become nouns, verbs, etc. --Anatoli 00:09, 12 September 2011 (UTC)[reply]
The {{qualifier}} idea sounds ok to me. As long as there is a way to note that they are Classical Chinese words, the information will not be lost, and it will be possible to use the dictionary when reading Classical Chinese texts for example. I'm curious why choosing a pronunciation is related to splitting the languages; pronunciations are not necessary to write a dictionary, though maybe some technical limitation of Wiktionary requires it? To me, the written form seems most important in a language like Classical Chinese where the pronunciation was not really even recorded, although I do think reconstructions can be interesting and useful in some ways. I agree that a good number of Classical Chinese words are Mandarin words too, but in general I don't agree with your example of chengyu; in most cases I think chengyu should be considered to be a single word in Mandarin (etc.), but just an ordinary phrase or sentence in Classical Chinese. In such chengyu, what used to be Classical Chinese words are no longer free to act like words in Mandarin sentences, and the meaning of the chengyu has fossilized and often shifted. In the terms used on the "Criteria for inclusion" page, the chengyu is idiomatic in Mandarin, but not in Classical Chinese; and the words it is composed of are not attested in Mandarin outside of that chengyu. In the end, I suppose the "language status" is not very important to me, as long as the two can be separated in some way by the reader or perhaps by an automatic script for the reader's use, so that the dictionary is useful for reading both Classical Chinese and Mandarin texts. Craig Baker 03:26, 12 September 2011 (UTC)[reply]
I only said that chengyu in Modern Mandarin demonstrate the grammar and syntax of Classical Chinese, didn't say that one can use its components as they were then.
As a dictionary, Wiktionary deals less with stylistics and syntax, it would be really hard to define each hanzi for both modern Mandarin and Classical Chinese. 文言文 (Wényánwén) (Classical Chinese), unlike 白話白话 (Báihuà) (Vernacular Chinese) was almost 100% monosyllabic, each word consisting of only one hanzi, and defining the classical sense and usage of hanzi would require major work on these entries. At the moment, most definitions for hanzi are under the Han character heading. The specific CJKV language sections mainly deal with the READINGS of those characters. --Anatoli 03:47, 12 September 2011 (UTC)[reply]
I see your point about definitions for single characters currently being under the "Translingual" section. Of course it would require major work, but it's hard for the work to even begin without a language category, or to attract anyone capable of doing the work. I notice that many (most?) single-character entries already have definitions in the Japanese section, as well as etymologies (while the Translingual section has just a character etymology, not a word etymology). I would assume that the eventual goal is for definitions to be provided in the other languages/dialects too, so that we have information about how the word is used in those languages (or how it is not used—one of the most difficult things about reading Classical Chinese with a dictionary that includes both modern and ancient definitions is filtering out the modern definitions). I will continue reading around the Community Portal to try to understand the plan for this. Perhaps it would also help to note that there are already many large, good dictionaries devoted to just Classical Chinese, so they are definitely useful. Craig Baker 03:08, 14 September 2011 (UTC)[reply]
Sorry to have not seen this sooner. I have created thousands of Classical Chinese words on Wiktionary over the last several years. I have created a number of translations at Wikisource that link back words to Wiktionary definitions. My long term project is s:Romance of the Three Kingdoms. So far, the format has been largely decided by me, since I haven't come across anyone knowledgeable in the subject that wanted to contribute entries. My approach has been to view the problem through the lens of Mandarin. I'm not suggesting that this is the ideal approach, merely the most practical. Since Classical Chinese can be read in modern Mandarin, it made sense to create Mandarin entries that used either the {{literary}} or {{archaic}} labels. The {{obsolete}} label might be another potential option, although I haven't used it all that much. These context labels recently underwent a minor change. They now put the words into categories called: Category:Mandarin archaic terms in traditional script, Category:Mandarin archaic terms in simplified script, Category:Mandarin literary terms in traditional script and Category:Mandarin literary terms in simplified script. These categories should gradually replace Category:zh-tw:Archaic, Category:zh-cn:Archaic, Category:zh-tw:Literary and Category:zh-cn:Literary as well as Category:Traditional Chinese archaic terms, Category:Simplified Chinese archaic terms, Category:Traditional Chinese literary terms and Category:Simplified Chinese literary terms. See 飲酒 and 征東將軍 for some typical examples of how I format entries. Also, I used as a model of how we could do it if time and people were not limitations. Thanks. -- A-cai 01:37, 29 September 2011 (UTC)[reply]
P.S. Other pieces that I've done in this way on wikisource: s:Departing from Baidi in the Morning, s:Preface to the Poems Composed at the Orchid Pavilion, s:Song of Everlasting Regret, s:The Peach Blossom Spring and s:Touring Shanxi Village -- A-cai 01:43, 29 September 2011 (UTC)[reply]

adding-translation script

Discussion on de.Wikt: de:Wiktionary:Teestube#.C3.9Cbersetzung-Hinzuf.C3.BCgen-Skript.

Hello English Wiktionary-Users,

in the German Wiktionary we would like to add the function that allows people to add translations without manually editing the code section. Could anyone explain how to do it? That would be great. Thanks in advance! Kampy 08:11, 11 September 2011 (UTC)[reply]

The German Wiktionary seems to structure translations sections completely differently from the English Wiktionary, with translations showing which senses they correspond to by having numbers next to the translations, rather than putting the translations for each sense in a separate box, so simply copying the code wouldn't really work. --Yair rand 20:17, 11 September 2011 (UTC)[reply]
The code would have to be modified, yes, but that shouldn't be too complex a task. One option: the de.Wikt programmers could code another box between the ISO box and the translation box, which would take the sense number(s) as input. Is all of the code that operates the function contained in User:Conrad.Irwin/editor.js? - -sche (discuss) 05:48, 12 September 2011 (UTC)[reply]
No, it also uses the newNode function in MediaWiki:Common.js#Dom_creation. Another issue is that the script seems to make use of the translation table glosses, which dewikt doesn't have, for locating tables. (Not completely sure about that.) --Yair rand 06:08, 12 September 2011 (UTC)[reply]
I think if we (on de.Wikt) changed
(values.qual? '{'+'{qualifier|' + values.qual + '}} ' : '') +
to
(values.qual? '[' + values.qual + '] ' : '') +
, we could use the code as-is (apart from the problem of glosses, which we could add), with users adding the sense-numbers (1, 1–2) in the "qualifier" field. - -sche (discuss) 01:06, 13 September 2011 (UTC)[reply]
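To make the proposed substitution concrete, here is a minimal sketch of the two output formats side by side. The expressions are the ones quoted above; wrapping them in the functions `enwiktQualifier` and `dewiktSenseNumbers` is purely illustrative and not how editor.js is actually organised.

```javascript
// Hypothetical wrappers around the two expressions quoted in the thread.
// `values.qual` is whatever the user typed into the "qualifier" field.
function enwiktQualifier(values) {
  // en.Wikt form: wraps the text in the qualifier template.
  return values.qual ? '{' + '{qualifier|' + values.qual + '}} ' : '';
}

function dewiktSenseNumbers(values) {
  // Proposed de.Wikt form: the same field holds sense numbers, shown in brackets.
  return values.qual ? '[' + values.qual + '] ' : '';
}

console.log(enwiktQualifier({ qual: "informal" }));
console.log(dewiktSenseNumbers({ qual: "1–2" }));
```

Both helpers return an empty string when the field is left blank, so nothing is prepended to the translation in that case.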
About the numbers, I think it shouldn't be too much of a problem. We don't use headlines saying the definition again; instead we use those numbers. So there will only be one box at all times. Anything added to this box just needs an additional input box for the number it relates to. Can anyone code this? Kampy 00:05, 14 September 2011 (UTC)[reply]
There will be more than one box (and there will be no numbers) once the translations are (per the vote) separated by sense, though...
I have copied the code to my de:Benutzer:-sche/common.js, and have copied the en.Wikt and de.Wikt translation tables into subpages of my userspace for testing (de:Benutzer:-sche/sw4), but even with classes and gloss-support added to the German translation tables (de:Benutzer:-sche/sw1c), I haven't got it to work yet. - -sche (discuss) 01:01, 14 September 2011 (UTC)[reply]
Oh, I didn't know a vote was going on. I agree that the English version is more practical. I will support a change. Kampy 10:57, 14 September 2011 (UTC)[reply]
Is it possible that the code in my de.Wikt .js isn't considering itself enabled, Yair rand? - -sche (discuss) 01:08, 14 September 2011 (UTC)[reply]
[[de:Benutzer:-sche/common.js]] has some syntax errors that will cause browsers to stop processing it. [[de:Benutzer:Ruakh/common.js]] fixes the most severe errors — you can take it as a starting point for further debugging — but it still doesn't create the form for adding translations, so there's still something wrong. :-/   —RuakhTALK 02:30, 14 September 2011 (UTC)[reply]
Thank you for catching that! I'm guessing that (among other things) I should add \ to all of the other instances of sche/, like you did to sche/sw1c (i.e. sche/sw2b → sche\/sw2b etc.), yes? Or not to all of them? - -sche (discuss) 03:05, 14 September 2011 (UTC)[reply]
Doesn't make a difference, it's only necessary inside the regexps, not simple strings (/.../, not "..."). --Yair rand 03:22, 14 September 2011 (UTC)[reply]
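The distinction can be seen in a small standalone sketch (the variable names are made up for illustration; only the page-name fragment `sche/sw2b` comes from the thread):

```javascript
// In a regular-expression literal an unescaped "/" would end the literal,
// so the slash has to be written as "\/".
const inRegex = /sche\/sw2b/;

// In an ordinary string "/" needs no escape; "\/" is simply read as "/".
const plain = "sche/sw2b";
const escaped = "sche\/sw2b";

console.log(plain === escaped);   // true
console.log(inRegex.test(plain)); // true
```

So adding the backslash inside quoted strings is harmless but unnecessary, while inside /.../ it is required.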
It seems that makes the script work! The "±" sign displays atop the gloss, but that's a relatively minor problem. - -sche (discuss) 03:09, 14 September 2011 (UTC)[reply]
That's because the dewikt tables don't have the show/hide button as the first node in the NavHead, and the script places the "±" after the first node. Can be fixed by replacing insertDiv.insertBefore(edit_button, insertDiv.firstChild.nextSibling); with insertDiv.insertBefore(edit_button, insertDiv.firstChild);, so that it's placed before the first node.--Yair rand 03:22, 14 September 2011 (UTC)[reply]
Other issues: The dewikt language templates leave parserfunction residue when substed. This could be fixed by modifying the language templates to have {{{|safesubst:}}} before the #if: ({{ {{{|safesubst:}}}#if:{{{nolink|}}}|Französisch|[[Französisch]]}}). Also, the use of {{t}} needs to be replaced with whatever template dewikt uses. --Yair rand 03:32, 14 September 2011 (UTC)[reply]
Thanks; that change puts the "±" in the right place! :) I'm working on replacing {{t}} with the de.Wikt counterpart {{Ü}}. I am also considering that certain functions, like "Page name:", may not be applicable to de.Wikt. (In fact, I replaced the code to input qualifiers with code to input sense numbers; it may be that I should undo that and instead use the AFAICT-unneeded-on-de.Wikt pagename-with-diacritics code as the vessel for adding sense numbers.) (No, that wouldn't work at all.) - -sche (discuss) 04:01, 14 September 2011 (UTC)[reply]
Re safesubst: actually, the necessary change isn't to the templates (although having templates that subst safely is probably a good idea); the necessary change is to the code: we don't use "Französisch" in translation tables on de.Wikt, we use {{fr}}. - -sche (discuss) 04:35, 14 September 2011 (UTC)[reply]
I've changed the code so that it does not subst language codes. However, changing {{t}} to {{Ü}} caused the function to display "Could not find translation entry for 'pt:worde'. Please reformat" when I tried to add worde (with ISO code pt given) to a section containing other translations. However, it added correctly to an otherwise empty section. I thought residual "{{t"s or a "{{Üxx" I added might be confusing the script's sorting mechanism, but it was also confused by this version of the page. (That version also shows that I/we/de.Wikt-programmers need to change how/where gender information is added.) - -sche (discuss) 05:03, 14 September 2011 (UTC)[reply]
The function getEditFunction might be the problem. It's built to look through the translation table wikitext for the translation to insert the new translation before (I think), but it's searching by first looking for * [[langname]]:, then for * langname:, and then for {{subst:langcode}}:, in case it's a newly added translation, but dewikt doesn't format translations like any of these. --Yair rand 21:21, 14 September 2011 (UTC)[reply]
You're right; removing subst: (so that it only looks for the language code) makes it work. Now to remove cruft... - -sche (discuss) 21:55, 14 September 2011 (UTC)[reply]
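The lookup order described above could be sketched like this. It is not the actual getEditFunction code; `findLanguageLine` and `escapeRegExp` are hypothetical names, and the third pattern assumes that de.Wikt lines begin with the bare language-code template, as the fix above suggests.

```javascript
// Hypothetical sketch of the search order; not the real getEditFunction.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Returns the index of the first line that introduces the given language,
// or -1 if none of the known line formats is found.
function findLanguageLine(lines, langName, langCode) {
  const name = escapeRegExp(langName);
  const code = escapeRegExp(langCode);
  const patterns = [
    new RegExp("^\\*\\s*\\[\\[" + name + "\\]\\]:"), // * [[langname]]:
    new RegExp("^\\*\\s*" + name + ":"),             // * langname:
    new RegExp("^\\*\\s*\\{\\{" + code + "\\}\\}:"), // * {{langcode}}: (assumed de.Wikt style)
  ];
  for (const pattern of patterns) {
    const index = lines.findIndex((line) => pattern.test(line));
    if (index !== -1) return index;
  }
  return -1;
}

const dewiktLines = ["*{{en}}: [1] {{Ü|en|cat}}", "*{{fr}}: [1] {{Ü|fr|chat}}"];
console.log(findLanguageLine(dewiktLines, "Französisch", "fr")); // 1
```

With only the first two patterns, a de.Wikt table like the one in the example would never match, which is consistent with the "Could not find translation entry" message reported earlier.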
I have adapted the code to work with de.Wikt's Ü-templates. It even nests nb and nn correctly, when neither no nor nb nor nn is already in the table. However, de:Benutzer:Yoursmile gave me feedback that the adder appears but doesn't work ("Could not find translation table for 'fr:reg'. Glosses should be unique") when other scripts are around, e.g. de:Benutzer:Yair rand/TabbedLanguages.js. I thought that might be because DOM-node code is redundantly in both codes, but when I separated the DOM-node and translations codes, and imported both, I found that the trans-adder no longer appeared. (If it had appeared, I would have imported only the trans-code and TabbedLanguages, to see if the absence of redundant DOM-code allowed the two to work together.) Any idea why splitting the two seemingly discrete scripts causes them to cease functioning (i.e. causes the trans-adder to cease appearing)? Any idea why the trans-adder appears but does not work when TabbedLanguages are around?
Separate issue: any idea what I did wrong when I tried to remove the "script" bit? That edit caused the adder to cease appearing. That edit also removed a bit of "gender"-code, but that wasn't problematic; I successfully removed it later. I rendered the "script" bit harmless (we don't use script templates on de.Wikt) by causing it to input nothing and removing the interface, but that leaves a lot of cruft. - -sche (discuss) 23:56, 17 September 2011 (UTC)[reply]
The top four lines of that edit are actually removing part of something completely unrelated to the "script" bit, but that part isn't what actually broke it. The edit contained an extra comma (ota:{,wsc:"ota-Arab"}) which caused a syntax error. --Yair rand 07:32, 19 September 2011 (UTC)[reply]
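A quick way to see the effect of such a stray comma is to ask the JavaScript engine whether a fragment parses. The helper `parses` below is a hypothetical illustration; the object fragments are modelled on the one quoted above.

```javascript
// Returns true if `source` parses as a JavaScript expression, false if the
// engine rejects it with a SyntaxError. Nothing is executed, only parsed.
function parses(source) {
  try {
    new Function("return (" + source + ");");
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e;
  }
}

console.log(parses('{ota: {wsc: "ota-Arab"}}'));  // true
console.log(parses('{ota: {,wsc: "ota-Arab"}}')); // false
```

Because a syntax error stops the whole script from being processed, one misplaced comma is enough to make the adder disappear entirely.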
Thank you! I have got the script to work with Tabbed Languages; with your syntax-fix, now the unneeded "script" part has been successfully removed. I tested Tabbed Languages and the Trans-Adder together in the main namespace on de:Katze. I moved the code to de:Benutzer:-sche/uebersetzung.js, if anyone wants to see for themselves (remember that at the moment it is still oriented to {{Benutzer:-sche/sw1c}} and therefore only works on test pages or modified pages). The only issue I note now is that it wouldn't add more than one translation without me refreshing or navigating away from and back to the page; I wondered if I just didn't wait long enough (de:Katze had a lot of translations for it to sort through), but it displayed the same behaviour on my simple test page. (A minor problem I remind myself to fix is the unneeded space between * {{langcode}}.) de.Wikt will have to adopt glosses for this to work. - -sche (discuss) 09:27, 19 September 2011 (UTC)[reply]
Of note: the code works differently in different browsers and in the main vs the user namespace. (Those interested can temporarily restore this version of Katze and try using the code on it.) - -sche (discuss) 22:54, 19 September 2011 (UTC)[reply]

See Wiktionary talk:Etymology#Where to put etymologies --MaEr 10:56, 11 September 2011 (UTC)[reply]

Correlative conjunctions

How should the entries for correlative conjunctions look? Both...and, neither...nor, both-and, neither-nor or something else? I'm not asking just about English, but about other languages too. Arath 14:51, 12 September 2011 (UTC)[reply]

Making an arse of it ... ?

Interloper from Wikipedia here. While testing some new back-end scripts, I ran across template:a bum - a nonexistent template that is linked from many Wiktionary entries. Not sure if it's vandalism, preparation for some widescale future vandalism, or just a typo (album?) somewhere deep that invites same. I had a crack at finding the source, butt alas got nowhere. - Topbanana 17:24, 12 September 2011 (UTC)[reply]

Thanks! It's from {{t|ar|[anything]}} (also t+ and t-), but I can't seem to track it down farther. Incidentally, you spelled alass wrong.​—msh210 (talk) 17:47, 12 September 2011 (UTC)[reply]
[e/c] Got it. The culprit was template:ar/script, which had been vandalized. Thanks again.​—msh210 (talk) 17:55, 12 September 2011 (UTC)[reply]
(After an edit conflict) I dug through the histories of two of the entries, multiculturalism and dictionary. It seems to have been added to multiculturalism in this edit, and to dictionary in this edit. In other words, its transclusion seems to be caused by the removal of "sc=" from uses of {{t}}. - -sche (discuss) 17:54, 12 September 2011 (UTC)[reply]

I've protected template:ar/script; can some admin who knows how to write such a bot please flood-protect the [langcode]/script templates as highly visible?​—msh210 (talk) 17:57, 12 September 2011 (UTC)[reply]

highly visible templates that may need protection

User:Topbanana has generated a list of highly transcluded templates that lack full protection. Some can even be edited by non-autoconfirmed users. We may want to protect some of these. The list was at [[User:Topbanana/Template_protection]], which I've deleted so as not to advertise the list to would-be vandals (cf. w:wp:BEANS), and admins can find it now at [[Special:Undelete/User:Topbanana/Template_protection]].​—msh210 (talk) 15:16, 13 September 2011 (UTC)[reply]

Template:given name - sorting missing

Would an admin be kind enough to fix {{given name}} to allow sorting? This template currently fails to properly categorize Japanese given names, for instance. The code to tweak (not all the code, just a snippet):

<includeonly>{{#if:{{NAMESPACE}}|| [[Category:{{langname|{{{lang|en}}}}} {{#if: {{{diminutive|}}}|diminutives of}} {{{gender|{{{1}}}}}} given names {{#if:{{{from|{{{2|}}}}}}|from {{{from|{{{2}}}}}}|}}]]

Change to:

<includeonly>{{#if:{{NAMESPACE}}|| [[Category:{{langname|{{{lang|en}}}}} {{#if: {{{diminutive|}}}|diminutives of}} {{{gender|{{{1}}}}}} given names {{#if:{{{from|{{{2|}}}}}}|from {{{from|{{{2}}}}}}|}}|{{{sort|{{{skey|{{PAGENAME}}}}}}}}]]

As a minor side note, {{given name}} and {{surname}} are a bit inconsistent, in that {{surname}} includes a period at the end, and {{given name}} does not. Immaterial really, but it'd look a bit more put together if these two were in agreement. -- TIA, Eiríkr Útlendi | Tala við mig 16:08, 13 September 2011 (UTC)[reply]
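The fallback chain in the proposed sort key, {{{sort|{{{skey|{{PAGENAME}}}}}}}}, can be modelled in a few lines. This is only a sketch of the parameter logic in JavaScript; `categorySortKey`, `params` and `pageName` are hypothetical stand-ins for the template arguments and {{PAGENAME}}.

```javascript
// Hypothetical model of the template's parameter fallback.
// A wikitext default such as {{{sort|...}}} applies only when the parameter
// is not passed at all, so an explicitly empty value is kept as-is.
function categorySortKey(params, pageName) {
  if (params.sort !== undefined) return params.sort;
  if (params.skey !== undefined) return params.skey;
  return pageName;
}

console.log(categorySortKey({ sort: "えみ" }, "恵美")); // えみ
console.log(categorySortKey({}, "恵美"));               // 恵美
```

In this model, `sort` wins over `skey`, and the page name is used only when neither is supplied.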

Question about cats

Cleaning up some Japanese given name entries, I've stumbled on a bit of a puzzle. The name 恵美 (Emi) can also be the name 恵美 (Megumi). Using the {{given name}} template and the {{{sort}}} argument for each reading on the page, I'd expect the entry to show up in Category:Japanese female given names under both え for Emi and め for Megumi -- but it only shows up under め. I'm guessing that the cat applied by the second {{given name}} call is overriding the first.

So the $60,000 question is, is there any way for a single entry that belongs in a single category to show up under two different indices in that same category? If not, am I right in guessing that the index to use is the first one alphabetically (or in this case, hiraganically)? -- Eiríkr Útlendi | Tala við mig 16:52, 13 September 2011 (UTC)[reply]

You may have to create a redirect using a zero-width non-joiner (&#8204;), as discussed here, and sort the main entry into one category and the redirect into another ... if it is possible to use sort in redirects. - -sche (discuss) 20:57, 13 September 2011 (UTC)[reply]
Interesting. What a wonderfully ugly kludge. Well, so long as it works. That might just be what I do if no one has a better idea.  :) -- Thank you, Eiríkr Útlendi | Tala við mig 21:12, 13 September 2011 (UTC)[reply]
Well, I just went ahead and created the page 恵‌美 and put the second cat index there, and it works -- the entry 恵美 is now properly indexed under both えみ (Emi) and めぐみ (Megumi). Thank you, -sche! -- Eiríkr Útlendi | Tala við mig 15:52, 15 September 2011 (UTC)[reply]
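For anyone puzzled by how two pages can appear to share one title, a short sketch of what the zero-width non-joiner does to the string (variable names are illustrative only):

```javascript
const ZWNJ = "\u200C"; // zero-width non-joiner, i.e. &#8204;
const mainTitle = "恵美";
const redirectTitle = "恵" + ZWNJ + "美";

// The two titles look alike on screen but are different strings.
console.log(mainTitle === redirectTitle);                   // false
console.log(mainTitle.length, redirectTitle.length);        // 2 3
console.log(redirectTitle.replace(ZWNJ, "") === mainTitle); // true
```

Since the titles differ by an invisible character, the software treats them as separate pages, and each can carry its own sort key in the same category.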

Pokemon get their own ja-noun template?

I'm going through Category:Japanese_nouns to clean up after realizing I'd left the indexing argument hidx out of a lot of entries I'd been working on, and I discovered that Appendix:Pokémon/アーボ and Appendix:Pokémon/アーボック still get categorized under katakana even after I added the hidx arg to the POS template. Then I realized that these entries get their own POS template: not the usual {{ja-noun}} template, but {{ja-noun/pokemon}} instead. Is this kosher? It seems awfully dodgy to me. If we're keeping this template, shouldn't we at least align its formatting with {{ja-noun}}? -- Bemused, Eiríkr Útlendi | Tala við mig 23:30, 14 September 2011 (UTC)[reply]

Those entries aren't actually supposed to exist anymore. They're only still there because nobody bothered to merge them into the list yet. --Yair rand 00:02, 15 September 2011 (UTC)[reply]
Okay. Merge them into what list, though? RFD? -- Eiríkr Útlendi | Tala við mig 06:56, 15 September 2011 (UTC)[reply]
There was a vote on the issue, so you don't need an RFD. --Mglovesfun (talk) 08:56, 15 September 2011 (UTC)[reply]
Is there anything I can / should do to help deal with these entries? My editor instincts for having things sorted out are getting itchy.  :) -- Eiríkr Útlendi | Tala við mig 15:38, 15 September 2011 (UTC)[reply]
The community decided these entries for fictional things should be in lists. Some appropriate lists are Appendix:DC Comics (Bat-Signal, Kryptonian...) and Appendix:The Legend of Zelda (Hylian, Ocarina of Time...).
The coverage of Pokémon is still a bloody mess; the work of making these pages with lists is half-done. You can help by creating the lists, if you are interested. Everything you need is here: Special:PrefixIndex/Appendix:Pokémon. --Daniel 17:53, 15 September 2011 (UTC)[reply]
Thanks, Daniel. Looking at that list, I notice that Appendix:Pokémon/アーボ and Appendix:Pokémon/アーボック are the only two katakana entries on that list, and both already have their romanized versions of Appendix:Pokémon/Arbo and Appendix:Pokémon/Arbok. Am I right in guessing that all other katakana Pokémon entries have been removed and/or converted to the official romanizations? If so, can we delete the Appendix:Pokémon/アーボ and Appendix:Pokémon/アーボック entries? -- Eiríkr Útlendi | Tala við mig 21:24, 15 September 2011 (UTC)[reply]
No, nobody had the idea of deleting the katakana entries and leaving only the official romanizations.
Ideally, Wiktionary can have a complete glossary of all 649 names of Pokémon in English, the 649 names in katakana, the 649 official romanized names and even the 649 romanizations from katakana. I think that would simply be two big appendices: one of English and one of Japanese; but I may be wrong. If these lists are created, then someone searching for, say, Beedrill, will at least be able to know it is the name of a Pokémon species.
We have few Pokémon listed simply because nobody bothered to make the full lists. --Daniel 01:05, 16 September 2011 (UTC)[reply]
This is straying more into Grease Pit territory, but if we're going to keep the Pokémon appendix entries in all valid scripts, how do we make sure they get indexed properly? Appendix:Pokémon/アーボ and Appendix:Pokémon/アーボック are currently indexed as Japanese nouns under ア, when they should be indexed under あ. I tried adding the hidx arg used in the normal {{ja-noun}} template, but {{ja-noun/pokemon}} doesn't implement any explicit sorting.
Plus, the formatting of {{ja-noun/pokemon}} is a bit jarring in its differences from {{ja-noun}}. My gut instinct is to unify these, but I'm not sure what the designer of {{ja-noun/pokemon}} intended, or what anyone else thinks. Any advice? -- Eiríkr Útlendi | Tala við mig 01:19, 16 September 2011 (UTC)[reply]
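As a rough illustration of the indexing problem described above (katakana titles landing under ア instead of あ), here is a minimal Python sketch of deriving a hiragana sort key from a katakana title. It relies only on the fixed 0x60 offset between the Unicode katakana and hiragana blocks; the function name to_hiragana is made up for this example and is not part of any Wiktionary template, and the long-vowel mark ー is deliberately left untouched, which may or may not match what the hidx parameter is expected to contain.

```python
# Hypothetical helper, not an existing Wiktionary tool.
KATAKANA_START = 0x30A1  # ァ
KATAKANA_END = 0x30F6    # ヶ
OFFSET = 0x60            # distance between the katakana and hiragana blocks


def to_hiragana(text: str) -> str:
    """Map each katakana character to its hiragana counterpart.

    Characters outside the katakana range (Latin letters, kanji, the
    long-vowel mark) are returned unchanged.
    """
    return "".join(
        chr(ord(ch) - OFFSET) if KATAKANA_START <= ord(ch) <= KATAKANA_END else ch
        for ch in text
    )


if __name__ == "__main__":
    for title in ["アーボ", "アーボック"]:
        print(title, "->", to_hiragana(title))
```

A key produced this way would sort both appendix pages under あ rather than ア.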

Appendix:DC Comics and Appendix:The Legend of Zelda don't use anything like {{ja-noun}} or {{ja-noun/pokemon}}. These lists don't have headword-lines like entries. If the appendices of Pokémon follow suit, then {{ja-noun/pokemon}} should just get deleted.

Indexing should be simple. Appendix:DC Comics is indexed under "D", because "DC Comics" starts with that letter. Appendix:Pokémon/Species would be sorted under "P".

However, the appendices should not stay in Category:Japanese nouns, because there it would be regarded as clutter. In that category, the focus is keeping the Japanese nouns of entries rather than the ones of appendices. --Daniel 02:33, 16 September 2011 (UTC)[reply]

Thanks, Daniel. It's probably easy enough to remove the Pokémon entries from Category:Japanese nouns just by editing {{ja-noun/pokemon}} for now. Some sort of header template is probably a good idea, in order to include info like official Japanese name and romanization thereof (which sometimes differs markedly from the official name in the Latin alphabet), so I'm not sure we should just delete it -- maybe tweak it to categorize things properly, and maybe move it? I'll at least make a few minor changes tonight. -- Eiríkr Útlendi | Tala við mig 05:21, 16 September 2011 (UTC)[reply]

Yes, the consensus clearly is deleting individual pages like "Appendix:Pokémon/Super Potion" in favor of having big lists like, possibly, "Appendix:Pokémon/Items" or "Appendix:Pokémon/Objects".

Now I edited {{ja-noun/pokemon}} so it doesn't categorize appendices into Category:Japanese nouns anymore.

Actually, when the appendices of English terms of Pokémon were created, they were using a version of {{en-noun}} that did not categorize them into Category:English nouns; but now it categorizes. Naturally, if all the appendices that use {{en-noun}} are replaced by lists, the miscategorization will stop.

I created Appendix:Pokémon/Species with a basic format for organizing headwords, definitions and translations together. One small problem of this system is that the translations are all redlinks to entries. This should be fixed eventually. --Daniel 06:12, 16 September 2011 (UTC)[reply]

Adjectives in the translation sections of nouns

User:DCDuring and I have been discussing how to handle adjective translations of English attributive nouns. Many languages use adjectives where English uses nouns attributively. For example, a "cork" (n.) = a "пробка" (n.), but "cork (attr. n.) insulation (attr. n.) material (n.)" = "пробковый (adj., cork) изоляционный (adj., insulation) материал (n., material)". As I see it, there are several ways we can handle this:

1. List the adjectives in the nouns' translations tables, roughly like in 'cork'. This has long been en.Wikt's general practice — to include in translation sections, where appropriate, words not the same part of speech as the word they translate. The translations section of the adjective abroad contains the German preposition + article + noun im Ausland; German routinely uses nouns in compounds to express things English expresses with adjectives, so adjectives like racial routinely contain nouns like Rassen-.

2. List the adjectives in separate tables in the translation sections of the nouns, like in 'brass'. This was Matthias Buchmeier's suggestion. It has the advantage of distinguishing those many translations, which are systematically different; it has the disadvantage of inviting confusion or duplication of languages which do not use other parts of speech but rather use nouns like English does.

3. Have separate sense lines for attributive nouns, and list the adjectives in the separate tables of those sense lines, like in 'spruce'. This has the advantage of establishing great clarity. It has the disadvantage of inviting duplication of translations from languages which use nouns attributively like English does, and it is duplication in the English section, or at least overspecificity. It would represent a significant change to our current policy and practice. Multiple sense lines would be required, or the attributive sense lines would hold multiple definitions: "cat paw" is a paw "of a cat", "cat booties" are booties "for a cat", a "cat addiction" is an addiction "to (having) cats"...

4. Include adjective POS sections for attributive nouns. This would represent a significant change to our current policy and (in most cases, eg insulation) our current practice. It would create some clarity (words which were used in some of the same ways as adjectives would have adjective POS sections), but would also create some unclarity or would mislead (words would be called adjectives, though they could not be graded or used alone after 'become' or in other key ways adjectives can be used). It would also have the disadvantage that users who understood that "cork insulation material" was [attributive noun + attributive noun + noun] would only find the translations of the attributive noun "cork" in the adjective section.

5. Allow foreign-language entries to host translation sections, as we do on de.Wikt. This would represent a significant change to our current policy and practice. It would have advantages in other areas (few languages have verbs with exactly the same meaning, for example), but that is for another discussion (please!): it would be a poor solution to this issue: the adjective пробковый and all other languages' same-meaning adjectives could host translation sections, but to find such a translation section, a user would have to know one of the translations already (and it would have to be a bluelink).

6. Not present the information. This would represent a significant change to our current policy and practice. It would have the disadvantage of preventing us from having complete translation sections; if brass can be used in two ways, "that metal is brass" and "the brass knob", but we only provide the translation that works in uses like "that metal is brass", we'd be missing an accurate translation.

User:DCDuring (who may have other ideas on how to handle adjective translations of attributive nouns) and I felt we should bring this up for discussion here, in the Beer Parlour, for several reasons. Wiktionary has long included translations which are different PsOS than the words they translate, but the inclusion of this giant category (words, in all languages, which translate attributive nouns) has not — AFAICT — been discussed. Should we continue to include it? If we do, should we simply list the adjectives in the noun tables without distinction, or should we introduce them with some clause, or list them in separate tables (like brass)? If we set them apart with some clause, what should it say?

How should we handle nouns (I give oak as an example, although at the moment it still has an adjective section) which are often used attributively, but which have corresponding adjectives (oaken)? Should adjectives like дубовый be in oak (for consistency, accuracy, and completeness, because the nouns are used in places in English that other languages use their adjectives for — because English, despite having oaken, still says "oak table") and in oaken, or only in oaken (since it, an adjective, exists)? - -sche (discuss) 02:23, 16 September 2011 (UTC)[reply]

Oak really shades into adjectiveness. It's used both attributively and predicatively in ways that resemble an adjective (google books:"oak furniture", "the furniture was oak"). In such cases my own opinion is that it's best to include an adjective section, even if Occam's razor would prefer we consider it solely a noun; but even if we don't, I think we could have a sense “(shading into adjective use) Made of this wood; oaken” and a corresponding translations box. Adjective translations in that box could be tagged with {{pos a}} or {{qualifier|adjective}} or whatnot.
However, oak is an extreme case. In the typical case, the ordinary translation of the noun is as a noun, and if a given language uses an adjective to translate certain cases, I think it's probably best addressed through usage notes in that language's noun entry. For example, [[étoile#French]] could explain that in many cases where English tends to use the noun star, in French the adjective stellaire is likely to be used: système stellaire (star system), amas stellaire (star cluster), etc. (I realize that explaining it in the French entry does not preclude explaining it in the English entry's translations-box as well, but I think the latter is just too hard to do intelligibly.)
At [[oaken]] and [[stellar]] and such, we'd of course give the usual adjective translations. (Well, maybe [[oaken]] would just have {{trans-see|oak}}, or vice versa.)
RuakhTALK 03:15, 16 September 2011 (UTC)[reply]
Ruakh is right - put them in "usage notes in that language's noun entry". All such long explanations will - if carried to conclusion - result in unmanageably large (and probably unreadable) translations tables.
I would also say that all synonyms, gender forms, &c are also best put in the foreign language word's entry. —Saltmarshtalk-συζήτηση 10:54, 16 September 2011 (UTC)[reply]
@Ruakh: I've continued the discussion of whether or not the specific word oak is an adjective at WT:RFV.
@Saltmarsh: I may be misunderstanding what you mean when you suggest omitting "synonyms". Our general practice has AFAICT always been to include (like other translations dictionaries) all words in the foreign language which mean what the English word means. If you would not include all words, how would you decide which word was the main word and which words were just "synonyms" of it? (For example, how would you decide whether to include священный and omit святой from holy, or include святой and omit священный?)
@Ruakh and Saltmarsh: it is possible to have the adjectives in the translations sections of nouns without long notes, if your concern is that the notes are clutter: the translations could be given without introduction, like this, or in separate tables, like this (which seems similar to what you consider doing at [[oak]]).
If you are more broadly concerned about having adjectives in the translations sections of nouns, do you think we should remove the nouns that are in the translations sections of adjectives, where languages routinely use nouns in compounds where English uses adjectives, eg German Modell- (noun) in model (adj.), Finnish yhteiskunta- (noun) in social (adj.), Swedish stjärn- (noun) in stellar (adj.), etc.? Do you think we should remove all Navajo translations from our adjective entries? Navajo expresses "it (pronoun) is (verb) [adjective] (adj.)", eg "it (pronoun) is (verb) white (adj.)", as "łigai (verb, it is white)".
What do you think should be done in cases where the use of a specific different POS is not routine (eg German im (article+prep.) Ausland (noun) in abroad (adj.))? - -sche (discuss) 05:38, 17 September 2011 (UTC)[reply]
@-sche/synonyms - don't synonyms usually have subtle differences? And in some cases you may have 4 or 5 of them - how does the user judge which is best for their needs? Look at all five? Easier to give the most accurate translation or the most frequent. The user will follow this link, where more information for each form can be given - as under the See also head (which could have been Synonyms) for επειδή. —Saltmarshtalk-συζήτηση 06:29, 17 September 2011 (UTC)[reply]
I think that the way used in cork is good. It's understandable. Lmaltier 16:31, 16 September 2011 (UTC)[reply]
To clarify, do you mean with the introduction "corresponding to English attributive use, meaning ‘made of cork’:", or without it? - -sche (discuss) 05:38, 17 September 2011 (UTC)[reply]
With the introduction "corresponding to English attributive use, meaning ‘made of cork’:"; without it, it is very misleading. Lmaltier 08:35, 17 September 2011 (UTC)[reply]
An adjective in a non-English language should IMHO be found on the page of the noun whose attributive use it captures, whether in a usage note, in derived terms or on the headword line. Thus, I dislike the Dutch translation line in the English section of this revision of "cork": I prefer "Dutch: kurk (nl), kurklaag" to "Dutch: kurk (nl), kurklaag; corresponding to English attributive use, meaning ‘made of cork’: kurken"; I disagree with "Dutch: kurk (nl), kurklaag; kurken (nl)", as "kurken" has no place to seek there. In this I seem to agree with Ruakh and Saltmarsh. --Dan Polansky 07:02, 22 September 2011 (UTC)[reply]

Japanese POS templates and how entries are indexed

Chewing on the issue of how entries are categorized and indexed, I re-read WT:About Japanese, in particular the section Wiktionary:About_Japanese#Sorting. As I'd suspected, romaji entries should be indexed alphabetically, not under the corresponding hiragana. However, a quick look at Category:Japanese nouns, for instance, shows many romaji entries indexed hiraganically.

Looking deeper, {{ja-noun}} has the following line towards the end:

[[Category:Japanese nouns|{{{hidx|{{{hira|{{PAGENAME}}}}}}}} {{PAGENAME}}]]

This needs to be changed to:

[[Category:Japanese nouns|{{#ifeq:{{{1}}}|r|{{lc:{{PAGENAME}}}}|{{{hidx|{{{hira|{{PAGENAME}}}}}}}}}} {{PAGENAME}}]]

This tweak will index JA noun entries lower-case-alphabetically if the first template arg value is the letter "r", which it should be for romaji entries. I just made a similar change to {{ja-verb}}, but {{ja-noun}} is locked down. I'm about to crash for the night, so if anyone unlocks the template, don't expect much for the next ten hours or so.  :) -- Cheers, Eiríkr Útlendi | Tala við mig 06:41, 16 September 2011 (UTC)[reply]
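To make the fallback order in the proposed category line easier to follow, here is a small Python sketch of the same decision logic. It is only a model of the wikitext above, not code that runs on the wiki: the function name category_sort_key and the sample page titles are made up for illustration, and None stands in for a template parameter that was not supplied.

```python
from typing import Optional


def category_sort_key(
    pagename: str,
    first_arg: Optional[str] = None,
    hidx: Optional[str] = None,
    hira: Optional[str] = None,
) -> str:
    """Model the sort key built by the proposed {{ja-noun}} category line."""
    if first_arg == "r":
        # Romaji entry: sort alphabetically by the lower-cased page name.
        prefix = pagename.lower()
    elif hidx is not None:
        # An explicit hiragana index wins over everything else.
        prefix = hidx
    elif hira is not None:
        # Otherwise fall back to the hiragana reading.
        prefix = hira
    else:
        # Last resort: the page name itself.
        prefix = pagename
    # The template appends the page name after a space in every case.
    return f"{prefix} {pagename}"


if __name__ == "__main__":
    print(category_sort_key("Nihon", first_arg="r"))  # romaji entry
    print(category_sort_key("日本", hidx="にほん"))     # kanji entry with hidx
    print(category_sort_key("アーボ"))                  # nothing supplied
```

The last call shows why a katakana page with no hidx or hira value ends up indexed under ア: the page name itself becomes the key.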

Perhaps you also want to request a dump from the database, similar to Index:Russian, so that a Japanese index is updated. The current Index:Japanese is a joke. Then you'll see both red (from translations) and blue Japanese words. I don't know how this is done though. --Anatoli 07:14, 16 September 2011 (UTC)[reply]
Don't ask me why, but it does seem to be very common to sort using the hiragana. Perhaps it's WT:AJA that needs to be changed. Mglovesfun (talk) 08:50, 17 September 2011 (UTC)[reply]
@Anatoli:
Who would I make such a request of? And what does a dump do? My (admittedly limited) understanding of DB management is that a database dump is just an output of its contents. And what are the language indices used for? Just looking at Index:Japanese, I'm not sure why it would be a joke, but then again I've never had occasion before this to look at indices for a whole language. -- Ta, Eiríkr Útlendi | Tala við mig 23:11, 18 September 2011 (UTC)[reply]
@Mglovesfun:
Some of us (so far User:Haplology, User:MichaelLau, and myself) are discussing and working on editing WT:AJA, having created a working draft at Wiktionary:About Japanese/Draft. We haven't done too much yet, but the idea was that a separate draft page would make it easier to implement drastic changes on the fly and see how it all looks, without confusing folks by changing the main About page until we have something we're happy with. Please chime in at Wiktionary talk:About Japanese/Draft if you have any ideas or opinions to discuss on how to change things.
With regard to sorting, the current policy stated at Wiktionary:About_Japanese#Sorting seems to state that kanji and kana headwords should be sorted by hiragana, while romaji entries should be sorted alphabetically -- this makes the most sense from a learner's standpoint, in that someone just starting with Japanese can still find a word in the index even if they don't know kana. However, it seems that 1) the Sorting section might stand some clarification, and 2) the description of how to use the hidx sorting parameter provided in the documentation for each of the Japanese POS templates is a bit less clear than WT:AJA. Moreover, many contributors of Japanese terms seem to be ignorant of WT:AJA, or at least not fully versed in it (which isn't too hard to understand given how long the document is, and all the complexities of dealing with Japanese when writing in English).
Anyway, I'll give Wiktionary:About_Japanese#Sorting another look this week, maybe tweak our Draft version of it a bit, and also go through the Japanese POS templates and update their documentation to make things a bit clearer when it comes to indexing. -- Cheers, Eiríkr Útlendi | Tala við mig 23:11, 18 September 2011 (UTC)[reply]
I didn't know who created a database dump, I think it must be User:Conrad.Irwin, I remember reading about it and people requested indexes for other languages. I suggested to explore who created and refreshed Index:Russian and I find it a really good index, it looks like it has indexed tens of thousands of entries and translations (many are in red). The possible challenge I see in creating the Japanese index is in, well, sorting and indexing the Japanese words. Why would it (the Japanese index) be a joke? Because it only has about a hundred KANJI, not tens of thousands of WORDS. --Anatoli 04:59, 19 September 2011 (UTC)[reply]

do not and does not

Any reason why the entries do not and does not don't exist? --The Evil IP address 13:56, 17 September 2011 (UTC)[reply]

...and will not, could not, must not, might not, may not, etc.? Or is there something special about do? Equinox 13:58, 17 September 2011 (UTC)[reply]
Nothing special about it, but I just noted that we had the contractions don't or doesn't without the spelled out versions (which are not uncommon). --The Evil IP address 14:09, 17 September 2011 (UTC)[reply]
Are they not just sum of parts? Do + not? —CodeCat 14:10, 17 September 2011 (UTC)[reply]
Yeah, I think so. We don't have the contractions in order to define them per se, but in order to explain what sequence of words they are short for. Equinox 14:14, 17 September 2011 (UTC)[reply]
I think these are an exception. The entries could be quite useful: for example, both negate a sentence. This cannot be said of most other verbs, since most verbs can't take a "not" after them. The etymology section could also be very interesting, explaining why do can be negated this way but most other verbs cannot. --The Evil IP address 14:22, 17 September 2011 (UTC)[reply]
Maybe, although it feels more like something for a grammar book or grammatical appendix. I was thinking the other day about how isn't and don't mostly behave alike, but not always ("didn't you do..." is fine, but "didn't you be..." is impossible). Equinox 14:25, 17 September 2011 (UTC)[reply]
Why didn't you be more certain of that and look for it in Google book search? SemperBlotto 14:33, 17 September 2011 (UTC)[reply]
Ha! Interesting. Those sentences sound very odd to me, especially "why didn't you be including the bombing of civil populations amongst their crimes?". Equinox 14:38, 17 September 2011 (UTC)[reply]
Using 'not' is rare with most verbs now but it wasn't so rare in Shakespeare's time I think? This is also considered 'modern' English so we would need to include it. —CodeCat 15:18, 17 September 2011 (UTC)[reply]
I think it's sufficient to relegate this to a usage note at [[not#Adverb]] (where it is already, as the first usage note) or an English-verbs appendix.​—msh210 (talk) 18:15, 18 September 2011 (UTC)[reply]

The negative forms that most English auxiliary verbs have are just that: negative forms. They are not contractions like apostrophe d for would or apostrophe s for is or has, etc., though that's how they started. Rather, the n't is an inflectional ending. The rationale is here.--Brett 01:13, 25 September 2011 (UTC)[reply]

A new list of Latin Epithets (same suffixes together)

new list: Epithets by Suffix (contents)

I've created this new list which I'm calling "Epithets by Suffix". It's pretty huge (300,000+ entries) and is "retro-sorted" from nausicaa to tausaghyz. This allows words with the same suffixes to be worked on together. (as requested by DCDuring)
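For anyone unfamiliar with retro-sorting, the idea can be sketched in a few lines of Python: sort the words by their reversed spelling, so that words sharing an ending sit next to each other. This is only an illustration of the principle, not necessarily how the list itself was generated; the function name retro_sort is made up for the example.

```python
def retro_sort(words):
    """Sort words by their reversed spelling so shared suffixes group together."""
    return sorted(words, key=lambda word: word[::-1])


if __name__ == "__main__":
    sample = ["tausaghyz", "fasciata", "nausicaa", "fasciatus"]
    for word in retro_sort(sample):
        print(word)
```

With this ordering, a word ending in -aa such as nausicaa comes near the top and a word ending in -yz such as tausaghyz near the bottom, matching the range quoted above.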

This is a follow-up to the Top 1000 list I posted recently. I created the Top 1000 list because I think it's really important that all these words have entries. For example, there are 882 species on this planet which have been given the specific epithet fasciata (including five animals and one plant which are all threatened with extinction, such as the Fiji Banded Iguana, Brachylophus fasciatus). There's no Latin entry for fasciata, nor for another 82 epithets which are each used by over 300 species.

I made these lists for Wiktionary's editors, so we can create or improve entries for any of the most common specific names as well as those of threatened species. I am interested in the diversity of life and its conservation, and would like to see the subject become less difficult for others studying or interested in biology. Scientific names are often seemingly opaque in meaning, and can be intimidating and difficult to work with when there's no easy way to understand them.

Wiktionary is becoming the go-to source for definitions. I want to encourage those who are improving it, and thus (deliberately or not) making the biological sciences less frustrating, less intimidating, and less mysterious for all. Latin is very well represented on English Wiktionary, currently having more entries than any other language (including English, counted by number of definitions). So I hope adding scientific Latin overlaps with the interests of Wiktionary's Latinist contributors, and I hope it's not too much of a stretch to sometimes delve into the less "pure" world of New Latin and scientific interlingual words. I'm trying to learn enough Latin and Wiktionary syntax to help more, but even if I were a grandmaster, I can't do it all on my own, so I'm hoping the top 1000 list as well as this new suffix-sorted list will encourage the Latinists here to consider looking at modern scientific Latin usage. And thanks also to those who have made some kind of start.

I've since improved the Top 1000 list so that filled entries have a strike-through, making it easier to identify blue links which don't yet have Latin or Translingual entries. Also I've highlighted epithets which belong to threatened species in order to increase their visibility (my own personal interest). These markups are on both lists.

This has all been a lot more work than I expected, and has all been done in my spare time, so I'm hoping it pays off with people actually using the lists to improve Wiktionary. If the lists get used, I'll take that as a show of their usefulness and spend the time to keep updating and improving and expanding or creating tools to help flesh out entries. Otherwise I'll leave it at this. It would be nice to move on to genera too.

TL;DR: New list of "Latin" specific epithets sorted by suffix (mostly Latin adjectives). I really do hope these lists are helpful and lead to the creation of new Wiktionary entries. Pengo 06:28, 18 September 2011 (UTC)[reply]

American vs European music terms

I'm not sure whether this has been discussed before but I'm not sure what to do with regard to some musical entries. You may or may not be aware that in America music theory is taught with quite an array of different terms to the words used in European (chiefly British) music theory which have the same meanings.

e.g. whole note (American) --> semibreve (British)
quarter note (American) --> crotchet (British)
staff (American) --> stave (British)

From my experience a lot of American musicians have a hard time understanding the British equivalents as they often don't get taught, and vice versa. I find this hard to incorporate in definitions such as 2/2 where it may be difficult for a British reader to understand (half note would be a minim, and a measure would be a bar). Equally it would be hard for an American reader if the definition was in British English. There are quite a lot of entries where this problem arises; I often have to look up the meanings of the words because I was never taught the American terms.

I'm not sure if there's a way around it except to assume that someone will look up a word if they don't understand it. It's quite a niggling issue in the academic music world. —JakeybeanTALK 05:55, 20 September 2011 (UTC)[reply]

Perhaps a table expanded from one similar to that shown below could be added to each relevant page - or assigned to an appendix?
               British     American
Notes          semibreve   whole note
               minim       half note
               crotchet    quarter note
Miscellaneous  stave       staff
Saltmarshtalk-συζήτηση 10:07, 20 September 2011 (UTC)[reply]
It appears to be a simple AE vs. BE difference. At least in Germany, the American terms are used. -- Liliana 11:52, 20 September 2011 (UTC)[reply]
Perhaps the definition for 2/2 can read {{music}} A [[meter]] of [[two]] [[half note]]s {{gloss|[[minim]]s}} per [[measure]] {{gloss|[[bar]]}} or the like, providing the BrE and AmE terms each time. (The above looks like: (music) A meter of two half notes (minims) per measure (bar).)​—msh210 (talk) 16:34, 20 September 2011 (UTC)[reply]
We don't need both "measure" and "bar" since both terms are used in American English, so in the definition we could just change "measure" to "bar" without loss of understanding to American readers. —Angr 17:13, 20 September 2011 (UTC)[reply]
But non-native speakers, like me, don't understand bar, only measure, because the former is not associated with any musical term in Germany. -- Liliana 18:34, 20 September 2011 (UTC)[reply]
We can't start trying to second-guess what words non-native speakers might know and what words they might not. The German word is Takt, which isn't obviously connected to either "measure" or "bar", and which English word Germans learn depends on which variety of English they're exposed to. I don't think we can get around the fact that non-native speakers may have to look up words in a gloss whose meaning they don't know, but it would be good if we could minimize that for native speakers. —Angr 20:28, 20 September 2011 (UTC)[reply]
Oh, I wouldn't know. I was trying to split it on BrE/AmE lines, and may have erred. My general idea though is that {{gloss}} be used when the Brits have one word and the Yanks another and never the twain shall meet.​—msh210 (talk) 18:53, 20 September 2011 (UTC)[reply]

Japanese kanji entries and classical vs. modern readings

Going through Category:Japanese_terms_needing_attention to do some mostly-mindless clean-up work, I've run across a number of kanji entries where the list of readings includes things that semantically sorta make sense, but that I've never seen. 不#Readings, for instance, lists the kun'yomi "せず (sezu), にあらず (niarazu), いなや (inaya)", which make sense since 不 essentially means "not" and all these kun'yomi are related to negativity, but I've never heard of 不 having any kun'yomi at all. Moreover, neither the Jisho.org entry nor Jim Breen's site (you'll have to enter the kanji yourself, I can't link directly) list any kun'yomi, nor do my dead-tree dictionaries. The Weblio entry does list these kun'yomi, but various things about Weblio make me think that they include classical Japanese readings, not just modern. That said, classical Japanese was much more varied in terms of how things can be spelled -- imagine Chaucerian English spelling, only far looser -- and thus classical readings aren't always terribly pertinent to the modern language.

This leads me to wonder if we should mark classical readings somehow? Or should we leave them out altogether? -- TIA, Eiríkr Útlendi | Tala við mig 20:25, 20 September 2011 (UTC)[reply]

Japanese multiple readings are a pain in the butt, especially names. I'm amused at how 夜神月 is actually read Yagami Raito. Just use {{qualifier}}, I guess. I'm sure many kanji don't have a comprehensive list of all possible readings. --Anatoli 01:34, 21 September 2011 (UTC)[reply]
Hm, yes, marking non-standard readings using qualifiers or something similar seems to be the emerging consensus. However, I don't think we even could go for "all possible readings", given the flexibility of how kanji are used.
FWIW, 夜神月 seems to be a manga or anime character, in which case all bets are off as to reading - the author(s) could just as well decide that a given kanji string should be read Furī Uirī, or Ai Raiku Dōnattsu, and that would be that. Manga and anime readings are sometimes the very picture of arbitrariness.
With that in mind, I'd be more inclined to have kanji entries here limit the list of readings to attested historical readings, and leave out anything that's clearly a creative neologism of limited currency -- basically apply something like CFI to the readings themselves. :) -- Eiríkr Útlendi | Tala við mig 15:48, 22 September 2011 (UTC)[reply]
夜神月 (月 (tsuki), meaning "Moon", is read in this name as Raito, from English "light"; watch Death Note - highly recommended, the best quality anime I've seen (the movie is not as good)!) is an extreme example, but this arbitrariness is not restricted to names, and not only manga names. I see your point, but I find that listing too many readings for a kanji can also be counterproductive. Readings can be borrowed from other kanji with similar meanings, as with your example of いなや (inaya), which is normally written as 否や in kanji. --Anatoli 22:56, 22 September 2011 (UTC)[reply]

Hindi and Urdu vs Hindi-Urdu or Hindustani

I don't want to be mean and just change the headings from Hindi-Urdu to separate Hindi and Urdu as in the translations for Hindustani. I don't think there was a policy of merging the two languages together, even if Hindi and Urdu templates allow words to be displayed in both scripts. Any thoughts? --Anatoli 00:22, 21 September 2011 (UTC)[reply]

They should definitely be separate. There was no discussion on merging them (and even then, there is no code for Hindi-Urdu we could use). -- Liliana 11:37, 21 September 2011 (UTC)[reply]
I think we should at least discuss it. We could create a code, perhaps {{inc-hin}}. Currently our Hindi and Urdu templates include things like 'Hindi spelling' and 'Urdu spelling', implying that they are the same language. I have absolutely no input on whether we should treat them as the same language, but we should discuss it. --Mglovesfun (talk) 17:15, 21 September 2011 (UTC)[reply]
They are the same, yes. We only treat them separately due to two different scripts being used, so we can have all Hindi words in Devanagari and all Urdu words in Arabic script. -- Liliana 17:19, 21 September 2011 (UTC)[reply]
A small correction. There are layers of heavily Sanskritised words in Hindi, which are not used in Urdu, the reverse is true as well. There are many words of Persian and Arabic origin in Urdu, which are not used in Hindi. Having said this, Urdu can be written entirely in Devanagari (this type of writing is, in fact, more precise about consonants, which are missing in Sanskrit, like z, f, x, q, ġ, etc., Hindi writers often replace them with j, ph, k, g, etc.) and Hindi can be written entirely in Perso-Arabic script as well. The high level words are getting more out of use, as Hindustani, a spoken variety of both Hindi and Urdu is getting popular due to Bollywood, songs and media. Hindi and Urdu now borrow a lot from each other and from English making them even closer. --Anatoli 23:39, 21 September 2011 (UTC)[reply]
Structurally they are different standardized registers of the same language, comparable to Croatian, Serbian, etc. being standardized versions of the same language (which is called Serbo-Croatian). Because they are the same language, I would be in favor of a unified header, like we have for (Roman and Cyrillic) Serbo-Croatian. Though, the damage would not be like we used to have in the case of Serbo-Croatian (three or even four identical entries on the same page), because, AFAIK, there should never be both a Hindi and an Urdu entry at the same page anyway because they use different scripts (well, at least the standardized registers, of course). --JorisvS 19:26, 22 September 2011 (UTC)[reply]
I love arguing with Indian people about Hindi and Urdu being different languages. I quote religious Urdu stuff that they understand perfectly and I'm like "really, because half those words are from Persian, so if Urdu wasn't the same language as Hindi you wouldn't understand this." Anyway. As JorisvS points out, the mess isn't as serious as Serbo-Croatian once was because the headers aren't used on the same page. If what's desired is one header, I think Hindi-Urdu is a bit odd, and Hindustani would probably be the most neutral. In translation tables (I already do this for descendants tables and in Etymology) we could have
* Hindustani 
*: Hindi: 
*: Urdu:
I'm sure (=positive) some people (mostly racists) would bitch and whine, as with Serbo-Croatian. But they're lesser people. If there are words that aren't used frequently in India, they can be marked as predominantly Pakistani in Usage notes, and vice versa. Wouldn't be a big deal. The main concern would be categorization. We have problems with Chinese (simplified and traditional) and some people worrying about Serbo-Croatian, but I wouldn't really be opposed to something like Cat:Urdu spellings of Hindustani nouns/verbs/whatever. </ideas> — [Ric Laurent]20:04, 22 September 2011 (UTC)[reply]
Hindustani sounds more interesting than Hindi-Urdu. I also noticed that Hindi speakers like to say that their language is closer to Sanskrit and Urdu speakers say Urdu is closer to Persian. In reality, they both have enough from both. It may be harder to find Urdu equivalents for "clever" Hindi words like प्रदूषण (pradūṣaṇ) (pollution) but otherwise most Hindi words have Urdu equivalents and vice versa. Didn't you say you were avoiding Beer Parlour? :) Thanks for your input, Ric. --Anatoli 23:05, 22 September 2011 (UTC)[reply]
Sometimes things of actual importance are discussed here, so when I see notifications of good conversations, I try to throw a few cents at it lol. (In fact, I'm considering our below-discussed Arabic problems, get some ideas out there) — [Ric Laurent]22:44, 25 September 2011 (UTC)[reply]

Filipino and Tagalog

Update: On 1 Nov 2011, unification of Cat:Tagalog and Cat:Filipino was approved by a vote (opened 8 Oct)

I found a page (paalam) with an entry for the Filipino language, and another for Tagalog.

These are the same language.

The government of the Philippines wanted to make a national language and they decided in 1937 that it would be "based on Tagalog", the language of the capital. In 2007, the chair of the government's Commission for the Filipino Language (Komisyon sa Wikang Filipino) reported on these efforts:[1]

Are “Tagalog,” “Pilipino” and “Filipino” different languages? No, they are mutually intelligible varieties, and therefore belong to one language. [...]
The other yardstick for distinguishing a language from a dialect is: different grammar, different language. “Filipino”, “Pilipino” and “Tagalog” share identical grammar. They have the same determiners (ang, ng and sa); the same personal pronouns (siya, ako, niya, kanila, etc); the same demonstrative pronouns (ito, iyan, doon, etc); the same linkers (na, at and ay); the same particles (na and pa); and the same verbal affixes -in, -an, i- and -um-. In short, same grammar, same language.

This explains why there are no Tagalog-Filipino dictionaries, no Tagalog-Filipino translators/interpreters, and no documents or cultural goods ever produced in separate versions for each.

I can also personally confirm this, as a speaker of the language.

To fix the above-mentioned article, I removed the "Filipino" section from the page (and pasted it on the Talk: page, for reference). Gronky 11:32, 21 September 2011 (UTC)[reply]

I am not opposed to it. I always wondered why we cover Filipino and Tagalog separately. -- Liliana 16:35, 21 September 2011 (UTC)[reply]
I have no input, other than it's an important issue and should be discussed rather than individual editors working using their own opinions. --Mglovesfun (talk) 17:18, 21 September 2011 (UTC)[reply]
Is there a right place to discuss it, with an eye to setting policy?
This isn't controversial. Gronky 23:25, 21 September 2011 (UTC)[reply]
I support this. We should probably just use Tagalog. It's the most common word now in use for the official language of the Philippines and we already use Tagalog much more often than Filipino/Pilipino. The difference is subtle and there's nothing that can't be resolved with occasional {{qualifier}} tags. --Anatoli 23:29, 21 September 2011 (UTC)[reply]
Here. And it'd be nice to mention to the frequent contributors in both languages (or the language, whatever) that the discussion exists.​—msh210 (talk) 00:09, 22 September 2011 (UTC)[reply]
I support this too. Different registers of the same language should use the same header; in this case Tagalog is the name of the language and so should be used. When differences exist these can indeed be properly tagged anyway. --JorisvS 19:36, 22 September 2011 (UTC)[reply]
Do we have any? As for me, I used {{tl}} (Tagalog that is) for some translations. --Anatoli 02:49, 22 September 2011 (UTC)[reply]
I'll set up a vote, but leave enough time for the discussion to continue. Mglovesfun (talk) 20:26, 25 September 2011 (UTC)[reply]
I understand that, for the moment, it's the same language, but that Filipino should become a mix of different languages used in the country, and that a commission is working toward this objective. It seems logical to use Tagalog only for the moment but, if somebody creates a Filipino entry nonetheless, there is no reason to delete it. It might become useful in the future. Lmaltier 10:06, 2 October 2011 (UTC)[reply]
The "mix of different languages" proposal was the plan that was announced in 1937, but no effort was put into it, so it never even got off the ground. In the intervening 74 years Tagalog has been used in every situation where "Filipino" was meant to be used, and it has been taught in every school in the Philippines for the past two generations.
There are no efforts currently under way to create a "new" Filipino. The speed at which the Spanish language mostly disappeared from the Philippines is an example of how quickly things can change. (Sidenote: Spanish was ubiquitous there a century ago and most Philippine authors wrote in Spanish, but now, the lack of Spanish knowledge among first-language Tagalog speakers is such that when 19th century Philippine literature is being translated to Tagalog, they usually have to do two-step translations Spanish->English->Tagalog.) But changing a language does take at least a generation, and no such effort has begun yet or is being proposed. Gronky 20:48, 3 October 2011 (UTC)[reply]
"Should become" makes me think of Wikipedia's Crystal Ball. We cannot do something just because we think something might happen/be in the future. If and when such a situation arises (and this is, as Gronky points out, not all too likely) we can deal with it then. --JorisvS 21:40, 3 October 2011 (UTC)[reply]

Deprecating zh, zh-cn and zh-tw in category names

That's it really. AFAICT these always refer to Mandarin, though it's possible that in some cases zh could be used erroneously for another Chinese language such as Cantonese (NB, {{zh}} displays Mandarin). Would anyone like to expressly support or oppose this proposal? The proposal is to replace zh, zh-cn and zh-tw in topical category names with cmn, e.g. moving Category:zh-cn:Computing to Category:cmn:Computing. Mglovesfun (talk) 20:03, 21 September 2011 (UTC)[reply]

I agree with you. Engirst 20:12, 21 September 2011 (UTC)[reply]
Err... but how are you going to separate traditional and simplified script? ---> Tooironic 21:26, 21 September 2011 (UTC)[reply]
How are we going to - well, if we want to, Category:cmn:Computing in traditional script, or something similar. Mglovesfun (talk) 21:37, 21 September 2011 (UTC)[reply]
FORTRAN?​—msh210 (talk) 00:05, 22 September 2011 (UTC)[reply]
I posted a comment related to this issue at the end of Wiktionary:Beer_parlour#Classical.2FLiterary_Chinese_entries. -- A-cai 01:09, 12 October 2011 (UTC)[reply]
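
A minimal sketch of the renaming proposed at the top of this thread, assuming topical categories follow the `Category:<code>:<topic>` pattern seen in the example Category:zh-cn:Computing. The function name and the regular expression are illustrative only and are not taken from any existing Wiktionary bot.

```python
import re

# Hypothetical target code: the proposal maps all three prefixes to Mandarin.
TARGET_PREFIX = "cmn"

# Matches "Category:<prefix>:<topic>" for the three deprecated prefixes only.
# The colon right after the prefix keeps codes such as "zh-min-nan" untouched.
_DEPRECATED = re.compile(r"^Category:(zh-cn|zh-tw|zh):(.+)$")


def rename_topical_category(title: str) -> str:
    """Return the proposed new title, or the title unchanged if it does not match."""
    match = _DEPRECATED.match(title)
    if match is None:
        return title
    return f"Category:{TARGET_PREFIX}:{match.group(2)}"


print(rename_topical_category("Category:zh-cn:Computing"))
```

Note that this maps zh-cn and zh-tw onto the same name, so the question raised above about separating traditional and simplified script is left open by such a rename.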

Language merges

Looking at this page and WT:RFDO, there are four merges being proposed:

  1. Category:Koongo language into Category:Kongo language (very small)
  2. Category:Colloquial Malay language into Category:Malay language (very small)
  3. Category:Filipino language into Category:Tagalog language
  4. Category:Hindi language and Category:Urdu language into a Category:Hindustani language

Of course, Bosnian, Croatian and Serbian were merged a few months ago.

On top of that, I would personally like to see Category:Anglo-Norman language merged into Category:Old French language (which I might add, would render a few hundred of my own edits useless or worse). Interesting issue, isn't it? Mglovesfun (talk) 20:08, 21 September 2011 (UTC)[reply]

At the risk of turning this into a very, very broad topic, I've occasionally wondered if having all the 'Arabics' separately is appropriate. Mglovesfun (talk) 20:10, 21 September 2011 (UTC)[reply]
The Arabic languages are almost as distinct as the Slavic languages. They share a formal standard literary/media language but the languages of daily speech are so different as to be incomprehensible to people at the other end of the Arabic language area. —CodeCat 21:37, 21 September 2011 (UTC)[reply]
That's correct. However, it doesn't make sense to create Egyptian, Levantine, Moroccan, etc. entries for words which are identical. Most formal vocabulary and many other words are shared between dialects and MSA or have a very slight difference in the pronunciation. We don't use the pedantic case endings here, anyway (e.g. غرفة ghurfa vs ghurfatun). The difference in pronunciation between j/g, q/' (Standard/Egyptian) could be ignored, since the spelling is the same and the conversion is rather consistent, even though Egyptians pronounce Arabic words differently. So MSA قلم qalam is Egyptian 'alam and MSA حج Hajj is Egyptian Hagg. The words which ARE different in dialects should have separate entries, IMHO, e.g. tomorrow غدًا "ghádan" (MSA) vs بكره "bukra" (Egyptian). --Anatoli 23:24, 21 September 2011 (UTC)[reply]
I know no Arabic so can't opine, but, assuming what Atitarev says is true, pronunciations differences can be relegated to the Pronunciation section.​—msh210 (talk) 00:07, 22 September 2011 (UTC)[reply]
The contributors in Arabic dialects have almost died out. I looked at some Egyptian Arabic nouns, many (not all) are just Arabic. The quality Egyptian Arabic entries with different plural forms shouldn't be merged, like يد. We should also check with Stephen G. Brown and Dick Laurent on this. --Anatoli 02:47, 22 September 2011 (UTC)[reply]
Look at water. The translations in Arabic dialects are all quite different from each other. -- Liliana 03:25, 22 September 2011 (UTC)[reply]
Using this same example, if one looks at the translations in Chinese, they are written in the same way (Dungan excluded of course), and yet we divide Chinese into God knows how many languages. 60.240.101.246 12:49, 22 September 2011 (UTC)[reply]
Many of these are difficult to discuss, because we don't have very many people who specialize in foreign languages. Therefore, it's hard to gain any sort of consensus. -- Liliana 03:38, 22 September 2011 (UTC)[reply]
(Edit conflict). That's the trouble with most common words, they are very few but make the speech very distinct and hard to understand with no previous exposure. The same will be for the question word what. Still the written dialects (if they are written, only a few are ever written down) tend to be much closer than the spoken forms, much closer than Slavic languages. The fact that dialects are for speaking, not for writing, makes creating entries for them less important.
Agree to your last message. --Anatoli 03:40, 22 September 2011 (UTC)[reply]

Yes, it may make sense to create distinct entries for words in different Arabic languages. I would allow both the macro-language (for those not willing to create distinct entries) and individual languages (for those seeing an interest in creating them). The fact that specific words are often mostly oral and not found in usual dictionaries makes it all the more important to include them here.

In addition, systematically accepting sections for languages with an ISO code would be a simple rule, would make things much simpler and would avoid many discussions. Lmaltier 16:45, 22 September 2011 (UTC)[reply]

I dislike putting simplicity ahead of accuracy. ISO 639 isn't really designed for our purposes; they don't care if a language is actually not a language but a dialect of another language, they just attribute a code when a code can be useful. There's a code for no linguistic content but I don't think we want Category:No linguistic content language. Mglovesfun (talk) 20:59, 25 September 2011 (UTC)[reply]
No, they try to define codes for languages, not for dialects. They created a code for no linguistic content but this is an exception, they don't state that this is a language. It's not always obvious. They created Occitan as a macrolanguage, and codes for individual varieties, then they changed their mind. And the word language may be interpreted in different ways, so you may disagree with their decisions. But this is what they try to do. Lmaltier 09:49, 2 October 2011 (UTC)[reply]
Jesus. Arabic. So, on one hand, there are things that would be pretty smart about having separate L2s for the major dialects, but as has been pointed out there would be lots of overlap. However when you get to the details, you have variations in pronunciation, verb conjugation... these would have to be compensated for if we wanted to be complete. Without separate L2s the only logical way to represent variations in regional conjugation in an L2 Conjugation section is with several drop-down tables. We'd have a few {{a}} tags for pronunciation variance, stuff like that. Really it would be possible to treat all Arabic dialects under one header. In all likelihood, it wouldn't be pretty, and it would require a lot of tags - for example, for words specific to certain dialects, it would be very easy to just do like we do with English with regional tags before definitions. (I apologize for the scattered nature of these statements lol... there's a lot to consider.) — [Ric Laurent]22:57, 25 September 2011 (UTC)[reply]

Hey! If you're going to merge Hindi and Urdu (see above), then arguably Romanian and Moldovan are also candidates for merging! -- Liliana 13:54, 28 September 2011 (UTC)[reply]

Yes, let's add Category:Moldavian language→Romanian to the list of merges to consider. - -sche (discuss) 07:28, 30 September 2011 (UTC)[reply]
Definitely! --JorisvS 12:43, 30 September 2011 (UTC)[reply]
Naturally — [Ric Laurent]23:45, 5 October 2011 (UTC)[reply]
MRL!... There is a single (Daco-Romanian) language in Romania and the Republic of Moldova: the Romanian language. It is good to have a discussion here, but the result must be clear: one single language (officially "Daco-Romanian", Romanian) for the Carpathian-Danubian area of Romania and the Republic of Moldova. A clear and decisive argument for the Romanian language: the Romanians of the province of Moldavia (in Romania, Western Moldavia), the Moldavian Romanians, speak the same language as the Romanians of Eastern Moldavia (the Republic of Moldova): Romanian, and they do not claim that their language is "officially" called "Moldovan" (Moldavian)!

Aside from merging "Colloquial Malay" into "Malay", for consistency and accuracy "Indonesian" and "Malaysian" should also be merged into "Malay", because these are standardized varieties of the Malay language (like Croatian etc. are of Serbo-Croatian and Hindi and Urdu of Hindustani/Hindi-Urdu). On the other hand I'd like to point out that there are several "Malay languages" that should not be merged (do we have any entries?). --JorisvS 12:43, 30 September 2011 (UTC)[reply]

Category:Banjarese language comes to my mind. But yeah, Standard Malay and Standard Indonesian are virtually the same language, so there isn't really a need to have them separately. -- Liliana 11:54, 1 October 2011 (UTC)[reply]
I think the way we handle Spanish would be appropriate in a lot of these cases, one unified language header and then context tags for meanings which are distinct to a region. Spanish has regions which conjugate differently, pronounce differently, use significantly different vocabulary, but we manage to represent all of these things without too much confusion. Obviously I don't know enough about the particular languages brought up here, but we do have a model for how we can make this work. - [The]DaveRoss 12:31, 1 October 2011 (UTC)[reply]

Non-idiomatic translations

As a side-thought to the Idiomatic translations section above, what is the preferred method of handling translations out of English that are non-idiomatic phrases in the target language? I'm thinking now of disarm, where the Japanese translation of the intransitive sense "to lay down arms" could be 武器を捨てる, which is redlinked here as it should be, and which is not included in any other dictionary due to the same SOP restriction we have here. That said, is it kosher in translation tables to only use the {{t}} template for parts of a translated phrase?

Instead of:

{{t|ja|武器を捨てる|tr=ぶきをすてる, buki o suteru|sc=Jpan}}

should we have the following?

{{t|ja|武器|tr=ぶき, buki|sc=Jpan}}{{t|ja|を|tr=o|sc=Jpan}}{{t|ja|捨てる|tr=すてる, suteru|sc=Jpan}}

This is so incredibly ugly and unwieldy that I'm pretty sure it's not the way to go, but that brings me right back to the question -- does anyone have advice on how to input non-idiomatic phrases as translations of English terms? -- Eiríkr Útlendi | Tala við mig 20:26, 22 September 2011 (UTC)[reply]

Yeah, I've wondered about that, too. Probably it's better to just use {{l}}/{{onym}} for those:
* Japanese: {{onym|ja||[[武器]][[を]][[捨てる]]|tr=ぶきをすてる, buki o suteru}}
(It's not ideal — {{onym}}, unlike the {{t}} family, italicizes its transliterations — but I think we've more than used up the sensibly mnemonic t template-names. What are we gonna create, a {{t:}}?)
RuakhTALK 20:41, 22 September 2011 (UTC)[reply]
Cool, thanks for the feedback. I think I'll use more manual formatting, since {{onym}} is apparently deprecated and since it italicizes the kana. I thought about adding a lang or sc param, but it looks like these aren't implemented for {{onym}}, so there you go. So would the below be acceptable?
* Japanese: {{Jpan|武器を捨てる}} ({{Jpan|ぶきをすてる}}, buki o suteru)
It looks like the font used in translation tables is smaller, so maybe I shouldn't use {{Jpan}} either? -- Eiríkr Útlendi | Tala við mig 21:26, 22 September 2011 (UTC)[reply]
I don't think {{onym}} is deprecated. I don't know what you mean by "it looks like [lang and sc params] aren't implemented for {{onym}}"; but manual formatting seems just fine to me. (As does using {{Jpan}}.) —RuakhTALK 22:13, 22 September 2011 (UTC)[reply]
Cheers, thanks. {{onym}} is marked RFDO, which I discovered when I looked at the template page itself to figure out about params. Some templates like {{term}} have a lang param for specifying a language, and the template handles formatting differently in some cases for specific languages, like using a slightly bigger font and no italics for Japanese. The sc param shows up in some other templates as a way to specify a certain script, again so the template can select an appropriate font size and style. {{onym}} doesn't change its Japanese output when I add either lang=ja or sc=Jpan. Japanese being the odd duck that it is, typographically speaking, I may have run across some of these wrinkles more than folks working with European languages. :) -- Ta, Eiríkr Útlendi | Tala við mig 22:33, 22 September 2011 (UTC)[reply]
{{onym}} doesn't have lang=ja because it just has ja, as in the example wiki-text above; that is, {{onym|ja|foo|tr=bar}} is like {{term|foo|lang=ja|tr=bar}}. It does support sc=Jpan, theoretically, but Jpan is already the default script for Japanese, so {{onym|ja|foo|tr=bar}} implies {{onym|ja|foo|sc=Jpan|tr=bar}} anyway. Neither version applies the script template to the transliteration, though. (And {{t}} doesn't, either.) —RuakhTALK 03:34, 23 September 2011 (UTC)[reply]
Why not create a {{t-SOP}}? Matthias Buchmeier 09:09, 23 September 2011 (UTC)[reply]

Serbo/Croatian

There is NO Serbo/Croatian language. These are two languages: Serbian and Croatian. So please treat them in that way.

Why? —CodeCat 09:33, 25 September 2011 (UTC)[reply]

re-e... or ree...

I'm currently adding some Italian words that start rie... - they mostly translate as English words starting ree... - I never know whether to use ree... or re-e... for the English. (See riesposizione as an example.) We seem to use both forms (sometimes as alternative forms). Is there any sort of rule? SemperBlotto 10:00, 25 September 2011 (UTC)[reply]

In my experience, both forms usually exist. Some writers don't like the fact that it looks like a single ee vowel. Same thing as with cooperate. Equinox 10:10, 25 September 2011 (UTC)[reply]
The New Yorker would doubtless use reë.... Older works, too, for older words. So older words' reë... versions are (most of them) probably attested.​—msh210 (talk) 15:15, 25 September 2011 (UTC)[reply]

Target audience

This is in reaction to comments such as the one by Gtroy (talk • contribs) on WT:RFD#Cyrillic script saying "keep both are really common, a new speaker or child would find it useful." This calls into question our target audience. WT:CFI#Idiomaticity alludes to the same thing in saying "An expression is “idiomatic” if its full meaning cannot be easily derived from the meaning of its separate components." How easy it is to derive the full meaning depends on how good you are at deriving meanings. What level should we aim at? Educated adults, children, non-native speakers? The problem I have with aiming for non-native speakers (and above, naturally) is how far we want to go to accommodate very weak English speakers. Someone without much experience in English may find I laughed like a child difficult to understand, but I doubt we would include that. Reading WT:FEED, many of the users seem to be unable to spell even very basic English words right, so maybe dumbing down is a good option. Mglovesfun (talk) 20:51, 25 September 2011 (UTC)[reply]

I don't think it harms us any to aim at the lower threshold. First, who is likely to be looking up something like target audience in a dictionary unless they don't know what it means and need help figuring it out? Second, what possible harm could come from having the entry? So long as we guard against becoming shills for people who are trying to market a product or make Urban Dictionary style humorous coinages, I think we should be very liberal about allowing combinations like that. bd2412 T 23:27, 25 September 2011 (UTC)[reply]
I actually doubt that [[Cyrillic script]] would be useful to a child or a new speaker, because neither group would know that anyone might consider "Cyrillic script" to be a single unit to look up. I think it's more likely to be useful to an adolescent or adult native speaker who already recognizes that "Cyrillic script" is a common phrase, and wants more information about it, but doesn't quite understand how to use a dictionary, or doesn't quite understand the difference between a dictionary and an encyclopedia. (It could be useful to translators, who would likely prefer to find the usual translation for "Cyrillic script" rather than have to assemble the translations for the relevant senses of "Cyrillic" and "script", since — for example — their target language might use different words for alphabetic scripts as for other kinds of scripts.) —RuakhTALK 23:43, 25 September 2011 (UTC)[reply]
I think it's not bad, but it can be dangerous. Wiktionary is already very big, and that means a lot of pages will be created and only visited once or twice after that, if that at all. There is very little opportunity for review, which could mean a very obscure term might have a bad entry for a very long time. Wiktionary is not paper, but its users aren't omnipresent either... —CodeCat 23:46, 25 September 2011 (UTC)[reply]
The amount of attention we give to such phrases in RFV/RFD discussions suggests that they can be adequately spotted and brought up to our standards. bd2412 T 23:51, 25 September 2011 (UTC)[reply]
That assumes that someone first RFDs or RFVs them, which might take a long time for an entry hardly anyone ever needs. —CodeCat 23:55, 25 September 2011 (UTC)[reply]
That could be said of an enormous number of our entries. Consider all the conjugations of verbs, some into forms that are almost never used or called for. bd2412 T 01:32, 26 September 2011 (UTC)[reply]
That's true, but I'm wondering if we should make it even worse. Verb form entries are usually bot generated, so they are correct as long as the bot was written correctly. —CodeCat 10:32, 26 September 2011 (UTC)[reply]
The Simple English Wiktionary is small, but it is specifically aimed at people with lower levels of English proficiency.--Brett 11:39, 26 September 2011 (UTC)[reply]
"Wiktionary is very big" ... really? I would say Wiktionary is very small, compared to the size required to adequately cover its subject matter. The scope of Wiktionary is not "all words in all languages as long as the word will get more than one or two hits from direct search. It is important to remember that only a part (and a small part I would guess) of the usage of Wiktionary's information is in direct search. Indirect search (onelook, Google, Yahoo etc.) and the many places Wiktionary has been parsed and culled for particular relationships probably get a lot more attention, and we have no idea what those users really want or need. - [The]DaveRoss 13:39, 27 September 2011 (UTC)[reply]

Completing the projects of User:Robert Ullmann

I can't think of a better way to honor Robert Ullmann's work on this project than to complete the many unfinished projects that he initiated. Many of us have begun projects for the introduction of substantial bodies of material into this dictionary, and any one of us could die with our work unfinished. I'd like to think that if that happens to one of us, the rest will pick up that work and carry it to its completion. Robert's work is epitomized in his many subpages - Robert Ullmann subpages, more Robert Ullmann subpages. Of these, I think the projects where we as a community can really pull through are User:Robert Ullmann/Missing, User:Robert Ullmann/Oldest redlinks, and the various pages showing use of foreign language words in news articles. Let's do this. Cheers! bd2412 T 23:47, 25 September 2011 (UTC)[reply]

I would revert his user page to this revision, from which some of the projects including "Oldest redlinks" and "Missing" are accessible. The text "Robert Ullmann passed away on March 19, 2011 in Massachusetts General Hospital in Boston at an age of 50 years. [1]", which is currently the only contents of his user page, could be placed into an infobox at the top of his original page. --Dan Polansky 06:25, 26 September 2011 (UTC)[reply]
Some user-led projects (including my own) are pretty massive, and may not be finished for years, maybe 10 years or more. Would be nice to continue to make progress however. Mglovesfun (talk) 06:49, 26 September 2011 (UTC)[reply]
I think this is a good idea. An interesting thing about your two primary examples is that neither one will ever be finished, unless we somehow finish the entire project. Are you proposing that we finish the lists that Robert generated or ought we update those lists from the most current dump and progress from there? Also I think reverting his userpage and putting a note at the top is appropriate. - [The]DaveRoss 11:26, 26 September 2011 (UTC)[reply]
OK. I shall have a go at the Italian ones. Also, since I'm probably the most aged regular contributor, I shall try to document my future projects in case I also fall off my perch in the next few years. SemperBlotto 11:37, 26 September 2011 (UTC)[reply]
While there will always be "oldest" redlinks and missing pages, my thought was to prioritize the last lists that Robert generated. Of course, the dynamic nature of language ensures that we'll never "finish" the project, but we will get it much closer to the leading edge of being a complete lexicon. I also agree with restoring a working version of his userpage, with the current note retained on it. Cheers! bd2412 T 12:51, 26 September 2011 (UTC)[reply]
We could also add Category:Tbot entries to the list of Robert Ullmann projects. --Mglovesfun (talk) 16:47, 26 September 2011 (UTC)[reply]
It is funny you should say that, because I got an e-mail from WF yesterday urging me to propose a project to continue Blotto's work after his death. This seemed like an obvious troll to get me to upset everyone by mentioning the death of somebody present, and anyway I hate community huggy-kissy stuff, so I let it go. And then within 12 hours I read SB talking about falling off his perch! My main concern is whether falling off one's perch is an idiom and whether we should have an entry. And secondarily who is going to learn Italian and fix all the vandalism. (And seriously: yes, RU had some good stuff and it would be a pity for nobody to continue with it.) Equinox 20:43, 26 September 2011 (UTC)[reply]
I restored the userpage content as some have suggested, and I also thought it was a good idea. Also, the link to the obituary is broken. —Internoob 18:18, 26 September 2011 (UTC)[reply]
I have added a working obit link. bd2412 T 17:44, 27 September 2011 (UTC)[reply]
Thanks. One more, more informative, obituary: http://www.obitsforlife.com/obituary/344565/Ullmann-Robert.php. See also W:Wikipedia:Deceased_Wikipedians#Robert_Ullmann_.28Robert_Ullmann.29. An image such as File:Nuvola grave.png or File:Nuvola grave with cross2.png could be placed in the box on his user page, as {{notice|image=Nuvola grave with cross2.png|...}}. (I cannot edit his user page.) --Dan Polansky 07:18, 30 September 2011 (UTC)[reply]
Thanks, I've switched out the obit link and made it a running text link instead of a note. I think it looks nicer and is easier to find. I've also chipped away a tiny bit at his missing entries pages. Cheers! bd2412 T 03:29, 14 October 2011 (UTC)[reply]

Adjective+noun entries.

There are many adjectives that only have a particular meaning when modifying a noun with a particular sense (or a hyponym thereof). For example, there is a sense of prime that applies only to natural numbers, so you get phrases like "prime integer", "prime and composite numbers", "numbers that are prime", and so on, but in particular you get the phrase "prime number", which passed RFD. Other similar cases — adjective-noun pairs that are SOP, but only because the adjective has a noun-specific sense — include "vintage car", "active volcano", "acute/obtuse/right/reflex/round angle", "exploitative competition", "oblique leaf", "Cyrillic alphabet/script", and others. (Actually, some of the "angle" ones are debatable; no one produced any evidence that "round", for example, is used to mean "360°" outside the one phrase "round angle".) All of these entries were created, but all came to RFD, and few were kept. (A few still appear at RFD.) All of the discussions were fairly ad hoc; although various arguments were presented for keeping or deleting specific entries, no really general principles were proposed, and the result of a given discussion generally seems to have depended largely on who participated in the discussion.

I'd like to raise this question more generally. I would posit that no single editor agrees with the result of every single one of the above discussions. Which ones do you agree or disagree with, and why? Should we have kept all of them? Deleted all of them? What criteria should we have applied? Should we strive for some sort of consistency on this, or are ad hoc discussions the way to go?

RuakhTALK 00:39, 26 September 2011 (UTC)[reply]

  • I think we should generally keep these, as they may be useful to someone looking up the term who does not know, for example, which meaning of "acute" and which meaning of "angle" are likely to be implied by reference to an "acute angle". This is a voluntary project; if some Wiktionarians want to make such entries, and the meanings provided can be backed up with sources (most of the above are clearly in widespread use), then others who don't care for them should not spend time making them, and should instead focus on adding the many words still missing from the lexicon. bd2412 T 01:37, 26 September 2011 (UTC)[reply]
  • If "number that is prime", "a number is prime", "prime, large number", "prime large number", "very prime number", "more prime a number", or "prime integer" exists, that's enough IMO to say the adjective is separable enough from the noun to delete the "prime number" entry. People shouldn't see the phrase's entry and think the adjective's tied to the noun. However, if the above phrases, and others like them, don't exist, and the only variant on "prime number" that exists is "prime or composite number" or "large, prime number" (with a comma), then I don't know (but am tending at the moment to want to keep in the former case, as the "or" doesn't really break up the phrasiness, and delete in the latter, as the "large" shows that "prime" is just an adjective). If, on the third hand, the only attested variant on "prime number" is "prime effin' number" or "large prime number" (no comma), then I'd say keep "prime number".​—msh210 (talk) 03:50, 26 September 2011 (UTC)[reply]
  • I am in favour of these (or the very great majority) being allowed (as some of our users will expect them to be here). But, as bd2412 says, we shouldn't go out of our way to create them. — This unsigned comment was added by SemperBlotto (talkcontribs) at 03:41, 26 September 2011 (UTC).[reply]
  • I agree with msh210 on this. DCDuring TALK 11:00, 26 September 2011 (UTC)[reply]
msh210 has more or less nailed it. I am thoroughly in favour of specialist adjective definitions like "(of a [noun class]) [its meaning in that context]". A pet hate of mine is palindromic prime. Equinox 20:33, 26 September 2011 (UTC)[reply]
IMHO "palindromic prime" could be deleted as sum of parts, and is unlike "prime number", in that the meaning of "palindromic" used in the phrase is not specific to primes. In fact, the meaning of "palindromic" relates to strings over an alphabet. Thus, a number (prime or not) can be palindromic only with respect to a particular system of encoding, such as decadic, binary, or using Roman numerals. --Dan Polansky 08:22, 28 September 2011 (UTC)[reply]
If the adjective has a noun-specific sense, this is a very strong clue that the adjective + noun phrase is a set phrase belonging to the vocabulary of English. But this is only a clue, this should not be the criterion. It seems obvious to me that prime number belongs to the vocabulary of the English language, that this is a mathematical term (while blue bicycle doesn't belong to the vocabulary of English), and this is the reason why prime number must be included. I also agree with Equinox, but this is not a reason to exclude prime number. Lmaltier 20:45, 26 September 2011 (UTC)[reply]
How do you feel about entries for prime integer, prime member (of a set), etc.? These are equally natural for mathematicians. I don't know whether they are "set phrases" in English but it's hard to see how they are different from "prime number" since integer and member both refer to numbers. Equinox 20:50, 26 September 2011 (UTC)[reply]
Are they really equally natural for mathematicians? This was not my feeling, because it's not the case in French (nombre premier must be considered as a word, not entier premier). I feel that prime member is built by the brain when needed (prime + member) while prime number is already available as a whole in the brain, and this is the reason why this is a word. Lmaltier 05:25, 27 September 2011 (UTC)[reply]
I suspect that you're right, but I think it would be nice to have somewhat more objective criteria, no? —RuakhTALK 11:51, 27 September 2011 (UTC)[reply]
In many cases, it's obvious from one's intimate knowledge of the language (reasoning doesn't help). For specialized terms, it may be less obvious if you are not a specialist. In such cases, I think that we must trust specialists and that, whenever the phrase is defined in a specialized lexicon, it must also be accepted here: if specialists find that a definition is needed, a definition is needed here too. Note that my Pocket Oxford Dictionary (printed in 1972) defines prime number... Lmaltier 16:44, 27 September 2011 (UTC)[reply]
  • My tentative principle is this: If a phrase "<adjective> <noun>" is such that (a) the meaning of <adjective> used in that phrase is specific to things referred to by <noun>, and (b) "<adjective> <noun>" is much more often used written together as a phrase rather than separately as in "<noun> is <adjective>", then (c) we should have an entry for "<adjective> <noun>", regardless of (d) there being a suitable definition in the <adjective> entry that makes "<adjective> <noun>" a sum of parts. Examples include algebraic number, algebraic integer, bound variable, cardinal number, complex number, free variable, imaginary number, rational number, real number, transcendental number, free software, open set, closed set, complete graph, normal distribution; see also talk:free variable. I am not sure I require (b) to hold; (a) is the crucial part of the condition of the principle. As to the rationale, I tend to store such terms under "<adjective> <noun>" in my mind, and I estimate this is also the headword under which people tend to look these things up. In German, I store "vorstellen" under "vorstellen" in spite of its also occurring in the separate position as in "stell dich noch mal vor". Thus, I deem this approach convenient for the users of the dictionary. --Dan Polansky 08:45, 28 September 2011 (UTC)[reply]

I think this is a very necessary discussion, but the problem is that I'm not sure there are really objective criteria that can be brought to bear. A lot of this is to do with subjective feelings from native speakers about the extent to which a given phrase ‘feels’ like a set unit. The tests that Msh210 mentions are definitely suggestive but not, I think, definitive. That is why the RFD discussions are a good way of settling it and why they will probably always be needed. Personally I find some terms are semantically transparent but still feel like individual lexical units (like the late lamented downloadable content), whereas other terms apparently meet our CFI but to me do not appear idiomatic or natural at all (like Egyptian pyramid). Another point I want to make is about the usefulness of these entries, which is often called into question. The point of them is not to answer the question ‘what does XY mean?’ but rather ‘do native speakers of this language actually use the term XY?’. A good dictionary should be able to say: yes, and here are citations proving it, and preferably some indication of when it was first used. Ƿidsiþ 06:44, 30 September 2011 (UTC)[reply]

A small idea for formatting discussions

I noticed some people use bullets with * instead of indenting the text with : . I think this is a lot clearer because you can easily see when the next message begins, even if they both have the same indenting level. Do you think we could make this general practice on Wiktionary, maybe? —CodeCat 11:59, 26 September 2011 (UTC)[reply]

This would get confusing if somebody actually posted a bulleted list. Equinox 12:28, 26 September 2011 (UTC)[reply]
True, but sometimes people post blockquotes also:
Four score and seven years ago our fathers brought forth on this continent, a new nation, conceived in Liberty, and dedicated to the proposition that all men are created equal.
I'm not sure bullets are worse in that regard.​—msh210 (talk) 16:41, 26 September 2011 (UTC)[reply]
Okay, to rephrase: I don't think the current discussion formatting is perfect, but I think we need to distinguish, codewise, "semantic" bullets (intended to serve as bullets) from "pragmatic" bullets (just happen to be useful). Cf. the difference between BOLD and STRONG elements in HTML. Equinox 19:26, 26 September 2011 (UTC)[reply]
Maybe a special template could be used instead of a bullet or indenting, when used in discussions? Something like {{*}}? —CodeCat 20:28, 26 September 2011 (UTC)[reply]
Actually, I have a better idea. Look at my monobook css and js files and you'll find a bunch of code which colors and indents talk pages to make them much more readable. It should be a cinch to make this work with pages such as the BP. -- Liliana 20:34, 26 September 2011 (UTC)[reply]

JA translations suddenly all borked

I just noticed that Japanese translations are very badly borked, suggesting that someone this morning (Pacific Time in the US) has either inadvertently broken or vandalized something somewhere. For reference, have a look at move#Translations, fill#Translations, and anything else that has Japanese included in the translation table.

Does anyone know where to look for sudden changes? Neither {{trans-top}} nor {{t}} have been changed today, and I'm not sure where else to search for such mistakes or foul play. -- Eiríkr Útlendi | Tala við mig 17:57, 27 September 2011 (UTC)[reply]

I just noticed that the translation table headers are normal when the page is first loading, and only go funny when the translations JavaScript gets applied. Alternately, loading the page with JS disabled shows things as they should appear. I have no idea who maintains this script -- who should we talk to about this? -- Eiríkr Útlendi | Tala við mig 18:01, 27 September 2011 (UTC)[reply]
I can't reproduce that. What sort of b0rkage do you see? Does a "hard refresh" (Ctrl+F5) help? —RuakhTALK 18:21, 27 September 2011 (UTC)[reply]
Sorry, I should have been more specific. The header lines of translation tables now include the Japanese entry as well. So for move, the header for the first translation table should be:
to change place or posture; to go
Instead, the Japanese gets munged into the header, producing:
to change place or posture; to go - Japanese: 動く (ja) (うごく, ugoku)
Note that the Japanese still appears within the table where expected. Hard-refreshing doesn't help, and this issue is present on pages I've never looked up before (and thus wouldn't be in my cache), like at procrastinate#Translations.
In addition, the JS assistance dialogs for adding a translation directly were failing earlier this morning when I tried to add Japanese, giving an error message, but this seems to be working now. Is someone editing these scripts? That's certainly what it looks like... -- Eiríkr Útlendi | Tala við mig 18:43, 27 September 2011 (UTC)[reply]
I think that's not a bug, but a feature: do you see the little "Select targeted languages" link inside each translations-table? —RuakhTALK 18:56, 27 September 2011 (UTC)[reply]
Now I feel stupid.  :) In my defense, I only clicked that because the translation edit assist was acting strangely on the full page, where hitting the Preview translations button would throw up errors about "Japanese translation [term] not found" (of course it's not found, I'm trying to add it...). I figured the two behaviors were linked, but it seems the only link was myself.  :-/
FWIW, the translation edit assist seems to be working now, so that's all good. -- Eiríkr Útlendi | Tala við mig 21:23, 27 September 2011 (UTC)[reply]

{nonstandard, rare} form of

Could somebody please make these templates:

{{nonstandard form of}}

{{rare form of}} --Pilcrow 01:16, 28 September 2011 (UTC)[reply]

Moved from WT:ID JamesjiaoTC 01:20, 28 September 2011 (UTC)[reply]
A good tip is to find a template that does a similar job, copy its contents and adapt it. So {{nonstandard spelling of}} might be a good choice here. Mglovesfun (talk) 19:57, 29 September 2011 (UTC)[reply]

Why is it deleted? Cite: Google Books 2.25.211.161 13:08, 28 September 2011 (UTC)[reply]

It seems attestable to me, albeit rare. But since I don't know Mandarin, I'd like to have a second opinion. -- Liliana 13:17, 28 September 2011 (UTC)[reply]
For your reference: Nouns 2.5 - Personal and place names not in the Chinese Han language 2.25.211.161 16:05, 28 September 2011 (UTC)[reply]
See Talk:Ampere定律 and Special:WhatLinksHere/Talk:Ampere定律. Mixed-script entries are deleted if they aren't cited. On the other hand, they are allowed, if cited. - -sche (discuss) 18:33, 28 September 2011 (UTC)[reply]
If you want to learn Chinese, learn the proper script. Otherwise, don't bother. Don't go around and tell others that their script is shit and advocate that they should use blah blah blah instead, like these people. 60.240.101.246 23:20, 28 September 2011 (UTC)[reply]
I deleted it. Use 泰晤士河 or 泰晤士. Proper names are transliterated into Mandarin using Chinese characters. Most foreign names in Roman letters can be attested in Mandarin text because not everyone knows how to write them in Hanzi, or people can't be bothered. --Anatoli 00:07, 29 September 2011 (UTC)[reply]
To anon 2.25.211.161. Don't go creating English words with the Mandarin heading, this is wrong. --Anatoli 00:12, 29 September 2011 (UTC)[reply]
Discussion continued at Wiktionary:Requests_for_deletion#Thames.E6.B2.B3. - -sche (discuss) 02:51, 29 September 2011 (UTC)[reply]
We have a more serious matter at hand here than just this entry. We should not allow Chinglish to be spread, no matter if it's attestable or not. Chinese often use foreign words in a Chinese context (not to be confused with mixed script words) but I repeat, these words don't become Chinese if they are used in a Chinese context. --Anatoli 04:23, 29 September 2011 (UTC)[reply]
Could someone help me to set up a vote on this (Banning foreign proper nouns as Mandarin). --Anatoli 05:21, 29 September 2011 (UTC)[reply]
I didn't realize Engirst had already created a thread here. Anyway, I have commented on the talk page on Wiktionary_talk:About_Sinitic_languages. — This unsigned comment was added by Jamesjiao (talkcontribs).

A general comment: we should not create Chinese sections only for pure Chinese words, but for all words used in Chinese (and not only mentioned, this is very important). This rule applies to all languages. Take a word such as autoroute: can this word really be considered as an English word? Yet, it deserves an English section. Creating a section for a language does not mean that the word fully and naturally belongs to the language, only that it is used in the language. Lmaltier 05:34, 29 September 2011 (UTC)[reply]

What you're offering for Chinese is quite dangerous. In fact, any foreign proper name can be used in Mandarin in the Roman script. Your example is a borrowing, which can happen in any language. Thames河 is not a borrowing, it's not common and is just an example of a person not willing to write this word in Chinese. As much as many users want to see Chinese switch to Roman, this is not happening and we shouldn't promote it. --Anatoli 05:43, 29 September 2011 (UTC)[reply]
Actually, it's also a borrowing (most such proper nouns are borrowings). What you contest is only the way it's written. We should not promote anything, only describe the language as we find it is written, with appropriate comments and explanations (uncommon, etc.) Lmaltier 06:03, 29 September 2011 (UTC)[reply]
If a word is normally written in one script and transcription, and someone uses that word from its original script, I don't think that counts as borrowing into the main language of the text. If I decide to use 这个单词 instead of "this word", that doesn't turn 这个单词 into English. This is how most (all?) alphabetically spelled words are used in Chinese and Japanese -- specifically as foreign words. Sure, they're being used in a Chinese or Japanese context, but that doesn't make these words Chinese or Japanese. -- Eiríkr Útlendi | Tala við mig 17:39, 29 September 2011 (UTC)[reply]
The difference is that English speakers are generally not expected to be able to read Chinese characters. I imagine that Chinese speakers on the other hand have at least some understanding of the Latin script used to write English. The same kind of mixing of scripts is done elsewhere too, like in Russian, Greek or Arabic. Just as a language such as English is often expected to be known to non-native speakers, in the same way the Latin script name 'Thames' may be expected to be known in China, while the reverse is not true. —CodeCat 18:18, 29 September 2011 (UTC)[reply]
Yet is there anything intrinsically Chinese about using Thames in a Chinese text? If an alphabetically written word used in Chinese contexts takes on a specifically Chinese meaning, then I would be open to the idea of categorizing it as Chinese. If it never has anything but its original meaning from the source language, such as when it is only ever used as a disambig, then no, I would say that it is still decidedly not Chinese, in part as the main reason it's being used is precisely *because* it's not Chinese.
And as a side note, the times that I've seen alphabetic text used in Japanese (the non-English language that I read the most), it is again used precisely because it is not Japanese. In cases such as placenames, the non-Japanese rendering is given generally in parentheses, and is provided not necessarily because the expected audience should know it, but more to clarify the original spelling should a reader want to look into things more, such as here or here. -- Eiríkr Útlendi | Tala við mig 19:46, 29 September 2011 (UTC)[reply]
"that doesn't make these words Chinese or Japanese": nobody thinks that this makes them Chinese words. Nonetheless, if they are used in Chinese, a Chinese section is useful. highway is not a French word, but a French section would be helpful nonetheless (for sense, gender, pronunciation, etc.), because it's used in French (as a foreign word, but used nonetheless). In the case of a foreign word such as psychanalyste mentioned in a sentence such as "The French word for psychoanalyst is psychanalyste.", it's very different: the word is not used in the sentence. Lmaltier 19:25, 29 September 2011 (UTC)[reply]
If there is no argument about whether these words are Chinese or Japanese, then what is the argument? Foreign words belong under their respective headings. vis-à-vis is listed as English because it's been accepted into the English language, and is used enough in purely English contexts that its meaning is diverging from the French meaning over time. Likewise with terms like al fresco or honcho -- they came into English as foreign terms, but have since taken on specifically English senses. I would strongly argue that Thames has no such specifically Chinese sense -- and, thus, does not belong under a Chinese header. -- Eiríkr Útlendi | Tala við mig 19:46, 29 September 2011 (UTC)[reply]
Acceptance of a word in a language is something subjective. Use of a word in a language is something objective. The meaning is irrelevant. Lmaltier 05:29, 30 September 2011 (UTC)[reply]
If we don't stop this madness, Mandarin Wiktionary space will be full of - (river name)河, (city name)市, (disease name)病, (mountain name)山, (island name)岛, etc! They are attestable all right but they are not Mandarin. Then any serious person will doubt the quality of this dictionary. --Anatoli 23:07, 29 September 2011 (UTC)[reply]
?? People read pages of interest to them, not other ones. It's better to keep simple principles and to apply them consistently. See KISS principles. Lmaltier 05:29, 30 September 2011 (UTC)[reply]
Mandarin has become the biggest language on the internet. A person with a pinyinisation agenda will dig out a couple of quotes out of thousands (see 泰晤士河 in Google Books [2]) just to prove his point and advance his agenda: a passage where a place name is written in Roman letters. It doesn't prove anything. Not to me. Only that people who can't read Mandarin will be able to read that word. --Anatoli 05:51, 30 September 2011 (UTC)[reply]
I want to ask you, Lmaltier. You seem to be very thorough about the quality of English entries, which is only commendable, but why do you disregard the opinion of editors who are active in Chinese, who may know a lot about the language, and who voiced their strong opposition to this kind of entry, created by a person known for ignoring Wiktionary rules? Don't you think that by encouraging this you may jeopardise your own reputation, and that your opinion against violators of rules set by you will not be supported in the future? For obvious reasons, I think it's only fair to have language-specific policies, and allowing entries like Thames河 will open the door for low-quality entries. --Anatoli 06:07, 30 September 2011 (UTC)[reply]
I only want to support simple, sound, easy-to-understand and consistent rules. I don't want to exclude words only because contributors think that the use of these words or of these writings should be discouraged, because it's a question of opinion (just like political opinions should not lead to the exclusion of some pages on Wikipedia). Lmaltier 17:42, 30 September 2011 (UTC)[reply]
No one is arguing that Thames or 河 are not words, and no one is trying to exclude these words -- both are already here, as clearly indicated by the blue links. The argument instead revolves around the use of two terms in two languages in a single attempted lemma entry. So far, only one IP user seems to be a strong proponent of the view that Thames河 constitutes an integral Chinese term. Most others have been arguing along varied lines that generally converge on the points that 1) this is a generic sum-of-parts phrase, and thus has no place in Wiktionary, and that 2) Thames is a word in English and 河 is a word in Chinese, and using the two together does not constitute a new Chinese term, but is instead a prime example of w:code-switching.
Any discussion of human endeavors revolves around opinion to some degree or other. The opinion at the core of this particular issue is, are SOP terms that involve code-switching valid terms for inclusion in Wiktionary? The emerging consensus is that no, such terms do not belong here.
The main holdouts from this consensus are the aforementioned IP user, and apparently yourself. The behavior of the IP user has been quite trollish and stubbornly POV from my perspective, but I confess I have less of a handle on why you (Lmaltier) seem to be contrarian about Thames河 and similar terms. Are you of the opinion that mixed-language code-switching SOP phrases do indeed merit inclusion as lemmata here? Are you just unfamiliar with the phenomenon of code-switching? Do you have some other strong opinion pertinent to this issue that might help elucidate your position? I'm honestly curious and I do not understand your opposition to deleting terms like Thames河. -- Cheers, Eiríkr Útlendi | Tala við mig 18:48, 30 September 2011 (UTC)[reply]
I think that code-switching would not explain the number of attestations. My only reason is that no term in actual use should be deleted. Lmaltier 20:08, 30 September 2011 (UTC)[reply]
  1. There are only 4,580 google hits for google:"Thames河". Roughly 1,000 of these also include the 泰晤士 official Mandarin spelling of Thames (and, incidentally, also the spelling of times, as in newspaper names), reducing our pool to only 3,500 at google:"Thames河"+-"泰晤士", and this is before weeding through to exclude sources that don't meet WT:CFI. Compared to the 892,000 google hits at google:"泰晤士河", it certainly looks to me like the number of attestations is actually quite small. I find only 8 hits at Google Books here, of which the first two seem to be the same book, one uses Thames in parentheses after first using the alternate Mandarin spelling 太晤士, one is clearly Japanese, and one offers no context or snippet at all, leaving us with only four or five books using this particular combination in a way that might meet WT:CFI. It seems to me that Thames河 is quite rare, actually.
  2. You haven't actually addressed my question about mixed-language code-switching SOP phrases. Do you view Thames河 as somehow not SOP? If so, why, and by what reasoning? -- Eiríkr Útlendi | Tala við mig 22:39, 30 September 2011 (UTC)[reply]

There is not only one standard for the Chinese language. Chinese is used not only in Mainland China, but also in Taiwan, Hong Kong, Macau, Singapore and overseas. For example, President Bush is written as 布什, 布殊 and Bush as well. 2.25.212.4 12:59, 30 September 2011 (UTC)[reply]

You already made this point at Wiktionary:Requests_for_deletion#Thames.E6.B2.B3. As I already stated there, you are being disingenuous. As others already pointed out there, Bush is not "standard" Chinese. Likewise, "Thames河" is not "standard" Chinese -- and as such, as well as for other reasons, "Thames河" does not belong here as an entry in Wiktionary. -- Eiríkr Útlendi | Tala við mig 18:53, 30 September 2011 (UTC)[reply]
What is the difference between a standard term and a non-standard term? Inclusion in existing dictionaries? And why do you think that only "standard" terms should be included? Lmaltier 20:08, 30 September 2011 (UTC)[reply]
I interpreted Engirst's argument as being that Bush is standard Chinese, under a particular standard for Chinese. That standard seems to be that any word in any script that is used in a Chinese sentence merits inclusion as a Chinese word -- a stance that I categorically refute. Using Bush in a Chinese context does not make it a Chinese word any more than using Москва (Moskva) or natsukashii (natsukashii) in an English context makes these English words. -- Eiríkr Útlendi | Tala við mig 22:39, 30 September 2011 (UTC)[reply]
Also note that mixed script terms do exist, even in English, e.g. α-particle. Lmaltier 20:36, 30 September 2011 (UTC)[reply]
See also α粒子, alpha粒子. Engirst 20:52, 30 September 2011 (UTC)[reply]
And these are clear examples of non-SOP terms -- there is no way of deriving the meaning of α-particle etc. purely from its constituent parts. Meanwhile, Thames河 is practically the definition of an SOP term -- the meaning is baldly plain to anyone who knows what Thames and 河 both mean. The script alone is not part of my argument.
To clarify, there are two issues here that I am arguing, both of which are against inclusion of Thames河:
  1. Thames河 is an SOP phrase, and as an SOP phrase, specifically as an SOP phrase with no special idiomatic meanings, it has no place here in Wiktionary. This applies equally to other non-idiomatic SOP phrases like blue necktie, fresh apple, or 赤い花, where the meaning is plain from the meanings of the constituent parts.
  2. Thames as it is used in Thames河 has no specifically Chinese meaning -- it is being used as an English term, and therefore it is not a Chinese term, and thus should not be treated as a Chinese term. -- Eiríkr Útlendi | Tala við mig 22:39, 30 September 2011 (UTC)[reply]
Lmaltier, this IP user (2.25.212.4) is abc123, also known as Engirst, with the multiple IP addresses he is automatically generating to avoid all the blocks, including range blocks. He doesn't deserve to be here and was blocked multiple times by multiple administrators. The only reason he is still here is that we can't do much about him and we don't want to stop anonymous users from contributing here because of one. You are doing a disservice by supporting his crazy ideas of pinyinisation of Chinese. He has dug up a few examples of code-switching or Chinglish, which are not typical and represent nothing. Chinese people write 泰晤士河, not Thames河, which can be easily proven by checking the internet. Non-standard terms could be included if they are typical; Thames河 is not typical at all. Mandarin also has mixed-script terms and they are already included here. Place names are always written in native scripts in any language. You can find all sorts of weird things on the internet if you try hard or have an agenda, and that's what Engirst is doing. His last block expired today, so he has reappeared as Engirst. --Anatoli 21:00, 30 September 2011 (UTC)[reply]
I don't know anything about Chinese, it's difficult to argue, I just try to understand. Let me take other examples.
  • I know the author of a very good, prize-winning, thesis in mineralogy. Yet, she consistently writes Abkhazia instead of the French term Abkhazie because she was unaware of this French word. Does this make Abkhazia a word worth inclusion in a French section? Certainly not, because this use was a mistake due to ignorance.
  • Now assume that she was referring instead to a Chinese province, using the Chinese characters. It could be called code-switching. The case is closer, but this assumption is absurd, because nobody would do that (because almost no reader of the thesis would understand the Chinese characters in a French text).
  • alpha particle is used, and α-particle too, because English readers are expected to understand the α character. Is this the same kind of case?
My feeling is that the case under discussion is somewhere between the 2nd and the 3rd example. Am I right?
I also feel that this writing is used by some people because 1. Chinese people might read the name of English rivers more often in English texts than in Chinese texts (??). 2. Most Chinese are expected to recognize English letters 3. This writing of foreign proper nouns is felt by some people as less ambiguous than a transcription, because closer to the original word. This is probably truer with tiny unknown rivers or unknown people when you want to refer to them in your language. When you use the same alphabet, it's not shocking to use the original word (it's even systematic), it's more shocking when you don't use the same writing system. If there is no word in the language but there is a clear transliteration system, then this system is used (e.g. in Russian), but it's not the case in Chinese. How would you translate the ru de Marivel (a very tiny French river not visible any more) to Chinese without using the Roman script?
(true or not, I don't know:) This way of writing foreign proper nouns is uncommon, but might become less and less uncommon, and this is considered as very bad by people liking their writing system (and I understand them very well).
If my feelings are not wrong, then these writings should not be promoted, but there is no reason to delete these pages, provided that the required (sound, helpful and correct) information is provided (e.g. explaining why the Roman script is used, explaining that this is not standard, explaining how you pronounce it in Chinese). People not liking them may simply ignore them. Providing information to people looking for these terms is better than a "no page found" message.
((I'm not interested at all in who writes something here, only in what is written.)) Lmaltier 05:59, 1 October 2011 (UTC)[reply]
"(true or not, I don't know:) This way of writing foreign proper nouns is uncommon, but might become less and less uncommon, and this is considered as very bad by people liking their writing system (and I understand them very well)." - It is definitely true because of globalization nowadays. Engirst 13:19, 1 October 2011 (UTC)[reply]
This is not code-switching or code-mixing, it's just (un)intentional reluctance to transcribe into Chinese characters. There is no way, for example, for "Thames河" or any other proper nouns written in the Latin alphabet to appear on news from Xinhua (official press agency in PRC). This is how Xinhua handles this in news: [3] It writes the name of the Assistant Secretary of Bureau of Near Eastern Affairs of the US as "杰弗里·费尔特曼" (Jiéfúlĭ Fèiĕrtèmàn), and that of the Syrian ambassador to the US as "伊马德·穆斯塔法" (Yīmădé Mùsītăfă), without providing the original scripts (English and Arabic) or transliteration of the latter. Although I have no way of knowing the original names just from looking at these transcriptions (the first one is probably Jeffrey Felt(e)(r)man(n)), these names are in Chinese, unlike "Jeffrey Feltman" which may appear (without a transcription given) in non-official Chinese news.
As for code-mixing or code-switching, it happens all the time in Singaporean Mandarin and Hong Kong Cantonese (and Chinese spoken overseas in general; also Singlish, to a lesser extent). Code-mixing doesn't make "office" Chinese or "tahi" ("poo" in Malay) English. 60.240.101.246 10:17, 1 October 2011 (UTC)[reply]
If Singdarin were written, I'd happily classify it as a pidgin or creole and record it as such. That whole section on Hong Kong Cantonese you point to is full of linguistic information that should be recorded somewhere, and is hardly English, like "yeah" meaning trendy and mouse being pronounced mau1-si2.--Prosfilaes 23:37, 1 October 2011 (UTC)[reply]
I agree with you. Engirst 00:10, 2 October 2011 (UTC)[reply]
A vote to ban this kind of entry has been set up here: Wiktionary:Votes/2011-10/CFI for Mandarin proper nouns - banning entries not in Chinese characters. --Anatoli 01:03, 3 October 2011 (UTC)[reply]
Our "friend" who avoided all blocks so far is now busy editing Chinese characters entries, wow, adding examples he dug out where proper names are written in English. He'll teach us good Mandarin spelling - so "London University" in Mandarin is "London 大學". Pinyin is not enough, now Chinese will be written half in English, half in Chinese! --Anatoli 11:10, 3 October 2011 (UTC)[reply]
I have protected 英國英国 (Yīngguó) for his "英國 London 大學地質學博士" but there are other bad edits. Can we stop this somehow? --Anatoli 11:15, 3 October 2011 (UTC)[reply]
In the UK, there are a lot of spoken and written usages of mixed scripts, can you ban them? Engirst 11:22, 3 October 2011 (UTC)[reply]
It's called code-switching, and it is very common in communities living outside their homeland. People have already spent too much time repeating the same thing to you, but you keep trolling. We already have London and Thames in English; they don't need a Mandarin umbrella. Everybody here knows that you are here only because nobody is able to block you completely. It's YOU who needs to be banned indefinitely, troll. --Anatoli 11:31, 3 October 2011 (UTC)[reply]
The above is about "英國 London 大學" (UCL), not Thames河. Engirst 12:08, 3 October 2011 (UTC)[reply]
The vote Wiktionary:Votes/pl-2011-10/Mixed script Mandarin entries has started. --Anatoli 11:05, 18 October 2011 (UTC)[reply]

Categories and single entries with multiple indices

This issue has already been touched on above at Wiktionary:Beer_parlour#Question about cats, but I'm realizing this is a sizable issue for Japanese.

The underlying problem is that kanji (Chinese characters used in a Japanese context) only carry the meaning of a word, and can often be read using multiple pronunciations. Kanji entries in Japanese are generally indexed by their readings, so a single kanji compound may appear at several locations in a Japanese dictionary. The example at #Question about cats regarded the given name 恵美, which can be read either Emi or Megumi. Using WT's categories as-is only lists the entry under the final category call on the page, so 恵美 was only categorized under めぐみ (Megumi), when it should have been categorized under both the めぐみ and えみ (Emi) indices.

The solution -sche brought up was to create a redirect under a visually identical header, and categorize the header under the additional index. This does work, happily.

However, there is no dearth of kanji terms in Japanese that have multiple readings. 砂岩 can be read either sagan or shagan; 月食 can be read either gesshoku or gasshoku; 一度 can be read either ichido or hitotabi; 正直 can be read shōjiki, jōjiki, or seichoku; etc. All of these should ideally be categorized under all readings. Manually going through and creating redirects for all entries that currently have uncategorized additional readings is not a tenable proposition.

Is anyone aware of any way of getting the categorization mechanism to allow multiple categorizations? I.e., is there any way of getting something like:

[[Category:Japanese nouns|しょうじき]]
[[Category:Japanese nouns|じょうじき]]
[[Category:Japanese nouns|せいちょく]]

to allow a single entry to show up under all the provided indices? Listing multiple cats as above only categorizes under the reading supplied in the last cat listed. Simply adding additional sorting indices as additional arguments, like [[Category:Japanese nouns|しょうじき|じょうじき|せいちょく]], only indexes under the first argument given. Is there a Wikimedia / MediaWiki dev we should contact about this? -- Curious, Eiríkr Útlendi | Tala við mig 21:51, 28 September 2011 (UTC)[reply]

Translation FROM non-English language

How can I add the Norwegian version of Swedish koka soppa på en spik on that page? (koke suppe på en spiker) __meco 18:14, 29 September 2011 (UTC)[reply]

In my opinion, just like any other translation. Sometimes, translations in foreign word pages are very helpful. This is why this should be allowed. Lmaltier 19:30, 29 September 2011 (UTC)[reply]
If you're asking for an explanation of how to create it, I'd say copy the entire contents of koka soppa på en spik, paste them into koke suppe på en spiker, change every mention of sv to no and Swedish to Norwegian. If there's a policy question in there then I'm afraid I've missed it. Mglovesfun (talk) 19:31, 29 September 2011 (UTC)[reply]

I probably misunderstood. I interpreted Norwegian version as a translation of the phrase. Lmaltier 19:35, 29 September 2011 (UTC)[reply]

Good idea. Also, what if we use Category:English non-idiomatic translation targets? It may be another workaround for non-idiomatic translations, especially for English terms, which may not pass CFI. There are many foreign terms that are translated as single words where English uses two or more (e.g. fur coat) or idioms as above where there is no English equivalent. --Anatoli 00:22, 30 September 2011 (UTC)[reply]
OK, the See also solution is probably the best for this situation where we have two phrases in non-English languages but no English equivalent. Later, when we see more instances of this happening, a more integrated solution will probably have to be devised. __meco 08:26, 30 September 2011 (UTC)[reply]
I don't see making this connection as irrelevant for the English Wiktionary; do you really? __meco 20:21, 30 September 2011 (UTC)[reply]
I do. Having said that, I'm sure we have an idiom in English that means the same as the Swedish koka soppa på en spik. --Rockpilot 12:45, 1 October 2011 (UTC)[reply]

TheDaveBot wants to tidy up a bit.

I would like to use my bot account to run a new version of AutoFormat. It is entirely new, so new that it isn't finished yet. There are a couple of "modules" which are ready for testing. Since the task is already pretty well designed and approved of (this new script doesn't do anything AutoFormat wasn't doing) and the account is already a bot, I am just posting this to give people an opportunity to make suggestions or complaints or requests for more information or whatever. I have done a few on-wiki tests, none in NS:0 yet, but that will proceed within the next few days, I would imagine. Once I am reasonably comfortable with the test results I will go through the normal bot approval routine. The bot won't be running unsupervised before that time.

Now is also a good time for any other format related tasks to be brought up so they can be added, but unless they are non-controversial and conform to the ELE I probably won't add them. - [The]DaveRoss 21:21, 29 September 2011 (UTC)[reply]

Thank you for offering to autoformat!! As for other things to do, well, [[user talk:AutoFormat]] is full of suggestions.​—msh210 (talk) 17:41, 2 October 2011 (UTC)[reply]
Support, I've been meaning to suggest that we need at least two auto-format bots running at a time as the workload is too much even for a bot! Mglovesfun (talk) 17:58, 2 October 2011 (UTC)[reply]
Can the old account be transferred to Dave? It would be nice because everyone still calls it AutoFormat... —CodeCat 14:13, 3 October 2011 (UTC)[reply]
I think it would be better just to continue calling the task AutoFormat but let the account names be whatever they are. Ideally this is something which would eventually migrate to the toolserver and not be run by an individual, but I don't have the skills required to make that a reality. - [The]DaveRoss 21:44, 3 October 2011 (UTC)[reply]
Any valid autoformatting should be supported by all. I've missed it lately and have a bad feeling about the likely backlog. What is the entry-processing capacity of a fully functional AF-type bot, in entries per week? Does having it on the tool-server improve capacity or effectiveness? DCDuring TALK 23:09, 3 October 2011 (UTC)[reply]
Putting it on the toolserver means that it will have a much higher uptime, and be maintainable by multiple people. As far as throughput goes, it is limited by the server more than the program. I think one entry per second would be easily achievable by a single instance of the bot running, and an arbitrary number of instances of the bot can be run simultaneously. The downside is that if there are 20 edits per second happening constantly the server load would be dramatically increased. - [The]DaveRoss 01:36, 4 October 2011 (UTC)[reply]
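To put the "entries per week" question in numbers, here is a back-of-the-envelope sketch in Java. It assumes a sustained rate with no downtime and no server throttling, which is optimistic; the class and method names are made up for illustration and are not part of any bot code.

```java
// Rough throughput estimate; all names here are hypothetical.
class ThroughputSketch {
    // Seconds in one week: 60 s * 60 min * 24 h * 7 days = 604,800.
    static final long SECONDS_PER_WEEK = 60L * 60L * 24L * 7L;

    // Entries processed per week, assuming every instance sustains
    // the given rate around the clock (an idealized upper bound).
    static long entriesPerWeek(double entriesPerSecond, int instances) {
        return Math.round(entriesPerSecond * instances * SECONDS_PER_WEEK);
    }

    public static void main(String[] args) {
        // One instance at one entry per second.
        System.out.println(entriesPerWeek(1.0, 1));
        // Twenty instances, i.e. the 20-edits-per-second scenario.
        System.out.println(entriesPerWeek(1.0, 20));
    }
}
```

So a single instance at one entry per second would top out at roughly 600,000 entries per week, and the practical ceiling is whatever edit rate the server can tolerate.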
If you post the code for at least the structure of the bot then some of us could contribute actual code rather than just things for you to do. --Bequw τ 15:59, 7 November 2011 (UTC)[reply]
I can post the code eventually; at the moment I am only able to work on it sporadically as I am pretty busy. If you are interested in helping out, that is cool. You really don't need to know anything about the rest of the framework, other than that it is written in Java. I have made it very modular, so that pieces can be turned on and off or added and removed without impacting anything else. Each "task" is passed a Unicode string which represents the whole contents of the page and is expected to return a Unicode string which represents the modified version of the page contents. The upshot of this is that anyone can write any task and it should play nicely with the rest. Some of the remaining tasks which I haven't touched yet are etymology, context tags, categorization, synonyms (and other relational sections) and anything related to foreign language terms (conj tables etc.). If you want any more clarification than that, let me know. - [The]DaveRoss 01:02, 10 November 2011 (UTC)[reply]
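For anyone wanting to picture the "string in, string out" task design described above, here is a minimal sketch in Java. It is not the bot's actual code, which has not been posted: the class name, the two example tasks and the runAll helper are all assumptions made for illustration, and each task is modelled simply as a UnaryOperator<String>.

```java
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical sketch of a modular formatting pipeline; not the real bot.
class AutoFormatSketch {
    // Example task (assumed): strip trailing spaces and tabs from every line.
    static final UnaryOperator<String> TRIM_TRAILING_WHITESPACE =
        page -> page.replaceAll("(?m)[ \\t]+$", "");

    // Example task (assumed): collapse three or more newlines into one blank line.
    static final UnaryOperator<String> COLLAPSE_BLANK_LINES =
        page -> page.replaceAll("\\n{3,}", "\n\n");

    // Feed the page text through each enabled task in order.
    // Tasks only see and return the full page text, so they stay independent.
    static String runAll(List<UnaryOperator<String>> tasks, String page) {
        String current = page;
        for (UnaryOperator<String> task : tasks) {
            current = task.apply(current);
        }
        return current;
    }

    public static void main(String[] args) {
        String page = "==English==  \n\n\n\n===Noun===\t";
        List<UnaryOperator<String>> enabled =
            List.of(TRIM_TRAILING_WHITESPACE, COLLAPSE_BLANK_LINES);
        System.out.println(runAll(enabled, page));
    }
}
```

Turning a task on or off then amounts to adding it to or removing it from the list, which matches the modularity described, under the assumption that tasks do not rely on each other's output.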

{{etyl|ang|en}} links to Wikipedia; {{proto|{{subst:langrev|Proto-Germanic}}|qwerty}} links to Wiktionary. Why the discrepancy? This, that and the other (talk) 12:20, 30 September 2011 (UTC)[reply]

Dunno really. It's not a problem per se. Mglovesfun (talk) 11:40, 1 October 2011 (UTC)[reply]

"A platform of a staircase where the stair turns back in exactly the reverse direction of the lower flight." This is an archaic word; what's the modern term for this? I've always, rather awkwardly, had to describe them as "U-turn staircases". Equinox 20:58, 30 September 2011 (UTC)[reply]

I think they are called half landings, also this is the Beer Parlour not the Tea Room you silly person. - [The]DaveRoss 21:04, 30 September 2011 (UTC)[reply]
Thanks. Also, I just found dogleg, which is a word I have wanted to know for years. (I always dream about that kind of staircase.) Equinox 21:11, 30 September 2011 (UTC)[reply]
Upon further research, it seems like a "half landing" can be any landing which is not at the top or bottom of the stairs. To be precise, many people are calling them "180 degree half landings", but that is not as succinct as it should be. - [The]DaveRoss 00:13, 1 October 2011 (UTC)[reply]
  • I always just called it a landing. Ƿidsiþ 06:03, 1 October 2011 (UTC)[reply]
  • In my house there is a square landing at the top of the stairs, then a 90 degree turn and a single step to the landing proper. We call it "the little landing at the top of the stairs" - probably not the technical terminology. SemperBlotto 07:00, 1 October 2011 (UTC