Vowel length

Please do not edit policy or guideline pages to reflect your personal opinion on this matter without discussing with other editors with experience in Ancient Greek entries first. —Μετάknowledgediscuss/deeds 02:15, 15 November 2013 (UTC)

Μετάknowledgediscuss/deeds 02:29, 15 November 2013 (UTC)

I was the one who wrote many of the original orthography and transliteration standards, though they have undergone some changes in the intervening time. In any case, I'm happy to address some of your issues. However, I think we need to set some things straight first. To begin with, we have had a great many self-proclaimed experts come and go on this project. You must understand that the anonymous context of the internet forces us to treat claims of authority with a grain of salt. Additionally, assertions of what absolutely needs to happen right now simply won't do. Things are done here based on consensus. If you would like things to change, that's completely reasonable. However, you must present your evidence, and win allies with discussion. Personally, I think that vowel length and accent are real components of Ancient Greek phonology, and are something that merits note in our entries; however, I think it's important to understand what the purpose of transliterations is here on Wiktionary. Transliterations are never used here as a substitute for the original script, as they are in many other contexts. They are a pedagogic tool, used to help those who don't understand the original script, which they accompany. So, they are an approximation for the uninformed. A highly precise technical transliteration is unnecessary, and serves only to confuse those whom it is meant to help. -Atelaes λάλει ἐμοί 03:04, 15 November 2013 (UTC)
Sorry to barge in like this but this issue with vowel length is one of many issues with Wiktionary which (esp. compared to the English Wikipedia) make it look rather amateurish. (Lack of references is another one.)
I understand your concern about self-proclaimed experts. But go look at my contributions on the English Wikipedia and you will see that I do actually know a bit about the subjects at hand. Ask User:CodeCat, User:Angr and others who contribute to Wikipedia linguistics/language articles about me, if you want.
I'm also guessing that you are not an expert in linguistics, but may have some Classicist knowledge of Ancient Greek. The Classicist viewpoint comes through in various things you say (denigration of transcriptions as an "approximation for the uninformed", insistence on use and importance of the original script, apparent unconcern with not noting vowel length explicitly in all cases). However, Wiktionary is a linguistic work; this goes especially for etymologies. Hence we need to be following linguistic standards, not Classicist standards.
On top of this, your statements about transcriptions are wrong on a number of counts:
  1. In technical linguistics articles esp. on historical linguistics and etymology, it is not reasonable to expect that readers can handle every script out there. Transcription (not "transliteration", which refers to letter-for-letter representation in Latin script, although for Greek the difference isn't too great) is the norm and is the only reasonable way e.g. for even a knowledgeable reader to handle the different languages. Hence, something like the etymology of Old Irish ibid "he drinks" that makes references to Latin bibō and pōtō, Greek pī́nō, Armenian ǝmpǝm, Sanskrit pibati, Old Church Slavonic piti will make everyone go crazy if they are written in four different scripts (Greek, Armenian, Devanagari, Cyrillic) with the expectation that the readers "should" know all these scripts and are "uninformed" (your words) if they don't.
  2. Furthermore, the problem here is that the original Greek script wasn't properly reflecting long vowels, either. This is evidently due to your assertion, made into policy, that vowel length doesn't need to be noted in the Greek script or transcription (a typically Classicist viewpoint, quite reasonable in the context of interpreting a work of Ancient Greek literature but not appropriate to a linguistic work).
Where should this discussion take place? I'm not asserting, and never asserted, that this change must happen "right now", but it does indeed need to happen at some point, hopefully soon. I am almost positive that all the other linguists working here (I've seen CodeCat and Angr here, there must be others) will agree with me, so I imagine consensus is not too hard to reach on this.
For reference, compare what's done in Latin, Old English, Old High German, etc. where long vowels are always indicated in all uses of every word including in head words, even though the original texts didn't have length marks any more than the original Greek texts did. Greek should follow what every other language does.

Benwing (talk) 09:07, 15 November 2013 (UTC)

If you would like to gain official consensus the Beer parlour is the appropriate place. There are indeed a number of other editors who seem to prefer the more involved transcriptions. I have held them off thus far, but it's quite possible that a determined and eloquent proponent could cause a shift in policy. Until such time, though, I would ask that you refrain from editing existing entries to conform to your view, as I will continue to undo such edits. If you wish to create new content, you are more at liberty to do so as you wish. -Atelaes λάλει ἐμοί 16:43, 15 November 2013 (UTC)

Moving pages

We generally avoid redirects on Wiktionary, so when you move a page to correct the spelling, could you place {{delete}} on the redirect that's left behind? —CodeCat 10:47, 2 July 2014 (UTC)

Will do. Does it matter where I put it in the redirect page? Presumably after the redirect itself, on the next line?


Rollback link is very close to patrol link so I misclick them sometimes. I use a browser extension which enables me to select a screen region with a mouse and "click" all of the selected links at once upon release. Edits here are high volume and people often make mistakes... Cheers --Ivan Štambuk (talk) 12:16, 2 July 2014 (UTC)

Old French

I'd be interested to know what your background is. Renard Migrant (talk) 17:58, 12 July 2014 (UTC)

With {{fro-conj-er}}, could you fix it if possible to not need any parameters? Like {{fro-conj-er|dress}}, using Lua can't it deduce that the stem is dress by taking off the final -er? Renard Migrant (talk) 10:30, 24 July 2014 (UTC)
You're right, this is possible. I'll look into it. Benwing (talk) 11:21, 24 July 2014 (UTC)
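A minimal sketch of the stem deduction asked about above, written in Python rather than the Lua a Wiktionary module would use. The function name `deduce_er_stem` is hypothetical, and the sketch assumes the page name is the infinitive; it does not distinguish verbs in -ier, which would need separate handling.

```python
def deduce_er_stem(pagename: str) -> str:
    """Return the stem of an -er verb by dropping the final "-er".

    Hypothetical helper: assumes `pagename` is the infinitive (e.g. the
    entry title) and that the verb belongs to the plain -er conjugation.
    """
    suffix = "er"
    if not pagename.endswith(suffix):
        raise ValueError(f"{pagename!r} does not end in -{suffix}")
    return pagename[: -len(suffix)]
```

With this, a call on the page "dresser" would yield the stem "dress" without the editor supplying any parameter.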
Please don't delete words that definitely exist. I have no idea why you would do that, I can only assume you haven't read WT:CFI#Attestation. Renard Migrant (talk) 12:19, 24 July 2014 (UTC)
I have undone the deletions. They appear to be Anglo-Norman words, not standard OF words. Standard OF has -gier, not -ger. Benwing (talk) 12:21, 24 July 2014 (UTC)
As I'm sure you know, there's no such thing as standard Old French. Standardization didn't exist yet, by a few hundred years at that. Renard Migrant (talk) 12:22, 24 July 2014 (UTC)
But use of -gier is pretty consistent in Francien works. And there is such a thing as standard spellings in the handbooks. e.g. amer is standard, aimer is not. I actually question whether aimer is a spurious form based on the later language. Yes, you might find occasional places where 'aim-' and 'am-' intrude on each other, but that doesn't (IMO) justify having an entry for aimer. In general, I've been trying to correct a mess of mistakes, e.g. non-standard forms like herberger having entries while standard herbergier doesn't, or std forms being claimed as alternatives to non-standard forms, etc. The current situation has the appearance that someone didn't really know OF very well when creating the entries. Certainly the conjugations were completely and utterly wrong; whoever did them just copied Modern French declensions and hoped they were the same (oops ...). Since you complain, I will not delete these forms but I'll continue to redirect non-standard to standard forms, to try and reduce the chaos of these forms. Benwing (talk) 12:30, 24 July 2014 (UTC)
These are scholarly standard forms. Basically the forms preferred by scholars. We include all words whether included by scholars or not. Less common forms are not mistakes! Do you propose to delete honor because we already have an entry for honour? If your 'corrections' involve deleting truthful information then please stop. If you're just not well enough informed on the subject, also stop. Renard Migrant (talk) 12:34, 24 July 2014 (UTC)
If you want to nominate these forms for deletion, your rationale will have to be "these definitely exist but I don't like them". See what response you get from other people. Renard Migrant (talk) 12:35, 24 July 2014 (UTC)
Look, I already told you I won't be deleting any entries. And my corrections aren't deleting truthful info. You're welcome to look over my changes and critique them if you really want. BTW I'm going to bed now, so if you don't hear any more responses from me for a while it's not because I'm ignoring you or anything but just because I need sleep. Benwing (talk) 12:51, 24 July 2014 (UTC)
Sorry you're right. Renard Migrant (talk) 13:24, 24 July 2014 (UTC)
We class Anglo-Norman as a dialect of Old French. See Template talk:xno. — Ungoliant (falai) 19:08, 24 July 2014 (UTC)

Category:Pages with module errors

Seems to be Module:fro-verb's fault. Keφr 10:21, 28 July 2014 (UTC)

Thanks, I fixed it. Benwing (talk) 19:53, 28 July 2014 (UTC)

Template:Old French preterite type boiler, Template:Old French verb ending boiler

These are named contrary to our template naming customs. Also, User:CodeCat has been developing a category boilerplate infrastructure recently, into which it might be desirable to integrate these two. Keφr 08:05, 12 August 2014 (UTC)

How are they supposed to be named? I couldn't figure that out from the link you posted. I named them based on Template:Spanish conjugation boiler. Is that also misnamed? Benwing (talk) 08:39, 12 August 2014 (UTC)
I guess {{fro-preterite catboiler}} or something similar. And I never saw these templates, but I suppose yes; though these naming conventions are not actually strict policies. They are just a codification of some coding practices, some of which are relatively recent. Keφr 09:43, 12 August 2014 (UTC)
How about {{fro-preterite type catboiler}} and {{fro-verb ending catboiler}}? Benwing (talk) 05:48, 13 August 2014 (UTC)
Fine by me. Keφr 09:42, 13 August 2014 (UTC)
OK, they've been changed. Benwing (talk) 10:55, 13 August 2014 (UTC)


I created حَقَّقَ (ḥaqqaqa) because of your edits on حق. By the way, how much do you know about the Arabic language? --Lo Ximiendo (talk) 00:00, 15 August 2014 (UTC)

I studied Arabic for a couple of years and I know the verb conjugations reasonably well. I wonder why they aren't automated? Seems like a perfect opportunity since the conjugations are so systematic. I've written code in other circumstances to generate Arabic verb conjugations and it isn't all that hard. Benwing (talk) 03:17, 15 August 2014 (UTC)
Hi and welcome. You're more than welcome to take over the work on Module:ar-verb (there are many existing working templates too, which cover various conjugations). Even if the module doesn't provide transliterations, it would be great to have it. Please don't underestimate the amount of work required for this module to cover all types of conjugations. Pls add a Babel to your user page, so that people know which languages you speak. --Anatoli T. (обсудить/вклад) 23:18, 24 August 2014 (UTC)
Hi Anatoli. You might have noticed I've done a bunch of changes to Module:ar-verb, generalizing the code (e.g. you can specify an arbitrary number of verbal nouns), finishing form I geminate (including the alternative jussive forms), and adding form II and III strong. It should be easier to expand from now on and it does provide transliterations, using Module:ar-translit. You're right that it's a lot of work to get all the conjugations. Potentially especially problematic are the hamzated ones. I think the best thing here is to write a module that substitutes the correct hamza seat based on the surrounding vowels. This is definitely possible, and there are detailed rules (which I wrote) on the Wikipedia page on "hamza". I'll look into adding Babel stuff; not quite sure how to do it but I'll look at some existing user pages. I already have this info on my Wikipedia user page. Benwing (talk) 01:14, 25 August 2014 (UTC)
Oh, I didn't notice that you edited the module page. At some stage I just lost motivation. I've got Arabic grammar books though, so I can help with testing the module for specific conjugation types and might add some types, once all the infrastructure is there and we have some working examples. I won't be able to fix any issues with the wrong display for diacritics. I hope User:ZxxZxxZ can also help. Good luck! --Anatoli T. (обсудить/вклад) 01:23, 25 August 2014 (UTC)
Thanks. I'm not sure what the issue is with the diacritics; I notice a comment about shadda + fatha getting displayed wrong, but I don't see this, regardless of whether I put the diacritics in shadda-fatha order or in the fatha-shadda order that you stuck in using dia.sh_a. Possibly this bug has been fixed in the software? Benwing (talk) 01:29, 25 August 2014 (UTC)
BTW there's also a detailed Wikipedia page on w:Arabic verbs which I wrote a while ago; it lists all the conjugations with all the weaknesses. It's largely in transliterated form so the hamza issue doesn't come up and isn't treated as a weakness. Benwing (talk) 01:32, 25 August 2014 (UTC)
The diacritics bugs are not consistent and they are visible when testing with different OS and browsers. I think it's best to use the correct logical order and address the issues when they happen. Your WP page looks very good. The focus should be on the Arabic script, though, so hamzated verbs should take into account spelling changes. --Anatoli T. (обсудить/вклад) 01:47, 25 August 2014 (UTC)
Thanks. Agreed on targeting the Arabic script. If the diacritic bugs are still there and simply require reversing the order of shadda-fatha and such, then the correct way to deal with them is to postprocess the output, applying the reversals as necessary. Do you see the errors on your machine? (If so, what is your OS and browser? I'm using Chrome on Mac OS X, and no problems for me.) Take a look at User:Atitarev/ar-conjug-I-geminate-test and tell me if you see the errors in any of the numerous forms with shadda-fatha (e.g. 'dalla' or 'dallā') or combinations with other short vowels. Benwing (talk) 02:29, 25 August 2014 (UTC)
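The postprocessing step described above could look roughly like the following. This is only a sketch in Python (the real module is Lua), the function name is hypothetical, and it assumes the workaround wanted is to move a short vowel in front of a preceding shadda; whether any browser still needs this is exactly the open question in the thread.

```python
import re

SHADDA = "\u0651"
# Short vowels: fatha, damma, kasra
SHORT_VOWELS = "\u064E\u064F\u0650"

def vowel_before_shadda(text: str) -> str:
    """Rewrite every shadda+vowel pair as vowel+shadda.

    Hypothetical postprocessing pass: the generating code keeps the
    logical order, and the swap is applied once to the finished output.
    """
    return re.sub(f"{SHADDA}([{SHORT_VOWELS}])", rf"\1{SHADDA}", text)
```

Keeping the swap in one output pass means the conjugation code itself never has to care about rendering quirks, and the pass can be dropped if the bug turns out to be gone.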
I currently see User:Atitarev/ar-conjug-I-geminate-test correctly on Windows 7, Firefox 31. --Anatoli T. (обсудить/вклад) 02:36, 25 August 2014 (UTC)

About moving Arabic verbs

I moved two verbs and a noun to أصل from اصل. Would you like to create entries for the two verbs that are listed on the latter? --Lo Ximiendo (talk) 22:35, 31 August 2014 (UTC)

Done. Benwing (talk) 22:40, 31 August 2014 (UTC)
I actually mean the verbs تأصل and استأصل. --Lo Ximiendo (talk) 23:49, 31 August 2014 (UTC)
I'm confused. If you can move those two verbs to where they belong, I can add the conjugations. Benwing (talk) 01:21, 1 September 2014 (UTC)
I added the verbs already. *gulp* --Lo Ximiendo (talk) 11:02, 1 September 2014 (UTC)
Thank you very much! I went ahead and added the conj. Benwing (talk) 11:26, 1 September 2014 (UTC)


I wasn't too sure about the imperfect, especially in the automated conjugation table that was given to the entry. Have you noticed that? I did. --Lo Ximiendo (talk) 10:21, 1 September 2014 (UTC)

Noticed what? I just checked my verb tables and it looks correct. I have tables for the verb أقام and the ones I generate for that verb look correct, and أجاب should follow exactly the same conjugation. Is there anything in particular that seems wrong to you?
BTW which automated tool are you using to do the edits such as you did on أجاب? I only know of AWB but usually it announces itself in edit entries. Benwing (talk) 11:14, 1 September 2014 (UTC)
The ar-conj template gives out yujību instead of ar-verb's yajību. That's what I noticed. --Lo Ximiendo (talk) 11:24, 1 September 2014 (UTC)
ar-verb is wrong, ar-conj is correct. Forms II, III, IV and Iq take prefixes with -u- in the active imperfect, whereas all the others take -a-. There may be lots of other errors in ar-verb but I'm pretty confident in the correctness of ar-conj. Benwing (talk) 11:30, 1 September 2014 (UTC)
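The prefix-vowel rule stated above is small enough to write down directly. The following Python sketch uses hypothetical names and covers only the active imperfect, as in the comment; the form labels are assumed to be passed as strings such as "IV" or "Iq".

```python
# Forms whose active imperfect prefix takes -u-, per the rule above.
U_PREFIX_FORMS = {"II", "III", "IV", "Iq"}

def imperfect_prefix_vowel(form: str) -> str:
    """Return the prefix vowel of the active imperfect for a verb form."""
    return "u" if form in U_PREFIX_FORMS else "a"

def third_masc_sing_prefix(form: str) -> str:
    """Return the transliterated 3rd-person masculine singular prefix."""
    return "y" + imperfect_prefix_vowel(form)
```

For a form IV verb this gives the prefix "yu", which matches the yujību that ar-conj produces.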
Maybe it's just the editor's fault that they used yajību instead of yujību? --Lo Ximiendo (talk) 11:33, 1 September 2014 (UTC)
Probably ... I'm thinking actually that ar-verb needs to be automated like ar-conj so you don't have to type in any more info than what you type into ar-conj (except to clarify the radicals in a few cases), and it automatically figures out the radicals from the headword and generates the 3rd-person masculine singular past and non-past indicative. I added a comment to your talk page about this. Benwing (talk) 11:38, 1 September 2014 (UTC)


I also created تأمل to move it from أمل. Cheers. --Lo Ximiendo (talk) 10:08, 2 September 2014 (UTC)

ar-verb forms for ط و ع

I think there should be a way to modify {{ar-verb forms}} so that it accommodates Arabic roots such as ط و ع. --Lo Ximiendo (talk) 10:30, 2 September 2014 (UTC)

I don't have a very good understanding of all those templates. Can you explain how {{ar-verb forms}} is used? Do you call it directly or is it called from another template? Where is it used (in the headword line, etc.)?
However, all the code to handle all types of Arabic roots is already in Module:ar-verb. In the process of generating conjugation tables it generates all the forms that {{ar-verb forms}} generates and it handles all the types of roots and in general does all sorts of things way better than any of the current templates. Notice for example that in a non-form-I verb, all I have to do is write e.g. {{ar-conj|III}} and it automatically infers the appropriate radicals and generates all the forms, with all the vowels and also automatically transliterated. There's no reason that {{ar-verb}} couldn't take similar parameters and automatically generate the vocalized head word, the vocalized 3rd-person masculine singular imperfect indicative to display in the headword line, plus automatic transliteration, etc. Benwing (talk) 11:03, 2 September 2014 (UTC)
You could have a look at ج ه د for an example of {{ar-verb forms}} at work. --Lo Ximiendo (talk) 11:56, 2 September 2014 (UTC)
I moved the red link verbs to their new homes, along with those that were already created. They also request definitions (maybe not the form I and II verbs?). --Lo Ximiendo (talk) 10:43, 3 September 2014 (UTC)

Beer parlour

These discussions you are starting at the BP about Arabic templates don't really belong there. The BP is sort of like the Supreme Court in that discussions there should affect all of Wiktionary. --WikiTiki89 14:18, 3 September 2014 (UTC)

@Atitarev: Actually you were the one who started the latest one. --WikiTiki89 14:20, 3 September 2014 (UTC)

Two Arabic verb categories

I created Category:Arabic form-? verbs and re-created Category:Arabic geminate form-II verbs because they have members, but I can easily delete them if you think we shouldn't have them. In the latter case, you should make the corrections necessary so the entries don't get categorized in them. Thanks! Chuck Entz (talk) 15:47, 5 September 2014 (UTC)

Yeah, these categories should be there, thanks. The first one indicates a mistake in the entry (missing form= param) but it's still useful. I have no idea why I deleted the second one. Benwing (talk) 18:01, 5 September 2014 (UTC)

Entries created from the list of Arabic Quranic Verbs

Hi, in case you haven't noticed, I created the verb مَكَثَ (makaṯa) from the second half (501-1000) of the aforementioned list. --Lo Ximiendo (talk) 13:17, 14 September 2014 (UTC)

Also created نَفِدَ (nafida) some time ago. --Lo Ximiendo (talk) 02:27, 15 September 2014 (UTC)

Arabic collective nouns and their category

I wish {{ar-coll-noun}} would get its Category:Arabic collective nouns sorting back. Any thoughts about that? --Lo Ximiendo (talk) 13:04, 16 September 2014 (UTC)

Fixed. Benwing (talk) 13:13, 16 September 2014 (UTC)
Thank you. :) So it was just a simple bug... --Lo Ximiendo (talk) 13:22, 16 September 2014 (UTC)
I don't know how the templates {{ar-coll-noun}} and {{ar-sing-noun}} now sort Arabic nouns into a single red link category instead of Category:Arabic collective nouns and Category:Arabic singulative nouns. --Lo Ximiendo (talk) 04:01, 19 September 2014 (UTC)
Oops. That is now fixed. Benwing (talk) 04:09, 19 September 2014 (UTC)
Thank you again. Besides, I'm going on a vacation to Topsail Island and will be back in about a week (I think). --Lo Ximiendo (talk) 04:34, 19 September 2014 (UTC)
Have fun!!! Benwing (talk) 04:36, 19 September 2014 (UTC)

Category:Old French verbs with partial overrides

Were you intending to use this category for anything? —CodeCat 20:38, 25 September 2014 (UTC)

I went ahead and created it. It's intended to signal a particular practice that should be avoided as much as possible. Benwing (talk) 21:13, 25 September 2014 (UTC)

Arabic head parameters

If I understand it correctly, all Arabic headword lines should eventually have this parameter? If so, then it may be more efficient to make it the first positional parameter. We've already done this for Russian, Ukrainian and Slovene, which need accent marks for most words. What do you think of this? —CodeCat 20:58, 5 October 2014 (UTC)

Yes, all Arabic words should have it. However, there's a complication in that sometimes there are multiple possible vocalizations, which are currently implemented using head2=, head3=, head4=, etc. If we make head= the first positional parameter, what do we do about the remainder? One possibility is to allow multiple heads to be specified in a single head= parameter, separated by e.g. commas (this means in the unlikely case where a comma appears in a headword, it needs to be HTML-escaped, but that seems no big deal). It also shortens the typing effort. I suppose we could also have the first positional param be the head, and other ones still use head2=, head3=, etc.
Also keep in mind the effort required to fix all the various Arabic headword templates and usages of those templates if you make this change. Benwing (talk) 21:07, 5 October 2014 (UTC)
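To make the comma-separated option concrete, here is a rough Python sketch of how a single head= value could be split into several vocalizations. The function name is hypothetical, and it assumes that a literal comma inside a headword is written as the HTML entity `&#44;`, as the proposal above implies.

```python
import html
from typing import List

def split_heads(param: str) -> List[str]:
    """Split one head= value into its individual headwords.

    Commas separate alternatives; an HTML-escaped comma (&#44;) inside a
    headword is turned back into a literal comma after splitting.
    """
    parts = (part.strip() for part in param.split(","))
    return [html.unescape(part) for part in parts if part]
```

Splitting before unescaping is what keeps an escaped comma from being mistaken for a separator.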
Yes I was thinking the only change would be head= to 1=, but the other headword parameters wouldn't change. This kind of "paradigm" is relatively common in Wiktionary templates. I am considering making Module:ar-headword for this. —CodeCat 21:12, 5 October 2014 (UTC)
If you're willing to fix everything up yourself, go ahead. Keep in mind there are many templates in Category:Arabic headword-line templates that make use of the param head= in various ways, and would all need to be fixed. Benwing (talk) 21:15, 5 October 2014 (UTC)
Yes, I'm aware of that. But it's fairly easy to rename and move around parameters with a bot, combined with tracking categories. —CodeCat 21:18, 5 October 2014 (UTC)
OK. Benwing (talk) 21:19, 5 October 2014 (UTC)
I've made the change to all Arabic headword-line parameters, except (for now) {{ar-nisba}}, {{ar-verb}} and {{ar-verb-part}}. It turned out that none of the templates used the 1= parameter for anything yet, so I didn't need to shift anything around. This means that for now, both head= and 1= work. But of course the former is deprecated now. Could you update the documentation of the templates? —CodeCat 23:34, 5 October 2014 (UTC)
Done. Benwing (talk) 00:08, 6 October 2014 (UTC)
Thank you. We could change more of the parameters to positional too. g= is probably a candidate, and maybe other {{ar-noun}} parameters too. —CodeCat 00:11, 6 October 2014 (UTC)
I'm wary of too much of this. At least, there should be some logic to parameters that are positional so it's not just a random collection in a hard-to-remember order (or to remember which are positional and which aren't). Benwing (talk) 00:15, 6 October 2014 (UTC)
A lot of templates already have the gender as the first positional parameter, and I noted above that for some, the headword is the first; the gender is the second then. So this is not so hard to remember. —CodeCat 00:20, 6 October 2014 (UTC)
OK, if you're gonna write the bot code to fix up the calls, go ahead. Benwing (talk) 00:24, 6 October 2014 (UTC)
Just to add... On Wiktionary, a somewhat general practice in writing templates is that the most frequently used and non-optional parameters are positional, while more rarely used or optional ones are named. In principle, every call to {{ar-noun}} should have a gender specified, so it's a good candidate for making it positional. That's actually the same reason I offered to make the headword parameter positional too. —CodeCat 00:30, 6 October 2014 (UTC)

As it happens, in the case of gender, it could be made optional. The large majority of nouns have their gender in accordance with their ending, and we could potentially list only the exceptions. This is what Arabic dictionaries typically do, for example. Benwing (talk) 00:37, 6 October 2014 (UTC)
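The "ending plus exceptions" approach can be sketched as follows, in Python for illustration only. The names are hypothetical, the input is assumed to be unvocalized, only the ة ending is considered, and the two exception entries are examples chosen here, not a list taken from any template.

```python
TA_MARBUTA = "\u0629"  # ة

# Example exceptions only (assumed for illustration):
GENDER_EXCEPTIONS = {
    "\u0634\u0645\u0633": "f",              # شمس "sun": feminine, no ending
    "\u062e\u0644\u064a\u0641\u0629": "m",  # خليفة "caliph": masculine with ة
}

def guess_gender(noun: str) -> str:
    """Guess gender from the ending unless the noun is a listed exception."""
    if noun in GENDER_EXCEPTIONS:
        return GENDER_EXCEPTIONS[noun]
    return "f" if noun.endswith(TA_MARBUTA) else "m"
```

Under this scheme an explicit gender parameter would only be needed for entries that belong in the exceptions table.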

I think it's a good idea to add gender anyway, even if it's largely predictable and can be loaded automatically. That way, Wiktionary will be better than other dictionaries, which don't show genders. I sometimes have doubts about nouns ending in ه (which may be silent or stand for ة), ا or ء. The gender of nouns for humans is often determined semantically, not by endings, and it's somewhat confusing for place and country names. --Anatoli T. (обсудить/вклад) 00:45, 6 October 2014 (UTC)
We could automatically determine gender for nouns in quite a few languages. But in practice we don't do this because gender tends to be somewhat unpredictable even then. Every language has exceptions. —CodeCat 00:47, 6 October 2014 (UTC)

Arabic adjective genders

I noticed that {{ar-adj}} takes a gender parameter, but I'm not sure why. I imagine that Arabic adjectives, like those in Indo-European languages, take the gender of the thing they refer to. I examined the entries that provide this parameter, and apparently the vast majority specify g=m but a few have g=f. I don't know what the practices are regarding which form is considered the lemma, but if I assume right that it's the masculine singular form, then the entries which specify feminine gender should probably be looked at and converted into a {{feminine of}} type entry. Could you have a look at these? They're at Special:WhatLinksHere/Template:tracking/ar-head/adj g/f.

Regarding the remainder, would it be correct to eliminate the g= parameter altogether for {{ar-adj}}, and assume that all entries that use this template are masculine singular adjectives? —CodeCat 19:46, 6 October 2014 (UTC)

Yes, you are right that masculine singular forms are the lemma, and gender in Arabic adjectives does work essentially like Indo-European languages. I think it's correct to eliminate the gender code from them. Those forms marked as feminine are non-lemma feminine singular forms and should be converted as you specify. Benwing (talk) 01:05, 7 October 2014 (UTC)
But I don't know the corresponding masculine forms, so I don't know how to fix them. —CodeCat 01:07, 7 October 2014 (UTC)
@CodeCat: For all the terms except أنثى and عليا, the masculine is the term minus the final ة, which acts as a feminine marker, e.g. أوروبية is a feminine form of أوروبي. Yes, they should use {{feminine of}}. --Anatoli T. (обсудить/вклад)
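For a bot pass over the tracked entries, the rule just given might be sketched like this in Python. The function name is hypothetical; the two irregular terms are the ones named in the comment above and are left for manual handling by returning `None`.

```python
from typing import Optional

TA_MARBUTA = "\u0629"  # ة, the feminine marker

# The two terms named above as not following the rule.
IRREGULAR = {
    "\u0623\u0646\u062b\u0649",  # أنثى
    "\u0639\u0644\u064a\u0627",  # عليا
}

def masculine_of(feminine: str) -> Optional[str]:
    """Return the masculine lemma for a regular feminine adjective.

    Returns None when the term is irregular or does not end in ة, so the
    caller can flag it for a human editor instead of guessing.
    """
    if feminine in IRREGULAR or not feminine.endswith(TA_MARBUTA):
        return None
    return feminine[: -len(TA_MARBUTA)]
```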
@CodeCat: All the terms should be fixed up. أنثى appears to possibly be feminine tantum and so I listed it just as an adjective (no gender); the others are listed as "adjective form"s with g=f and use of {{feminine of}}. Benwing (talk) 09:28, 8 October 2014 (UTC)
I've removed the genders from adjectives now. But can you check something? The template {{ar-adj-color}} had feminine and plural forms, but which plural form is this? Is it the masculine plural or the common plural? —CodeCat 19:25, 8 October 2014 (UTC)

Entries where the xxhead= parameter is not the xx= parameter + vowels

I'm working on removing the redundant xxhead= parameters now. But there are a few entries, listed at Special:WhatLinksHere/Template:tracking/ar-head/xhead/needed, where, if vowels are removed from the xxhead= parameter, the result is not identical to the xx= parameter. I don't know much at all about Arabic and my knowledge of the writing is very basic, so I'm not able to fix these. Could you have a look? —CodeCat 21:04, 6 October 2014 (UTC)
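The redundancy check described above amounts to stripping the vowel marks and comparing. A minimal Python sketch follows; the names are hypothetical, and the set of marks removed (U+064B to U+0652 plus the superscript alif U+0670) is an assumption about what "vowels" should cover here.

```python
import re

# Tanwin, short vowels, shadda, sukun (U+064B..U+0652) and dagger alif.
DIACRITICS = re.compile("[\u064B-\u0652\u0670]")

def strip_vowels(text: str) -> str:
    """Remove Arabic vowel diacritics from a string."""
    return DIACRITICS.sub("", text)

def head_is_redundant(xxhead: str, xx: str) -> bool:
    """True when xxhead= is just xx= with vowel marks added."""
    return strip_vowels(xxhead) == xx
```

Entries for which `head_is_redundant` is false are the ones that would land on the tracking page and need a human look.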

Benwing, are we making headwords with or without ʾiʿrāb? Also, do we need to add fatḥa before alif - إِمْتِحَان or إِمْتِحان? It seems to work without. I can fix some. --Anatoli T. (обсудить/вклад) 23:49, 6 October 2014 (UTC)
There are cases with irregular transliterations خَطَر xaṭar (x should be replaced with ḵ, ħ with ḥ) or missing vowels صَوت (missing sukūn), should be صَوْت, etc. --Anatoli T. (обсудить/вклад) 23:54, 6 October 2014 (UTC)
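The two replacements named here could be applied mechanically, for instance by a bot. This Python sketch uses hypothetical names and covers only those two substitutions (x to ḵ, ħ to ḥ); other variant conventions would need their own entries.

```python
# Variant symbol -> preferred symbol, per the two cases named above.
TRANSLIT_FIXES = str.maketrans({
    "x": "\u1E35",       # x -> ḵ
    "\u0127": "\u1E25",  # ħ -> ḥ
})

def normalize_translit(tr: str) -> str:
    """Replace variant transliteration symbols with the preferred ones."""
    return tr.translate(TRANSLIT_FIXES)
```

Applied to "xaṭar", this would give "ḵaṭar".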
(Edit conflict) I have fixed most of them, not sure why سريع is still appearing there. I don't know the vowels for the second word in شبح ظل. Is it a SoP? --Anatoli T. (обсудить/вклад) 00:37, 7 October 2014 (UTC)
We don't currently have any consensus about whether to put ʾiʿrāb in headwords. What do you think? There seems to be a sort-of convention to include ʾiʿrāb in noun headwords but not in the transliterations, but that will require some special-case hacking to distinguish verbs from nouns, since we do want the ʾiʿrāb in verbs. Convention in most dictionaries seems to be to omit the ʾiʿrāb in nouns, but then diptotes need to be marked in some special fashion. For example, the Hans Wehr dictionary puts a subscript 2 by diptotes. Adding ʾiʿrāb is one way of indicating this. I guess probably we should include ʾiʿrāb and if necessary go ahead and include it in the transliterations as well, but I'm not sure.
There are unfortunately various systems being used for transliteration. x in place of ḵ and ħ instead of ẖ are some of the most common substitutions. If you look in Module:ar-translit you'll see I handle lots of transliteration conventions in the code that generates vowels from transliteration. This should be fixed with a bot.
Missing vowels should be added. This can be done from the transliteration usually. As for fatḥa before alif, there is special casing in the transliteration code to handle this case and a few other cases where there's no ambiguity when the vowels are omitted, but they should be there still. Benwing (talk) 00:29, 7 October 2014 (UTC)
CodeCat, I fixed the two cases I saw in Special:WhatLinksHere/Template:tracking/ar-head/xhead/needed. Benwing (talk) 00:33, 7 October 2014 (UTC)
I think it's OK to put ʾiʿrāb in headwords AND transliterate it. It was already agreed on, I think. Not sure why inflected forms are not transliterated. I don't like tāʾ marbūṭa transliterated as "-a(t)". Not sure it was discussed and agreed on. I think it's better to use "-a" or "-atun" if ʾiʿrāb is given. The page about Arabic can teach the actual (pausal, informal) pronunciations. --Anatoli T. (обсудить/вклад) 00:37, 7 October 2014 (UTC)
I'm ok with ʾiʿrāb in headwords and transliterations. Inflected forms aren't transliterated because of changes that CodeCat made; I've asked her to undo these changes or incorporate them into {{head}}. The transliteration of tāʾ marbūṭa as "-a(t)" should occur only when it appears as the first word in a multi-word expression. When appearing at the end of text, it should appear as "-a", and as "-atun" with ʾiʿrāb vowels. Benwing (talk) 00:45, 7 October 2014 (UTC)
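The three cases in the comment above can be written as one small decision function. This is a Python sketch with hypothetical names; it assumes the caller already knows whether the word is last in the expression and passes any ʾiʿrāb ending in transliterated form (for example "un").

```python
def ta_marbuta_translit(is_last_word: bool, irab: str = "") -> str:
    """Transliterate a word-final ta marbuta under the rules above.

    - with an i'rab ending supplied: "at" + ending (e.g. "atun")
    - last word of the text, no i'rab: "a"
    - earlier word of a multi-word expression, no i'rab: "a(t)"
    """
    if irab:
        return "at" + irab
    return "a" if is_last_word else "a(t)"
```

The sketch leaves the "a(t)" case undecided on purpose, since telling a construct phrase from a noun plus adjective needs information the transliterator does not have.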
Thanks. In ʾiḍāfa, the genitive construct, it should be "-at", not "-a(t)" and "-āt", if it follows an alif. We discussed this as well. --Anatoli T. (обсудить/вклад) 00:50, 7 October 2014 (UTC)
I think the best compromise about ʾiʿrāb is to de-emphasize them, either by graying them out—سَنَةٌ (sanatun)—or by superscripting them—سَنَةٌ (sanatun). The superscript currently looks ugly and I am not sure why. The remaining question is what to do with the fatḥatān-ʾalif ending, which when omitted leaves behind a long -ā. I think the best solution is to simply transliterate all fatḥatān occurrences normally—مَعًا (maʿan). --WikiTiki89 00:51, 7 October 2014 (UTC)
So, you oppose ʾiʿrāb transliteration? It's easier to arrive at the pausal form than the other way around. ʾiʿrāb can simply be omitted in pronunciation, but users will know the full form and know which one is a diptote or triptote. Also, I just thought that it won't be possible to determine programmatically whether a phrase is ʾiḍāfa or noun + adjective. A flag could be used for that, I think. --Anatoli T. (обсудить/вклад) 00:55, 7 October 2014 (UTC)
Graying them out is OK with me, if you don't like the normal way. --Anatoli T. (обсудить/вклад) 00:57, 7 October 2014 (UTC)
The only thing about graying them or superscripting them is that it's a bit tricky to do this without adding manual tr= params everywhere, which defeats the purpose of automatic transliteration -- at least that's the case if we want to cite verbs with ʾiʿrāb. There are ways around this but they might not work consistently. (Alternatively, we could always gray, even for verbs.) As for fatḥatān-ʾalif, they should definitely appear normally as -an. Benwing (talk) 01:01, 7 October 2014 (UTC)
Why is it tricky to do it without manual tr= params? The module can return html tags as part of the transliteration. --WikiTiki89 03:59, 9 October 2014 (UTC)
The problem isn't with returning html from the module. What's tricky is if we want to gray out or omit ʾiʿrāb in nouns but not verbs, because the transliteration module doesn't know what's a noun and what's a verb. Verbs are traditionally cited in forms with full ʾiʿrāb -- certainly the dictionary form is. If you think we should gray out all ʾiʿrāb, including in dictionary-form verbs, then doing it automatically is not an issue. Benwing (talk) 04:26, 9 October 2014 (UTC)
If we are graying out ʾiʿrāb, there is no reason not to also gray it out for verbs. The ʾiʿrāb on verbs is omitted in the same contexts as nouns (i.e. in pausal position or in colloquial speech). --WikiTiki89 04:32, 9 October 2014 (UTC)
Anatoli -- The problem with transliterating -at in the genitive construct is that it's not programmatically obvious when such a construct occurs and when it doesn't, e.g. غُرْفَة البَيْت vs. الغُرْفَة الكَبِيرَة. Benwing (talk) 01:01, 7 October 2014 (UTC)
Ah, I see you noticed this too. Benwing (talk) 01:08, 7 October 2014 (UTC)
Yes, in that case, just "-a(t)" is fine, since it's not possible to determine if they are ʾiḍāfa or a noun + adjective. --Anatoli T. (обсудить/вклад) 05:24, 8 October 2014 (UTC)
In fully vowelated text, ʾiḍāfa is easy to identify as it lacks both nunation and the definite article. --WikiTiki89 03:59, 9 October 2014 (UTC)
Yes, you're right. That means we need to provide full vowels. For terms without ʾiʿrāb it's OK to leave "-a(t)" if greying out is not used. --Anatoli T. (обсудить/вклад) 04:13, 9 October 2014 (UTC)
I agree. Graying out is only relevant with ʾiʿrāb anyway. --WikiTiki89 04:17, 9 October 2014 (UTC)
The display of "-a(t)" already only occurs without ʾiʿrāb; it only occurs when ة is at the very end of the word followed by a space, or when ʾiʿrāb display is turned off and a space follows. If ʾiʿrāb vowels are supplied, ة is always displayed as a "t". However, your suggestion is useful when graying out to determine whether to gray out the "t". Benwing (talk) 04:26, 9 October 2014 (UTC)
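The rule for ة described in the comments above can be sketched roughly as follows. This is only an illustration in Python, not the actual code of Module:ar-translit; the function name `ta_marbuta_renderings` is hypothetical, and the sketch deliberately returns `None` for any context the discussion does not cover (for example ة followed by punctuation).

```python
TA_MARBUTA = "\u0629"  # ة
# Case-ending (ʾiʿrāb) marks that may follow ة: the three tanwīn signs and the three short vowels.
IRAB_MARKS = {"\u064B", "\u064C", "\u064D", "\u064E", "\u064F", "\u0650"}


def ta_marbuta_renderings(text):
    """Return one rendering per ة found in text.

    "t"   -> an ʾiʿrāb vowel follows, so ة is written out as t
    "(t)" -> no ʾiʿrāb and a space follows (first word of a multi-word term)
    ""    -> no ʾiʿrāb and ة ends the text (shows up as plain "-a")
    None  -> a context the discussion above does not specify (assumption)
    """
    renderings = []
    for i, ch in enumerate(text):
        if ch != TA_MARBUTA:
            continue
        following = text[i + 1] if i + 1 < len(text) else ""
        if following in IRAB_MARKS:
            renderings.append("t")
        elif following == "":
            renderings.append("")
        elif following.isspace():
            renderings.append("(t)")
        else:
            renderings.append(None)
    return renderings


sanatun = "\u0633\u064E\u0646\u064E\u0629\u064C"   # سَنَةٌ
sana = "\u0633\u064E\u0646\u064E\u0629"            # سَنَة
ghurfa = "\u063A\u064F\u0631\u0652\u0641\u064E\u0629"  # غُرْفَة
al_bayt = "\u0627\u0644\u0628\u064E\u064A\u0652\u062A"  # البَيْت

print(ta_marbuta_renderings(sanatun))                # ['t']
print(ta_marbuta_renderings(sana))                   # ['']
print(ta_marbuta_renderings(ghurfa + " " + al_bayt)) # ['(t)']
```

Note that the sketch cannot tell ʾiḍāfa from noun + adjective when ʾiʿrāb is absent, which is exactly why "(t)" is used in that case.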
All the old xxhead= parameters have now been removed, with their old values transferred over to the regular parameter name. —CodeCat 19:18, 8 October 2014 (UTC)

Arabic genders of numerals, collective and singulative nouns

I've now converted all uses of the g= parameter to the second positional parameter. But I came across a few things that I wonder if you could clarify.

  • All singulative nouns I came across were feminine. If this is a rule, then I suppose it could be made the default. Are there any exceptions?
  • Most collective nouns were masculine, except for ذرة and بوم. Again, this is something that could be made the default, but there are apparently exceptions, unless those are errors; I don't know.
  • Currently, numerals also have a gender parameter. Do numerals have inherent gender like nouns, or do they adapt their gender like adjectives? Most of them were masculine, but some were both masculine and feminine.

CodeCat 22:05, 8 October 2014 (UTC)

As far as I know, singulative nouns are always feminine. They are formed from collective nouns by adding the feminine ending -ة. Collective nouns are generally always masculine, and are distinguished by having a singular form but plural meaning. (The corresponding singulative noun has a singular meaning.) The Wehr dictionary doesn't indicate ذرة as collective, so I'm not sure why it's marked as such, and it does indicate بوم as collective, but not as feminine. So it's possible those are both errors.
Numerals in Arabic are complicated. It's rather like Russian, where the cardinal numbers become progressively more noun-like and less adjective-like as they get higher. I went through them all recently and marked gender, which I think is correct, but it's questionable because some forms are in between nouns and adjectives. "One" and "two" are pure adjectives; "three" through "ten" behave like nouns in that the corresponding noun (e.g. in "three men") is in the genitive plural, but they also agree in gender with the governing noun. 11 through 19 are similar but govern the accusative singular. 20 through 90 again govern the accusative singular but don't agree with the governing noun, or alternatively their form is invariable in gender, which is why I marked them as both masculine and feminine. 100 and 1000 are clearly pure nouns and govern the genitive singular; 100 is feminine and 1000 is masculine, which can be seen by the agreement of smaller numbers in forms like 300 and 3000, where the word for 3 is feminine in 300 but masculine in 3000. The whole system is a huge mess. Benwing (talk) 23:02, 8 October 2014 (UTC)
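The pattern described in the comment above can be laid out as a small lookup. This is only a sketch in Python of what that comment says; the function name `numeral_government` and the dictionary keys are made up for illustration, and values the comment does not mention (compound numbers such as 21 or 350) simply return `None`.

```python
def numeral_government(n):
    """Summarise how the cardinal n treats the counted noun, per the comment above.

    Returns a dict, or None for values the comment does not cover.
    """
    if n in (1, 2):
        # "One" and "two" are described as pure adjectives.
        return {"behaves_as": "adjective", "noun_case": None,
                "noun_number": None, "gender": "varies with noun"}
    if 3 <= n <= 10:
        return {"behaves_as": "noun-like", "noun_case": "genitive",
                "noun_number": "plural", "gender": "varies with noun"}
    if 11 <= n <= 19:
        return {"behaves_as": "noun-like", "noun_case": "accusative",
                "noun_number": "singular", "gender": "varies with noun"}
    if n in range(20, 100, 10):
        # The tens 20 through 90 are invariable in gender.
        return {"behaves_as": "noun-like", "noun_case": "accusative",
                "noun_number": "singular", "gender": "invariable"}
    if n == 100:
        return {"behaves_as": "noun", "noun_case": "genitive",
                "noun_number": "singular", "gender": "feminine"}
    if n == 1000:
        return {"behaves_as": "noun", "noun_case": "genitive",
                "noun_number": "singular", "gender": "masculine"}
    return None


print(numeral_government(3)["noun_case"])   # genitive
print(numeral_government(40)["gender"])     # invariable
print(numeral_government(21))               # None
```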
Maybe we should indicate numerals using the part of speech they actually belong to then, rather than "numeral". After all, if they really are nouns or adjectives, then we should mark them as such. Concerning collectives, I wonder if the template could have no gender parameter at all, and always assume that they are masculine with no way to override. That assumption is only valid if there are no exceptions of course. For singulatives, I've already done this. —CodeCat 23:06, 8 October 2014 (UTC)
The issue with this is the forms that are partly noun-like and partly adjective-like, like 3 through 10 ... which do you declare them as? Benwing (talk) 23:09, 8 October 2014 (UTC)
I don't really know. Numerals are always a bit strange that way in many languages, that's why we use the "numeral" part of speech. It's kind of a catch-all for all the weirdity that goes on with such words in various languages. Of course that doesn't mean every single cardinal number term in a language has to be called "numeral". For example, miljoen (million) is marked as a noun, while duizend (thousand) and honderd (hundred) are both noun and numeral, and tien (ten) is a numeral only. So I'd suggest using adjective or noun for those where those terms clearly fit, and use numeral for the remainder?
And what about collectives? —CodeCat 23:15, 8 October 2014 (UTC)
What happens in Russian re. numerals? That's probably the closest to Arabic. As for collectives, بوم is apparently masculine in reality. No indication that it's feminine in any of the three dicts I looked in. ذرة is claimed to be simultaneously collective and singulative in Lane's comprehensive and verbose dictionary. I don't know what to do about that. I guess make the gender default to masculine but let it be overridden. Benwing (talk) 23:33, 8 October 2014 (UTC)
OK, Russian has all numerals as just "numeral" or "cardinal number". 1 and 2 are given with masculine and feminine forms; 100, 1000, etc. are tagged with their inherent gender, and the in-between ones, which are gender-invariable like Arabic 20 through 90, are marked without gender. I think this is probably the right solution for Arabic as well. Most languages appear to be consistent in using "numeral" etc. for all numbers; Dutch is the odd case out apparently. The Russian entries are also very well documented, including extensive usage notes on all the complications, so I think they're a good model to follow. Benwing (talk) 23:45, 8 October 2014 (UTC)
Thanks :) The complexity of Arabic and Russian numerals is often brought up in debates and comparisons. They are a bit similar in usage, only feminine and masculine are confusingly reversed: feminine خمسة is used with masculine nouns and masculine خمس with feminine nouns. Russian numerals usually take the genitive (singular or plural depending on the number). The number "one" is identical in usage, only Russian also has a neuter. --Anatoli T. (обсудить/вклад) 23:50, 8 October 2014 (UTC)
I think more care should be taken regarding the part of speech. Dutch being the odd one out is not a good thing for the other languages I would say. German closely parallels Dutch for example so the entries should be similar. Dutch numbers don't inflect for gender or number, but the noun-ness is apparent from other syntactical structures. 100 and 1000 have plurals, for example. And "million" must be preceded by an article like any other counting noun (such as liter, dozijn (dozen), stapel (pile)). The entries themselves are a bit sparse, but w:Dutch grammar#Numerals goes into some detail. I've also tried to be exact for Proto-Slavic entries, so accordingly 1-4 are adjectives with full three-gender paradigms, 5-10 are feminine nouns with paradigms for only that gender. —CodeCat 23:55, 8 October 2014 (UTC)
See Cherine's second post here [1] Example:عشر نساء "ten women" (masculine numeral with feminine noun in plural), ستة أيام "six days" - feminine numeral with masculine noun in plural. --Anatoli T. (обсудить/вклад) 23:57, 8 October 2014 (UTC)
I just think it's a bit clunky to try to assign noun or adjective to numerals that don't behave quite as either. To assign "numeral" to 3-90 whereas "adjective" to 1 and 2 and "noun" to 100 up seems really ugly. All are numerals; they also behave similar to adjectives and/or nouns, but with enough special cases that this should probably be treated as usage info. For example, the word مئة "hundred" behaves mostly as a noun, but irregularly has a plural that's the same as its singular, which no other feminine noun does. Benwing (talk) 00:04, 9 October 2014 (UTC)
Yes, treat them as numerals regardless of behaviour. We'll need to explain why خَمْسَة (ḵamsa) (feminine-looking numeral) is a masculine and خَمْس (ḵams) (masculine-looking numeral) is a feminine. Usage notes, appendix, something else? --Anatoli T. (обсудить/вклад) 05:18, 9 October 2014 (UTC)

Plural of inanimate nouns


Also @CodeCat: I've edited رِيَاح شَمْسِيَّة (riyāḥ šamsiyya) and رِيَاح نَجْمِيَّة (riyāḥ najmiyya). What I don't like is the gender "m-pl". Inanimate objects and animals in plural are grammatically feminine, aren't they (which is reflected in the adjectives used)? And there's no distinction between masc. and fem. plural for objects. "m-pl" and "f-pl" should probably only be used for humans, IMO. Did I miss anything? I can't use simply "p" for plural. --Anatoli T. (обсудить/вклад) 22:56, 8 October 2014 (UTC)

I've edited Module:ar-headword so that it recognises "p" as the plural gender, rather than "m-p" or "f-p". —CodeCat 23:02, 8 October 2014 (UTC)
Yes, plural inanimate objects take feminine singular agreement in Arabic, regardless of what their singular gender is. Plural adjectives are used only for people. I'm not sure about animals, might depend on whether they are higher or lower animals, who knows? Probably just "plural" is correct as the gender. Benwing (talk) 23:06, 8 October 2014 (UTC)
OK, I noticed you deleted m-p and f-p as possibilities. They still apply to animate nouns, so should remain as possibilities. Benwing (talk) 23:48, 8 October 2014 (UTC)
@CodeCat: Yes, please. Inanimate plural nouns are grammatically feminine singular (referred to as "she" - "هي" and use feminine adj. endings, have "broken" plural forms for nouns) but not humans or some animals, which use "they" pronoun (there is a masculine and feminine "they" - "هم" "m" and "هن" "f") and use plural noun and adjective endings (broken and sound). --Anatoli T. (обсудить/вклад) 03:52, 9 October 2014 (UTC)
Why can't we simply consider non-human plurals to be grammatically feminine singular (f), rather than "plural" (p)? --WikiTiki89 04:22, 9 October 2014 (UTC)
When plural nouns occur as dictionary entries, I think there should be some indication that these are plural rather than feminine singular. Perhaps they should be identified as plural inanimate. Benwing (talk) 04:30, 9 October 2014 (UTC)
But what makes them plural other than their meaning? The meaning is indicated in the definition. Also, we should not use the word inanimate since this applies to animal plurals as well. --WikiTiki89 04:34, 9 October 2014 (UTC)
The examples that Anatoli gave above were رِيَاح شَمْسِيَّة (riyāḥ šamsiyya) and رِيَاح نَجْمِيَّة (riyāḥ najmiyya), translated as "solar wind" and "stellar wind" even though the word "wind" in Arabic is plural. So the definition doesn't always indicate the plurality. The plurality is indicated in the fact that the word for wind is a broken plural. This explains why, e.g., a word that doesn't have a feminine ending has feminine agreement, and it also tells you that you can't pluralize these forms because they're already plural (contrary to English where terms "solar winds" and "stellar winds" exist and have the expected plural meaning). If you object to "inanimate" we could say "non-human" abbreviated "nonhum" or "non-hum" or something. Benwing (talk) 04:44, 9 October 2014 (UTC)
So then other than their etymologies, what makes these examples "plural"? Take for example English crossroads, which is grammatically singular. Other than its etymology, there is nothing "plural" about it. The only thing in mind that makes رِيَاحٌ (riyāḥun) plural is that it has a singular رِيحٌ (rīḥun). If رِيَاح شَمْسِيَّة (riyāḥ šamsiyya) does not have a singular, there is no basis left for me to call it a plural. (Now if we were discussing colloquial Arabic, these would all be grammatically plural and there would be no further confusion.) --WikiTiki89 04:56, 9 October 2014 (UTC)
I agree with Benwing's argument that it should be marked as plural, even if it's a plurale tantum, which doesn't have a singular by definition. Some consider them feminine singular but I think it's better to treat broken plurals as plurals. A note in "About Arabic" on non-human plurals would suffice, I think. --Anatoli T. (обсудить/вклад) 05:07, 9 October 2014 (UTC)
But what is it that makes it plural? That's what I really want to know. Saying "it is a broken plural therefore it is a plural" is just a circular argument (and a "broken plural" is really just a singular noun that is used in place of the plural). --WikiTiki89 05:32, 9 October 2014 (UTC)
Convention or agreement between dictionary creators, if you wish. What do YOU wish to make them? Feminine singular? It's just another option. What about the fact that you can't make it plural, anymore, the etymology (plural for "wind") or that ALL non-human plurals behave like that, e.g. بُيُوت (buyūt)? It's not feminine sg but plural, isn't it? رِيَاح شَمْسِيَّة (riyāḥ šamsiyya) just doesn't have singular, if we consider it a plurale tantum. --Anatoli T. (обсудить/вклад) 05:51, 9 October 2014 (UTC)
Well here's the problem (yes, it's theoretical, but this whole discussion is pretty theoretical): Suppose we have a word whose etymology is unknown or ambiguous, it is used with feminine-singular agreement, it itself has no plural and no singular, and it does not exist in the colloquial language. What criteria do we use to determine whether it is a feminine singular noun or a non-human broken plural? --WikiTiki89 06:02, 9 October 2014 (UTC)
Something to be added here is that broken plurals often have a form that tells you they're broken plurals, e.g. أَرْوَاح "souls" (plural of روح) is of a traditionally plural form. Other examples are صَحَارَى "deserts" (origin of Sahara) and كُتَّاب "writers". In this case, رِيَاح is less obvious because you have singular كِتَاب with the same construction. In any case, if you really have something that has all the characteristics you describe, plus the fact that its form doesn't tell you whether it's singular or plural, and the fact that its meaning doesn't tell you that either, then you have no call to say something is singular or plural, that's all, and you'd have to go by what the dictionaries say or just omit it entirely. Benwing (talk) 06:52, 9 October 2014 (UTC)
What about رُمَّانٌ (rummānun)? But I think you are right about أَرْوَاحٌ (ʾarwāḥun) and صَحَارَى (ṣaḥārā). If we look to other dictionaries, then the question remains about how those dictionaries determine whether the term is plural. And then this raises another question: Why do we need to know whether it is plural? In other words, what will our readers do with this information? --WikiTiki89 07:25, 9 October 2014 (UTC)

Providing gender and plurality is important, IMO, even if it's only for the etymology. Broken plural forms seldom look like feminine singular but are used grammatically as such. If we don't provide this info, then users may ask for it, even if it doesn't make much difference for communication. If the gender or plurality is not known, it's fine to show "?" - meaning it's not known. --Anatoli T. (обсудить/вклад) 01:25, 10 October 2014 (UTC)


Shouldn't the 3mp past of this kind of verb be حَيِيُوا (ḥayiyū) rather than حَيُّوا (ḥayyū)? --WikiTiki89 16:04, 21 October 2014 (UTC)

The expected 3mp past would actually be حَيُوا (ḥayū). Take a look at رَضِيَ (raḍiya). The form حَيُّوا (ḥayyū) is explicitly given in John Mace's book on Arabic verbs. Barron's "201 Arabic Verbs" on the other hand has حَيُوا (ḥayū) without gemination; presumably one of these is a misprint. I can't find any other book that lists the full conjugation of this verb. Benwing (talk) 20:59, 21 October 2014 (UTC)
Then are you sure that the conjugation at رَضِيَ (raḍiya) is correct? It seems to me that either the 3fs past should be رَضَتْ (raḍat) or the 3mp past should be رَضِيُوا (raḍiyū), but I may be wrong. --WikiTiki89 21:54, 21 October 2014 (UTC)
I'm pretty sure the conjugation is correct. I'll take a look when I have access to my verb tables but I remember encountering this exact situation. The page on w:Arabic verbs also has this conjugation. Benwing (talk) 00:37, 22 October 2014 (UTC)
I've verified that the conjugation is correct. Something similar happens with final-weak active participles ending in -in, where the -iy- drops before u and i in masculine plural -ūna and -īna but not before a in feminine plural -iyātun/-iyātin or dual -iyāni/-iyayni. Benwing (talk) 15:33, 22 October 2014 (UTC)
In that case, I'm still confused why there is a shadda in حَيُّوا (ḥayyū); it makes sense in the conjugation of حَيَّ (ḥayya), but not in that of حَيِيَ (ḥayiya). --WikiTiki89 21:46, 25 October 2014 (UTC)

Deletion requests

Could you explain your deletion requests such as this one:

Could you confirm that the form exists, and that information provided was correct? Note that a separate page is normal for all forms of words... Lmaltier (talk) 20:57, 25 October 2014 (UTC)

There has been an agreement that the lemma form does not include a definite article unless it's an inherent part of the lemma, and that we don't include forms with added definite article unless it has some special meaning. It's similar to not including "the cat", "the dog", "the octopus", etc. as lemma entries. Benwing (talk) 05:17, 26 October 2014 (UTC)
It's not the lemma form. But is it a form of the word? In Bulgarian, forms including the definite article are actual forms of the word, just like a plural form. Is it the same? What agreement do you refer to? Lmaltier (talk) 18:45, 26 October 2014 (UTC)
The definite article is a clitic attached to the beginning of a word. In formal Classical Arabic, sometimes the ending also changes slightly, although usually without changing the unvowelled spelling under which words are entered in the dictionary. I brought this issue up in the Grease Pit I think, and asked whether these forms should generally be deleted, and there was agreement to do so. The definite form is not like the plural form in Arabic because the plural is often highly unpredictable whereas the definite is totally predictable by fairly simple rules. I think the situation is different in Bulgarian because for Bulgarian the definite isn't always predictable from the indefinite, e.g. sometimes the stress moves onto the definite. I personally think that only the few cases where the unvoweled spelling changes in the definite should be included; in all other cases the lemma can be found from the definite by simply removing the al- (Arabic ال) from the beginning of the word. Including these forms for all words would seem to clutter things up needlessly. Benwing (talk) 20:30, 26 October 2014 (UTC)
A discussion is here Wiktionary:Beer_parlour/2014/October#Category:Arabic_definitive_nouns.3F.3F.3F. It's been a general consensus not to include words with proclitic definite articles. Besides the definite article, monosyllabic prepositions, consisting of only one written consonant, question marker أَ (ʾa), enclitic pronouns are also written without a space. They don't belong to the word. It's different from Bulgarian/Macedonian, Albanian and Scandinavian languages, where these endings are considered inflections. Korean particles and copulas are the same story - written together but don't belong to words. --Anatoli T. (обсудить/вклад) 21:27, 26 October 2014 (UTC)

Arabic ǰuna

Hello. I am looking for an Arabic word transliterated as ǰuna, meaning perhaps “tanner, skin-dresser” or “hatter”. Does it exist and what is the spelling? It is needed for ճոն (čon). --Vahag (talk) 07:11, 26 October 2014 (UTC)

It would be spelled جُنَة but I can't find any such word in any of my dictionaries. I looked at a lot of variations and there are words like jauna "disc of the sun" and jūn, jūna "bay" and junāh "sinners, gatherers" but nothing meaning "tanner" or "hatter". I even checked things like junʿa, junʾa, juʿna, juʾna, junẖa, juẖna, junha, juhna on the assumption that one of these weak consonants might have been omitted in borrowing but no such luck. Benwing (talk) 21:36, 26 October 2014 (UTC)
Thanks, my source is possibly unreliable in this case. --Vahag (talk) 09:13, 27 October 2014 (UTC)

Arabic phrasebook entries


I have fixed صَبَاح الخَيْر (ṣabāḥ al-ḵayr) and صَبَاح النُور (ṣabāḥ an-nūr) as examples of SoP entries, such as phrasebook entries. It's cumbersome to add links to individual words, though. --Anatoli T. (обсудить/вклад) 00:31, 28 October 2014 (UTC)

forte possible

Wow...I must have been half-asleep! My source doesn't even say "fp", so I'm not sure where I got that from. But it does say "forte possible" without indicating the language. Quote: "forte possible. As loud as possible." This is from "The Modern Conductor" by Elizabeth Green. The source is a trusted standard for conductors. Bob the Wikipedian (talkcontribs) 13:33, 7 November 2014 (UTC)

Oops, it says "possibile". Didn't even see the 'i' there! Bob the Wikipedian (talkcontribs) 22:01, 7 November 2014 (UTC)
OK, well then forte possibile must be a real term (although it sounds odd to me). Benwing (talk) 09:26, 8 November 2014 (UTC)

Automatic translit and entering Arabic vocalisation


I noticed that you sometimes leave the manual transliterations, even on fully vocalised native Arabic words, why is that? Do you think it's still inaccurate, especially with tāʾ marbūṭa? Also, I'd like to share with you that I use Firefox plug-in "Character palette" to enter Arabic diacritics - highly recommended if you use Firefox. It's quite convenient and easy. :) --Anatoli T. (обсудить/вклад) 06:07, 14 November 2014 (UTC)

Yeah, it's because of the tāʾ marbūṭa, so it gets rendered properly instead of as (t). I enter Arabic diacritics using the Arabic keyboard layout on the Mac, which has almost all the necessary stuff ... just missing dagger alif and hamzat al-waṣl. These ones, along with the left and right half-rings, get entered using the built-in Mac character palette (Control-Command-Space). If I find myself using Firefox, however, I'll definitely check out the "Character palette" plug in. Benwing (talk) 23:45, 14 November 2014 (UTC)



What's the deal with ʾiʿrāb? Are we supposed to use it in headwords and translations from English? I can see both - with and without. Is it still undecided? Sorry, don't remember the outcome of discussions. --Anatoli T. (обсудить/вклад) 04:41, 18 November 2014 (UTC)

Also, marking hamzat al-waṣl is usually problematic but I see the module can handle the elision without the diacritic. Do you mark it? --Anatoli T. (обсудить/вклад) 04:42, 18 November 2014 (UTC)
I still haven't quite decided what to do about ʾiʿrāb. Mostly, I've entered words without ʾiʿrāb because it looks strange to me to include it, and most existing entries don't include it. (The main exceptions are in {{ar-nisba}}, which includes ʾiʿrāb in its auto-generated entries, and in verbal nouns for verbs, which always have ʾiʿrāb in them.) I like the solution used in Hans Wehr, which leaves triptotes unmarked and marks diptotes with a superscript 2; possibly we could adopt this solution, and I could fix Module:ar-translit to ignore a superscript 2 when transliterating. What do you think?
As for hamzat al-waṣl, you're right that the translit module can generally manage to elide it when necessary, although I've still been inserting it. I don't feel strongly about this, though, and we could choose to leave it out. Why do you think it's problematic? Benwing (talk) 08:08, 18 November 2014 (UTC)
I think we should include them, and I have been doing so. --WikiTiki89 16:37, 18 November 2014 (UTC)
I see we still have disagreement on this. Superscript 2 for diptotes is a great idea! Adding hamzat al-waṣl is no longer problematic but non-ligature form لله is not displayed correctly with any diacritic before or after or when alif is missing. --Anatoli T. (обсудить/вклад) 21:19, 18 November 2014 (UTC)
I don't like the superscript 2 idea. If we want to be explicit in headwords, we can just put the word diptote with a link to an appendix. The ʾiʿrāb will look less weird once we start graying them out. Also, if we choose not to include ʾiʿrāb, should we make an exception for words like نَادٍ (nādin)? --WikiTiki89 22:58, 18 November 2014 (UTC)
I'm still hesitant about ʾiʿrāb and undecided myself, but if most entries don't have them, then we won't get consistency. We won't be able to grey it out in translations or other places with automatic transliteration, will we? Hans Wehr and rare web references with vocalisation don't use them either. Terms like نَادٍ (nādin) could be exceptions.
A superscript 2 could link to a diptote appendix or About Arabic page. --Anatoli T. (обсудить/вклад) 23:28, 18 November 2014 (UTC)
Yes we can gray them out with automatic transliterations. --WikiTiki89 23:37, 18 November 2014 (UTC)
I implemented graying out ʾiʿrāb but I don't like the idea much because it can't really be done properly automatically. For example, adverbial accusatives need the ʾiʿrāb displayed normally, and in Koranic quotes we presumably want to do so as well. We also display ʾiʿrāb in verbs, among other things. I would rather display the ʾiʿrāb when it belongs in the translit and leave it out otherwise. I also think the graying out looks a bit strange in {{ar-nisba}} examples like عَرَبِيّ (ʿarabiyy). For cases like وَادٍ (wādin) we should probably make an exception and include the ʾiʿrāb; likewise for words like مُسْتَشْفًى (mustašfan). This is also the convention used in Wehr's dictionary; or at least, the translit includes the ʾiʿrāb, when it doesn't for most words. Likewise, this dictionary displays ʾiʿrāb in translit of verbs consistently, in adverbial accusatives and sometimes in phrases when it's necessary to clarify the case relations, but not otherwise. For cases like وَادٍ (wādin) I also try to put an entry at وَادِي (wādī) that says it's the construct state, given the way these words are normally pronounced.
As for displaying the word diptote, this isn't a bad idea although the problem is that it can't so easily be done for plural inflections, which are the most common cases of diptotes (well, I suppose it could, with some hacking of Module:headword, although it's not clear whether it will look bad). Benwing (talk) 23:48, 18 November 2014 (UTC)
Plural inflections don't need to say that because they are a regular part of the grammar. It is only words that are lexically diptotes, such as مِصْرُ (miṣru) that need an indication. I don't think we should base everything off of Hans Wehr. Note that there is no logical reason why ʾiʿrāb should be included for verbs, but not for nouns. Also note that it is much easier for an Arabic beginner to remove the ʾiʿrāb than to add it. --WikiTiki89 00:06, 19 November 2014 (UTC)
I don't think adding ʾiʿrāb to all entries consistently would be easy in practical terms, unless someone commits to making a bot to do this. Re: "it is much easier for an Arabic beginner to remove the ʾiʿrāb than to add it": yes, totally, that's the main pro argument. --Anatoli T. (обсудить/вклад) 02:00, 19 November 2014 (UTC)
If we decide to do it, then we can worry about how to do it. But it shouldn't be too hard anyway. Arabic doesn't have nearly as many entries as English or Russian, for example. --WikiTiki89 02:04, 19 November 2014 (UTC)
Why don't plural inflections need it? Some broken plurals are diptotes, some are triptotes. For example, 4-character plurals of the form CaCāCiC and CaCāCīC are generally diptotes (including words like فَوَاكِه and جرَائِد which are based off of 3-character singulars), as are plurals of the form ʾaCCiCāʾ (and ʾaCiCCāʾ for geminate roots) and CuCaCāʾ, and words of the form CaCCān, generally intensive adjectives (but not words of the form CiCCān or CuCCān), and masculine color/defect/elatives of the form ʾaCCaC (and ʾaCaCC for geminate roots) and feminine color/defect adjectives of the form CaCCāʾ, and probably other cases as well. This is independent of the predictable declension of words in -ūn, -āt and -in, which are technically diptotes because they have only two distinct case forms but which have their own declensions separate from the normal diptote declension.
As for following Hans Wehr or not, John Mace's book on Arabic Verbs likewise includes ʾiʿrāb for verbs but not nouns and adjectives (including verbal nouns and participles), and the book "Introduction to Koranic and Classical Arabic" by Thackston does something similar, where verbs are transcribed with ʾiʿrāb as e.g. rajaʿa "to return" but nouns and adjectives are written with ʾiʿrāb only if they are diptotes, e.g. ğarīb- pl. ğurabāʾu "strange" (with a hyphen in place of the triptote ending -un). So I think there's a lot of precedent for something like this. I'm not opposed to the idea of writing ʾiʿrāb only for diptotes, as Thackston does; this would be an alternative to using a superscript 2. All of these books are likewise consistent in writing ʾiʿrāb for prepositions and particles, which I think is a good idea. I imagine one reason for this is that spoken MSA may be more likely to drop the case endings of nouns and adjectives than the ʾiʿrāb of verbs. Benwing (talk) 05:42, 19 November 2014 (UTC)
Now I'm learning something new. I thought that CaCāCiC/CaCāCīC and the others you've listed were ordinary triptotes. I thought that only sound plurals were diptotes. I'm assuming that the patterns you mention use the same declension as the sound -āt plural (i.e. -u(n) for nominative and -i(n) for genitive and accusative)? But in that case, indicating them with ʾiʿrāb is probably not a good idea. Maybe we should just explicitly indicate the accusative case of diptotes in the headword line. --WikiTiki89 13:33, 19 November 2014 (UTC)
They don't use the same declension as the sound -āt plural. They have indefinite nominative -u (no nunation), indefinite genitive/accusative -a, while the definite uses -u/-i/-a like for triptotes. This is the same declension as diptotes like مِصْرُ (miṣru) and أَحْمَدُ (ʾaḥmadu). This is all documented in w:Arabic nouns and adjectives. I imagine few Arabic speakers actually know these rules nowadays. Benwing (talk) 21:21, 20 November 2014 (UTC)
Thanks! That explains my previous confusion about words like أَحْمَدُ (ʾaḥmadu). --WikiTiki89 15:39, 21 November 2014 (UTC)
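The declension contrast described above can be summarised in a small lookup table. This is only an illustrative sketch working on transliterated stems; the names `ENDINGS` and `decline` are hypothetical and are not taken from any Wiktionary module. The triptote indefinite endings (-un/-in/-an) are standard grammar rather than something stated in this thread, and definite forms are shown without the article.

```python
# Case endings in transliteration, keyed by (declension, state).
# Diptotes differ from triptotes only in the indefinite: no nunation,
# and genitive/accusative share the ending -a.
ENDINGS = {
    ("triptote", "indefinite"): {"nom": "un", "gen": "in", "acc": "an"},
    ("triptote", "definite"): {"nom": "u", "gen": "i", "acc": "a"},
    ("diptote", "indefinite"): {"nom": "u", "gen": "a", "acc": "a"},
    ("diptote", "definite"): {"nom": "u", "gen": "i", "acc": "a"},
}


def decline(stem, declension, state, case):
    """Attach the case ending to a transliterated stem (hypothetical helper)."""
    return stem + ENDINGS[(declension, state)][case]


if __name__ == "__main__":
    for case in ("nom", "gen", "acc"):
        print(case,
              decline("miṣr", "diptote", "indefinite", case),
              decline("kitāb", "triptote", "indefinite", case))
```

The sound plural declensions (-ūn, -āt) are deliberately left out, since the thread notes they follow their own patterns.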

I also have "Introduction to Koranic and Classical Arabic" with answer keys :). --Anatoli T. (обсудить/вклад) 05:52, 19 November 2014 (UTC)

Arabic vowels and consonants

If I'm not mistaken, Arabic normally writes only consonants, but three of the consonant letters can also be used to indicate long vowels. Assuming that the word is fully vocalised, I wonder if there is a reliable way to tell whether a given consonant represents an actual vowel or its consonantal equivalent? I am asking this because I would like to write a function that extracts the consonants or vowels from a word. This means knowing which letters are vowels and which are consonants, obviously. —CodeCat 19:08, 25 November 2014 (UTC)

For ي and و, if there is another short vowel written on them, then they are consonants, otherwise they are long vowels. In the case there is a sukuun (the null vowel) on them, it is debatable whether to analyze them as the second element of a diphthong or just as a consonant. For ا, the situation is a bit trickier. It almost always indicates a long vowel, but at the beginning of a word, it indicates an elidable epenthetic vowel before a consonant cluster. The tricky part is that if there is a prefix, the ا is still written but represents no sound at all (كَاسْمٍ (kasmin, like a name)) rather than a long vowel. This can be detected only by knowing that consonant clusters are forbidden after long vowels (except in the active participle of geminate roots, e.g. خَاصٌّ (ḵāṣṣun), but these have the very particular form C1āC2C2-). But I'm curious as to why you're writing this function. It may or may not be important to keep in mind that long vowels can interchange with semivowels within the same consonantal root (e.g. نُونٌ (nūnun) > تَنْوِينٌ (tanwīnun)). --WikiTiki89 19:33, 25 November 2014 (UTC)
Re: why you're writing this function: pls, see Module_talk:ar-headword#Plural_forms.2C_dual_forms.2C_etc. --Anatoli T. (обсудить/вклад) 21:31, 25 November 2014 (UTC)
One conceivable way is to use Module:ar-translit and then parse the transliteration. This already implements all the rules required to distinguish consonants from vowels. (Except that it doesn't handle cases like كَاسْمٍ (kasmin, like a name) but these won't show up in single lemmas -- this occurs only because ka- "like" is a clitic.) I don't know whether this is doable in reality, as you'd have to map back to the Arabic text somehow. If not then you should at least be able to reuse the code in Module:ar-translit that does the transliteration. Benwing (talk) 05:23, 26 November 2014 (UTC)
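As a rough illustration of the rule given above for ي and و, here is a minimal sketch for fully vocalised input. It is not the logic of Module:ar-translit; the function name and role labels are made up for this example. It does not handle the prefixed كَاسْمٍ case, the silent alif written after tanwīn, or alif maqṣūra.

```python
FATHA, DAMMA, KASRA = "\u064E", "\u064F", "\u0650"
TANWIN = "\u064B\u064C\u064D"           # fathatan, dammatan, kasratan
SHADDA, SUKUN = "\u0651", "\u0652"
SHORT_VOWELS = FATHA + DAMMA + KASRA + TANWIN
DIACRITICS = SHORT_VOWELS + SHADDA + SUKUN
WAW, YEH, ALIF = "\u0648", "\u064A", "\u0627"


def classify_letters(word):
    """Return (letter, role) pairs for a fully vocalised word (hypothetical helper)."""
    result = []
    i = 0
    while i < len(word):
        ch = word[i]
        if ch in DIACRITICS:            # stray mark with no base letter: skip it
            i += 1
            continue
        j = i + 1                       # gather the marks written on this letter
        while j < len(word) and word[j] in DIACRITICS:
            j += 1
        marks = word[i + 1:j]
        if ch in (WAW, YEH):
            if SHADDA in marks or any(m in SHORT_VOWELS for m in marks):
                role = "consonant"      # carries its own short vowel
            elif SUKUN in marks:
                role = "glide"          # diphthong element or consonant, debatable
            else:
                role = "long vowel"     # bare letter: lengthens the preceding vowel
        elif ch == ALIF:
            role = "initial alif" if i == 0 else "long vowel"
        else:
            role = "consonant"
        result.append((ch, role))
        i = j
    return result


if __name__ == "__main__":
    # tanwīnun: the waw carries a kasra, the yeh carries nothing
    for letter, role in classify_letters("\u062A\u064E\u0646\u0652\u0648\u0650\u064A\u0646\u064C"):
        print(f"U+{ord(letter):04X}", role)
```

Working from code points rather than the transliteration avoids having to map results back to the Arabic text, at the cost of reimplementing rules the transliteration module already knows.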

Gender and number of adjectives


Re: diff. Normally, adjective lemmas don't display gender in the headword in any language. The masculine singular is used as the lemma, and other forms use form templates. --Anatoli T. (обсудить/вклад) 07:19, 28 November 2014 (UTC)

The template is for non-lemma plural adjective forms, e.g. أَغْبِيَاء (ʾaḡbiyāʾ), which is the masculine plural of adjective غَبِيّ (ḡabiyy), and the change is made to reflect the fact that the most common usage will be with masculine (broken) plurals. For non-lemma forms it seems reasonable to display their gender. Benwing (talk) 10:35, 28 November 2014 (UTC)


Hi, What's the gender/number of حِذَاء (ḥiḏāʾ)? Hans Wehr says "(pair of) leather boots or shoes", plural أَحْذِيَة (ʾaḥḏiya). --Anatoli T. (обсудить/вклад) 01:42, 29 November 2014 (UTC)

Hmmm, it's singular, I'm guessing masculine. Words ending in اء are often feminine but in this case it's not an ending but rather a form of the root consonant و, meaning that the word has the same pattern as كِتَاب. Benwing (talk) 05:09, 29 November 2014 (UTC)

A few WingerBot clinkers

ريالات, فرميونات, كامرات, كواركات, and ميكروبات all have the same module error. Chuck Entz (talk) 06:39, 30 November 2014 (UTC)

Thanks. Fixed them. Happened when the singular noun entry had a blank head. Benwing (talk) 08:38, 30 November 2014 (UTC)

ذكرى and دنيا

Just curious etymologically, do you know why ذِكْرَى (ḏikrā) and دُنْيَا (dunyā) don't take tanween (i.e. why aren't they ذِكْرًى (ḏikran) and دُنْيًا (dunyan))? --WikiTiki89 16:57, 12 December 2014 (UTC)

دُنْيَا (dunyā) is a nominalized feminine elative of دَنِيّ (daniyy, low), literally "the lowest (place)". It has the same pattern as كُبْرَى (kubrā), feminine of أَكْبَر (ʾakbar). It takes tall alif by the rule that alif maqṣūra is written as tall alif after yāʾ. I don't know what the proto-forms of these words are, nor of ذِكْرَى (ḏikrā), but I guess the reason for no tanween is that these words are underlyingly diptotes. Similarly, the masculine elative of دَنِيّ (daniyy, low) is أَدْنَى (ʾadnā) without tanween, underlyingly *ʾadnayu (cf. ʾakbaru) whereas a word like مَعْنًى (maʿnan, meaning) is underlyingly *maʿnayun (cf. maktabun). I'm not sure the reason why ذِكْرَى (ḏikrā) is a diptote. I'm also not sure why certain words like عَصًا (ʿaṣan, stick) have tall alif independently of a preceding yāʾ. (In the dialect of Mecca, alif maqṣūra was pronounced something like [e:] whereas tall alif was [a:].) Benwing (talk) 00:51, 13 December 2014 (UTC)
I would guess that عَصًا (ʿaṣan) is underlyingly *ʿaṣawun, not **ʿaṣayun, which is why it has tall alif. You answered my question with "these words are underlyingly diptotes", "دُنْيَا (dunyā) is a nominalized feminine elative of دَنِيّ (daniyy, low)", and "I'm not sure the reason why ذِكْرَى (ḏikrā) is a diptote". Thanks! --WikiTiki89 02:47, 13 December 2014 (UTC)
Good call on عَصًا (ʿaṣan). Benwing (talk) 08:13, 13 December 2014 (UTC)


What's the transliteration of صري, if the term is real? --Lo Ximiendo (talk) 16:09, 16 December 2014 (UTC)

I don't think this word actually exists. It's not in my dictionary. Benwing (talk) 18:58, 16 December 2014 (UTC)
The closest in Wehr appears to be مُصِرّ (muṣirr, persistent, resolute). Lane has a noun something like صِرِّي (ṣirrī) or صِرِّيّ (ṣirriyy) (?) meaning something like "a serious assertion, not a jest", occurring in the expression هِيَ مِنِّي صِرِّي (hiya minnī ṣirrī) in various variants meaning approximately "It is a serious assertion from me" said of an oath. It's clearly an archaic word which is why it isn't in Wehr. I still think we should delete this word. Benwing (talk) 19:10, 16 December 2014 (UTC)
Then what's the transliteration of حماقة? --Lo Ximiendo (talk) 21:58, 17 December 2014 (UTC)
@Lo Ximiendo: fixed. --Anatoli T. (обсудить/вклад) 22:14, 17 December 2014 (UTC)
Can anyone verify the term مندثر and its transliteration? --Lo Ximiendo (talk) 12:15, 18 December 2014 (UTC)
Added translit. Benwing (talk) 11:24, 19 December 2014 (UTC)

Arabic for "to fry"

Is the following term قلا really an Arabic verb that means "to fry"? --Lo Ximiendo (talk) 11:21, 16 January 2015 (UTC)

@Lo Ximiendo: Yes it is. I cleaned up the entry and added conjugation. Benwing (talk) 07:11, 17 January 2015 (UTC)

شجرة التفاح and كوكا كولا

I'm not sure whether the module errors on these entries are your fault- but I'm pretty sure you can fix whatever the problem is. Thanks! Chuck Entz (talk) 22:22, 18 January 2015 (UTC)

Fixed. Benwing (talk) 01:34, 19 January 2015 (UTC)
Since كوكا كولا is indeclinable, like some loanwords and many words ending in alif, maybe the inflection table is unnecessary, but the header should say so and add an indeclinable category? --Anatoli T. (обсудить/вклад) 01:39, 19 January 2015 (UTC)
A Russian indeclinable example is бюро́ (bjuró). Setting a header parameter to "-" adds the entry to Category:Russian indeclinable nouns. Just a suggestion, it may reduce the editing time. --Anatoli T. (обсудить/вклад) 02:02, 19 January 2015 (UTC)

Requested entry

Hi, are you able to make an entry for مُفْتٍ (muftin), please? I'm not sure about the plural form and don't have my HW handy. :) --Anatoli T. (обсудить/вклад) 05:59, 19 January 2015 (UTC)


Please take a look at Category:Pages with module errors (currently 55 entries) and fix the problem. Thanks! Chuck Entz (talk) 14:45, 21 January 2015 (UTC)

Sorry about that. Stupid typo. I wish the stuff in Category:Pages with module errors showed up faster. It seems to take quite a while for it to cycle through, longer than it used to. E.g. I fixed the error 20 minutes ago and still see all the pages listed. Benwing (talk) 07:41, 22 January 2015 (UTC)
I just do null edits on all the entries. If you open a bunch of them in separate tabs, you can do things with the first ones you opened while the more recent ones are still getting around to responding, and keep doing each step that way until you're done- it averages out to only a second or two per entry on a reasonably fast computer. All those errors are cleared, but there's a single one with a new error. Chuck Entz (talk) 03:54, 23 January 2015 (UTC)



Do you know why شْنِیتْزَل is not working? All internal diacritics are there, although I'm not 100% sure it's a sukūn or kasra after šīn. It should probably be manually transliterated as "š(i)nitzal" but the automatic test fails. --Anatoli T. (обсудить/вклад) 00:06, 6 February 2015 (UTC)

You had FARSI YEH in place of the YEH. If you change that, you get شْنِيتْزَل (šnītzal) and it works. Benwing (talk) 04:00, 6 February 2015 (UTC)
Oops. Thank you. :) I wonder how I managed to get it there... --Anatoli T. (обсудить/вклад) 04:04, 6 February 2015 (UTC)
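A mix-up like the one above (FARSI YEH, U+06CC, in place of YEH, U+064A) is hard to spot by eye, so a quick check by code point can help. The following is only a sketch using Python's standard `unicodedata` module; the helper names and the one-entry lookalike table are assumptions made for this example, and other confusable pairs would have to be added by hand.

```python
import unicodedata

# Persian/Urdu letter forms mapped to the Arabic letter they resemble.
# Only the pair discussed above is listed here.
LOOKALIKES = {"\u06CC": "\u064A"}  # FARSI YEH -> YEH


def find_lookalikes(text):
    """Return (index, Unicode name) for each lookalike letter in text."""
    return [(i, unicodedata.name(ch)) for i, ch in enumerate(text) if ch in LOOKALIKES]


def replace_lookalikes(text):
    """Return text with each lookalike swapped for its Arabic counterpart."""
    return text.translate(str.maketrans(LOOKALIKES))


if __name__ == "__main__":
    sample = "\u0634\u0652\u0646\u0650\u06CC"  # shīn, sukūn, nūn, kasra, FARSI YEH
    print(find_lookalikes(sample))
    print([f"U+{ord(c):04X}" for c in replace_lookalikes(sample)])
```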


Hi BW,

Why is the entry reporting "Arabic nouns with sound masculine plural" when it is a broken plural and not -ūn(a)/-īn(a)? Did I miss something or is it confusing سُجُون for an SMP? TIA. :) --Anatoli T. (обсудить/вклад) 13:02, 5 March 2015 (UTC)

It thinks the -ūn ending indicates a strong plural. I fixed it by adding an explicit ":tri" (triptote) notation. This also occurs for a few other words ending in -n, e.g. عَيْن pl. عُيُون and قَرْن pl. قُرُون. Possibly I should add a check for the form فُعُون and treat it as a broken plural. Benwing (talk) 14:39, 5 March 2015 (UTC)
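The check floated above could look roughly like this when applied to the transliteration. This is a sketch only, not the code in Module:ar-headword: the function name is hypothetical, and it assumes each consonant is a single character in the transliteration.

```python
import re
import unicodedata

LONG_U = "\u016B"                      # ū
VOWELS = "aiu\u0101\u012B\u016B"       # a i u ā ī ū
# C-u-C-ū-n: the broken-plural shape of words like sujūn, ʿuyūn, qurūn,
# where the final n is a root consonant rather than a plural suffix.
BROKEN_SHAPE = re.compile(f"[^{VOWELS}]u[^{VOWELS}]{LONG_U}n")


def looks_like_sound_masculine_plural(translit):
    """Guess whether a plural in -ūn is a sound plural (hypothetical heuristic)."""
    translit = unicodedata.normalize("NFC", translit)
    if not translit.endswith(LONG_U + "n"):
        return False
    return BROKEN_SHAPE.fullmatch(translit) is None


if __name__ == "__main__":
    for word in ("sujūn", "ʿuyūn", "qurūn", "muslimūn"):
        print(word, looks_like_sound_masculine_plural(word))
```

A heuristic like this would still need an explicit override such as ":tri" for any word whose shape it misjudges.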

Arabic etyma of Swahili terms

I don't have any Arabic resources nor do I know Arabic script, so I have been having a hard time finding the etyma of Swahili words that I've been adding that (to me, at least) seem very strongly like they derive from Arabic. I was wondering whether you'd be willing to help me out with finding the Arabic origins of words in Category:Swahili entries needing etymology, or at least recommend a good online resource for Arabic that I can search in Latin script. —Μετάknowledgediscuss/deeds 21:50, 9 March 2015 (UTC)

I can try to help you. There are a whole host of dictionaries here: [2] and you can search by Latin using the search button, although they tend to be sorted by Arabic root, which requires that you have some knowledge of how Arabic words are structured, because Arabic roots are generally three consonants, with the vowels inserted between them. I don't see too many Arabic-looking words among the category you linked to above, although maskini definitely comes from Arabic مِسْكِين (miskīn). Benwing (talk) 07:13, 10 March 2015 (UTC)
aidha is from أَيْضًا (ʾayḍan). Benwing (talk) 07:18, 10 March 2015 (UTC)
sahani is from صَحْن (ṣaḥn). Benwing (talk) 07:44, 10 March 2015 (UTC)
Thank you! There are also some entries, at least one of which I see you have noticed, that need some improving of their etymologies, like tajiri.
I'm pretty sure the following words in that category are from Arabic or via Arabic: ghasia, hafifu, hodari, imara, karamu, laini, ruhusa, shikamoo. If any of those have deducible etyma, that would be very helpful. I'll try the dictionary you linked to. —Μετάknowledgediscuss/deeds 08:05, 10 March 2015 (UTC)
hafifu is possibly خَفِيف (ḵafīf, light, slight, thin). ghasia is possibly غَاشِيَة (ḡāšiya, misfortune, faint, stupor, attendants) (?). Can't find any obvious etyma for hodari or karamu. imara is possibly إِمَارَة (ʾimāra, emirate, authority, power), although this is a noun not an adjective, and the power talked about is power of command rather than physical power. laini is possibly لَيِّن (layyin, soft, feeble, tender, gentle, supple). ruhusa is definitely رُخْصَة (ruḵṣa, permission). shikamoo I have no idea about. Benwing (talk) 09:06, 10 March 2015 (UTC)
Most of those match the regular sound changes, but imara seems off and ghasia would have turned out as *ghashia if that were the etymon, unless there is a dialectal form in Arabic with /s/. Thank you for all your trouble! —Μετάknowledgediscuss/deeds 17:02, 10 March 2015 (UTC)
You're welcome! Sorry I couldn't find better etyma. As for ghasia, Arabic is pretty strict about keeping /s/ and /ʃ/ apart so I don't think there are any dialectal forms with /ʃ/ -> /s/ in them. Benwing (talk) 18:39, 10 March 2015 (UTC)
I got my hands on some better Swahili resources, including an etymological dictionary. That said, I may have to learn Arabic script if I want them to be of any use to me in this regard. —Μετάknowledgediscuss/deeds 08:05, 11 March 2015 (UTC)
If you put up some screen shots I might be able to help. Benwing (talk) 23:08, 11 March 2015 (UTC)
Once I'm not terribly busy, I'll learn Arabic script so I don't have to be as reliant. Perhaps next week. —Μετάknowledgediscuss/deeds 07:46, 13 March 2015 (UTC)


Apparently the module disagrees with the conjugation type you gave it. Chuck Entz (talk) 08:57, 29 March 2015 (UTC)

Thanks. The module was correct; I fixed the conjugation type. Benwing (talk) 09:27, 29 March 2015 (UTC)

Automatic transliteration of بالـ

I thought this worked before, but I may be wrong: بِالتَّوْفِيق (bi-t-tawfīq). --WikiTiki89 15:26, 15 May 2015 (UTC)

It seems to work if you change the alif into an alif waṣla. I don't think I ever got it working in the case you give. Benwing (talk) 06:46, 16 May 2015 (UTC)
Yes, Module:ar-translit/testcases has a case with an ʾalif waṣla - بِٱلتَّأْكِيد (bi-t-taʾkīd). --Anatoli T. (обсудить/вклад) 09:22, 16 May 2015 (UTC)
Ok. That's probably why I remember it working. --WikiTiki89 19:01, 19 May 2015 (UTC)
Considering that "ٱ" is such a rare symbol, perhaps the rule should be that if "ال" follows a kasra or a ḍamma, then it should be considered an ʾalif waṣla? I don't know if it's hard to implement for Benwing. Shall I add بِالتَّوْفِيق (bi-t-tawfīq) (or similar) to test cases? --Anatoli T. (обсудить/вклад) 01:22, 21 May 2015 (UTC)
OK, I implemented this. 19:58, 21 May 2015 (UTC)
Cool! Maybe we should have it work for مِائَة as well? Or is that too risky and not common enough to be beneficial? --WikiTiki89 21:31, 21 May 2015 (UTC)
I think that's too much work for just one single case, and every new regex slows things down and risks leading to module errors on certain long appendix pages. Benwing (talk) 23:02, 21 May 2015 (UTC)
Well I was just thinking of making the regex you just added less restrictive (i.e. changing {"([\217\143\217\144])\216\167\217\132", "%1\217\177\217\132"}, to {"([\217\143\217\144])\216\167", "%1\217\177"},), but like I said, that might be too risky and not worth it for such a rare case. --WikiTiki89 17:15, 22 May 2015 (UTC)
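For readers who find the byte escapes in the pattern above hard to follow: they are the UTF-8 encodings of ḍamma (U+064F), kasra (U+0650), alif (U+0627), lām (U+0644) and alif waṣla (U+0671). A rough Python equivalent of the two variants, written with code points, might look like the sketch below. It is an illustration, not the module's code, and the function name and `strict` parameter are invented here.

```python
import re

DAMMA, KASRA = "\u064F", "\u0650"
ALIF, LAM, ALIF_WASLA = "\u0627", "\u0644", "\u0671"

# Restrictive rule: plain alif + lām after ḍamma or kasra.
AFTER_U_OR_I_BEFORE_LAM = re.compile(f"([{DAMMA}{KASRA}]){ALIF}{LAM}")
# Looser variant suggested above: any plain alif after ḍamma or kasra.
AFTER_U_OR_I = re.compile(f"([{DAMMA}{KASRA}]){ALIF}")


def mark_alif_wasla(text, strict=True):
    """Rewrite a plain alif as alif waṣla after ḍamma/kasra (hypothetical helper)."""
    if strict:
        return AFTER_U_OR_I_BEFORE_LAM.sub(lambda m: m.group(1) + ALIF_WASLA + LAM, text)
    return AFTER_U_OR_I.sub(lambda m: m.group(1) + ALIF_WASLA, text)


if __name__ == "__main__":
    bi_al = "\u0628\u0650\u0627\u0644"  # bāʾ, kasra, alif, lām
    print([f"U+{ord(c):04X}" for c in mark_alif_wasla(bi_al)])
```

With `strict=False` the rewrite also fires in مِائَة, which is the trade-off discussed above: broader coverage at the risk of touching an alif that is not a waṣla.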

Parameters of Arabic headword-line templates

I've been trying to figure out how the Arabic headword-line templates work and what parameters they take. It seems to me that many of them show a rather excessive number of forms on the headword line. For example, {{ar-noun}} apparently can list:

  1. construct state
  2. definite state
  3. oblique
  4. informal
  5. dual
  6. dual construct state
  7. dual definite state
  8. dual oblique
  9. dual informal
  10. plural
  11. plural construct state
  12. plural definite state
  13. plural oblique
  14. plural informal
  15. feminine
  16. feminine construct state
  17. feminine definite state
  18. feminine oblique
  19. feminine informal
  20. masculine
  21. masculine construct state
  22. masculine definite state
  23. masculine oblique
  24. masculine informal

This really is way way way too many forms to list on a single headword line. These templates should be trimmed down to only show the bare basics and the rest should be shown in an inflection table. —CodeCat 18:38, 19 May 2015 (UTC)

Yeah, it's a lot of potential forms, but most of them aren't used. In practice only the forms that can't be predicted are listed, and that's a small number. At least, that's the practice I've been following, and it was more or less the same in the existing entries before I came along, so it's pretty consistent. Generally, for nouns, only the plural is given; dual is given only when it can't be predicted, which is fairly rare (basically, only nouns ending in -ā, where the dual can be either -awān or -ayān or sometimes both). Masculine is used only for feminine nouns referring to people, where there is a corresponding masculine noun. Construct state is given only for nouns ending in -in (which can appear in the singular or broken plural), and informal is similarly given for adjectives in -in (which can likewise appear in the singular or broken masculine plural; adjectives don't have a construct state). The reason for this is the -in is written with a diacritic ـٍ (two slanted lines below the letter), and hence doesn't appear in unvocalized text or in the unvocalized page title; whereas the construct state, informal and definite all appear with -ī, written with an extra letter ي. Giving both forms emphasizes and clarifies the relation between the two, esp. since many users may be more familiar with the version with attached ي. Overall, there are typically only a couple of forms listed in the headword line, and if there are more it's usually because there are multiple broken plurals (in the extreme case, رَاجِل (rājil, pedestrian, footsoldier) has 13!).
This means that at least the following could potentially be removed:
  1. oblique (always predictable)
  2. definite state
  3. dual construct and informal
  4. feminine construct and informal

Benwing (talk) 01:06, 20 May 2015 (UTC)

The feminine and masculine forms probably have their own lemma page don't they? If so, then we don't need to list all the forms of them as they'll already be covered on that other page. As for the rest, I don't think we should be showing all possibly-unpredictable forms in the headword line. The idea of the headword line is to give a quick overview of the inflection, but listing all the irregularities is just too much. Consider for example what would happen if we tried that for Latin deus! So we really need to make a choice: which forms are the most essential and least predictable? Forms that are only unpredictable for a handful of words don't need to be in the headword line, that's what inflection tables are for. —CodeCat 01:12, 20 May 2015 (UTC)
How about you find an example of an Arabic entry that has too many forms in the headword line (other than plurals), and then you can complain. Arabic doesn't have words that are as irregular as Latin deus. --WikiTiki89 12:59, 20 May 2015 (UTC)
I agree with Wikitiki; in practice this isn't really a big issue. As for feminine and masculine forms, in the case of nouns yes they have their own lemma page, but in this case we don't list the forms of them. Feminine plurals are only given for adjectives and even then only sometimes, generally when the dictionary gives them (which is only for adjectives that can modify people and generally only when the feminine plural is irregular). I'd say, things aren't broke so let's not try to fix them. Benwing (talk) 03:47, 21 May 2015 (UTC)
BTW what is the need for your latest changes to Module:ar-headword? Why the need to explicitly list def/def2/def3/def4 etc.? This seems very hacky, and something similar won't work for plurals, where there may well be more than 4 possibilities. The current code works fine without needing to do any of this. Benwing (talk) 03:53, 21 May 2015 (UTC)