Wiktionary talk:About Arabic

See also Appendix:Arabic script

Arabic words - organizing by root - proposal

The English-Arabic dictionary section has the potential to be very useful for English-speaking students of Arabic. However, the fact that virtually all Arabic words are based on a three-letter (very rarely four-letter) root, with standard prefixes, suffixes and infixes, presents unique problems for organizing an English dictionary of Arabic. Simply organizing the dictionary alphabetically would be unwieldy and difficult to use. When looking up an Arabic word, one typically identifies the three root letters, then the "form" of the verb it is associated with (there are 10 common forms), and looks up the entry alphabetically by the root letters to find the definition.

The advantage here is that all related words are grouped together instead of being spread throughout the dictionary. Also, if a dictionary were not organized by root, most words would begin with one of three letters: the equivalent of "Y", "M" or a glottal stop.

I propose the following variation, then, to the standard Wiktionary word page, for Arabic words:

Word: This would be the entire word, which could still be searched for directly, without deciphering the root letters, for instance منظمة

Arabic: Language, as per wiki normal

ROOT: In the above example, this would be ن ظ م, without the prefix "m" letter and the suffixed "a" sound.

PATTERN NUMBER: The above word is form II, or as Arab dictionaries describe it: wazn فعّل

Part of Speech: Here I don't know whether it makes sense to use English grammar terminology, which only loosely describes Arabic grammar functions. I suggest it would be helpful to also include the Arabic grammar terms (masdar, etc.).

Pronunciation: 'munáthama'

Definition: 'organization'

References, etc.

--Jackbrown 13:51, 4 January 2006 (UTC)

To make this proposal simpler to implement, I’ve added sample code at Category talk:Arabic language (the talk page for the Arabic language category), which can be copied and pasted into new word definition pages. All and sundry should feel free to improve both on this idea and the sample page; I really meant it as a proposal to be discussed. --Jackbrown 12:17, 6 January 2006 (UTC)
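The proposed fields can be pictured as a small record type, with a root index built on top of it. This is only a sketch of the idea in Python, not anything Wiktionary provides: the names `ArabicEntry` and `index_by_root` are hypothetical, and the second sample word (نظام, "system", from the same root) is added purely for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ArabicEntry:
    """One word page with the two extra keys from the proposal (hypothetical)."""
    word: str            # the entire word, still searchable directly
    root: tuple          # root letters, e.g. ("ن", "ظ", "م")
    form: str            # pattern number as a Roman numeral, "" if unknown
    part_of_speech: str
    pronunciation: str
    definition: str


def index_by_root(entries):
    """Group whole words under their root so related words sit together."""
    index = defaultdict(list)
    for entry in entries:
        # Roots are keyed in isolated letter forms, separated by spaces.
        index[" ".join(entry.root)].append(entry.word)
    return dict(index)


entries = [
    # From the proposal above.
    ArabicEntry("منظمة", ("ن", "ظ", "م"), "II", "noun", "munáthama", "organization"),
    # Illustrative second word from the same root (not in the proposal).
    ArabicEntry("نظام", ("ن", "ظ", "م"), "", "noun", "", "system"),
]
root_index = index_by_root(entries)
```

Both words end up under the single key for their shared root, which is the grouping the proposal is after, while each word keeps its own searchable page.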

Arabic verb form templates

Please comment. For a usage example see خ د ر.

Arabic root entry

Use the template {{ar-root-entry}} to display the root along with an IPA transcription and a second transcription.

Template {{Arab}}

The article should have a short section expounding the use of {{Arab}} for proper display of Arabic script. --EncycloPetey 21:23, 3 February 2007 (UTC)

Arabic words - organizing by root - proposal (Transferred from: Category talk:Arabic language)

The advantages of this minor variation on the normal definition page, proposed above, will be obvious to students of Arabic, I think.

>>So my first question, then, is the following: on Wiktionary, how do we go about imposing a fairly radical change in the organization of one part of the dictionary? In other words, getting people to include two extra indexes (root and pattern) for the Arabic words they enter...

>>And second: are we really supposed to hand-code every word definition, then rewrite two or three other pages (the front end of the Arabic-English dictionary, etc.) to link to every word we enter? Or is there some slightly more automated process for entering and linking to word definitions? --Jackbrown 13:51, 4 January 2006 (UTC)

Sample Code here - you can "edit" this page (use the tab at the top of the page, not the ones below), then cut and paste the code below to make a template for your new word definitions, filling in the appropriate headlines -


Actually, I agree with Jack about this way of organizing; it is the best way to give English-speaking students a sense of the Arabic language and its great capacity for derivation and for relationships between related words. We first have to make a good list of Arabic words arranged by first letter; then these words should also be categorized according to their root. Maybe we can later make two indexes: one by first letter, and the other by root. --Chaos 12:13, 11 January 2006 (UTC)

I suggest doing both (the root page and the word-level page), because sooner or later we will have other pages trying to link to specific derivations and vocalizations of the same root. To give an example, I created the page عبد when I found out that the entry slave doesn't have a link to the Arabic word, but then I thought it would be less confusing to link directly to عَبْد, which I then created. I think the two can serve different, complementary purposes. Interlinking and categorization can improve things. Thoughts? --Hakeem.gadi 09:25, 30 April 2008 (UTC)

I believe in having a separate page for the root, which lists all the derivative words and links to their own pages. In this case the root should be written in isolated letter forms (e.g. ن ظ م), because many existing words would otherwise look exactly like their roots. Hakeem.gadi 06:23, 16 May 2008 (UTC)

While I know next to nothing about Arabic, I do have a small understanding of Semitic languages through some study of Hebrew. I think that organizing words by the tri-letter roots is an excellent idea. I would suggest considering an approach similar to what Hebrew is doing. If you take a look at Category:Hebrew roots, you'll see some of their roots. The root pages can probably be formatted in a similar fashion to hypothetical language entries, such as Appendix:Proto-Indo-European *ph₂tḗr, with a brief definitional note and perhaps a further etymology, and a list of all words using that root (probably organized in some intuitive way). Then, you just put a link to the root in the etymology, and you're all set. You can keep the nice organization, which is specific to Arabic, while still conforming to Wiktionary formatting conventions. Because I gotta tell you, trying to go against formatting conventions is an arduous uphill battle (and with good reason too; there are a great many benefits to standardization across languages). Also, it might be worthwhile to move this convo to Wiktionary:About Arabic, as that's really where it belongs. -Atelaes λάλει ἐμοί 06:35, 16 May 2008 (UTC)

I think something should be changed about the ordering of the roots. Some Arabic roots are called geminate roots, meaning that the second and third consonants are the same. However, they are written with just two letters, with the diacritic shadda above the second letter to indicate that it is actually doubled. That of course puts those roots before all other roots starting with those two letters, and that is how it is in each and every Arabic dictionary I know. In my opinion, our Arabic roots category would be better if it were organized that way. -Beru7

In the first place, Hans Wehr’s Arabic-English Dictionary sorts doubled verbs as though the root has only two radicals. Second, the only place where this will make any difference will be the indexes (currently almost nonexistent), and the biggest problem with them is just getting the words into them according to the root, something that is probably going to be very labor-intensive. Once the words find their way into the proper index, a regular Unicode sort such as used by Hans Wehr should be more than sufficient. --Stephen 19:26, 24 March 2009 (UTC)
Thanks for your answer, Stephen. The index exists already and it is wrong; see Category:Arabic roots. The problem is that the geminate roots that have been created before are not written the way I have described. For example: ء ل ل should in fact be ء ل. I just want to make sure we agree! I intend to create more entries and write the geminate roots with just two letters if there is no objection. -Beru7
Category:Arabic roots is not the index. The index is at Index:Arabic. Category:Arabic roots is just a category, and it is roots, not words. If you go to ء ل ل, you will find listed some of the words that have this root, but spelt in the normal way with only one ل. For all geminate radicals, please include both of them when using {{ar-root-entry}}. See how we use this template in the etymology section of أل. --Stephen 23:33, 24 March 2009 (UTC)
It occurs to me that you may be confusing Category:Arabic roots and Index:Arabic with Category:Arabic verbs. If you want to create pages for doubled verbs, they will appear in Category:Arabic verbs. --Stephen 00:04, 25 March 2009 (UTC)
Category:Arabic roots is still an index even if it does not bear the name, or am I wrong? And I am definitely not talking about verbs. Verbs cannot be doubled as such, only based on a doubled root. Some roots have no verbs deriving from them. Anyway, for now, I will follow what has been done already, as you suggest. About the ar-root-entry template: it is convenient but will not work for bi- and quadriliteral roots. Thank you! Beru7 11:13, 25 March 2009 (UTC)
We don’t call category lists "indices"...only Index:Arabic. Yes, unfortunately {{ar-root-entry}} only works for triliteral roots. Other roots have to be formatted manually, or another template could be made to accommodate them. The Category:Arabic roots is a fairly new creation and most of the older pages (that is, most of the Arabic pages) have not been coordinated with it yet.
We also have {{ar-verb-fa3ala}} and similar verb templates which you might be able to use. I have found that they don’t always work correctly, especially if there is a weak radical (such as final ى). So these templates need a little more work. For prepositions and conjunctions with bound pronouns, we have {{ar-prep-inflection}} (e.g., عن).
It is especially useful to include translation tables as I did in كتب. --Stephen 11:38, 25 March 2009 (UTC)
Stephen, thanks a lot for your help. Being new here, I have a few more questions. What are the translation tables you are talking about? Do you mean putting all the derived forms on the same page? My humble opinion is that derived forms belong on the root page, not on the form I verb page. If we put the I-XV derived forms on the form I page, then shouldn't we put the ism-alfaaʕil (active participle) there? And the ism-almafʕuul (passive participle)? And so forth and so on. And we end up with a page which is the same as the root page! Wouldn't it be more logical if each derived word had its own page, including the form I verb, with the root page having links to all those words that derive from it? We could then organize the root page the same way as the entries are organized in a dictionary: first the form I verb (if it exists), followed by all the words derived from it, then form II and its derived words, etc. --Berenger
Oops, sorry, I misspoke. I meant conjugation tables. (Too long without sleep!) I agree, putting all of the verb forms on the Form I page is problematic. Almost all of that was done before we had the {{ar-root-entry}} and the Category:Arabic roots. I still believe all of the forms should go on the Form I page, but perhaps it would be better if they were put under ====Related terms====. Then, of course, they would also be on the root page. And yes, the participles and verbal nouns should also appear under ====Related terms====. --Stephen 10:28, 26 March 2009 (UTC)
OK, conjugation tables are good. Maybe we could have a template for those and hide them by default (just like the conjugations in French). I'd like to sum all that up for future contributors, but I'm not sure what the right place is for that. Wiktionary:About_Arabic, I'd guess? Also, can you point to an example where the verb template is wrong? I'll try to fix it. (I had a look at م_ش_ي and it looks fine.) Beru7 13:32, 26 March 2009 (UTC)
Yes, a template would be great, but I only know how to make very simple ones. I don’t know the wiki code very well. And yes, the place for it is Wiktionary:About Arabic.
The verb templates that I sometimes have a problem with are the ones like {{ar-verb-fa3ala}}. I don’t remember now which of the templates it was or which verb I was working with. Next time I have that trouble, I must write it down somewhere. --Stephen 01:31, 28 March 2009 (UTC)
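The ordering question raised earlier in this thread (geminate roots sorting ahead of every other root that starts with the same two letters) can be sketched as a sort key. This is a minimal Python illustration, not how the category is actually sorted: `root_sort_key` is a hypothetical helper, the two non-geminate roots are only sample data, and plain code-point order is assumed to stand in for Arabic alphabetical order.

```python
def root_sort_key(root):
    """Sort key for a root written as space-separated letters.

    A geminate root (second and third radicals identical) is keyed on its
    first two letters only, so it sorts ahead of every other root that
    starts with those two letters.
    """
    letters = tuple(root.split())
    if len(letters) == 3 and letters[1] == letters[2]:
        return letters[:2]
    return letters


roots = [
    "\u0621 \u0644 \u0641",  # hamza-lam-fa (sample data)
    "\u0621 \u0644 \u0644",  # hamza-lam-lam, the geminate root mentioned above
    "\u0621 \u0644 \u0645",  # hamza-lam-mim (sample data)
]
plain = sorted(roots)                       # regular code-point sort
ordered = sorted(roots, key=root_sort_key)  # geminate root moves to the front
```

With a plain sort the geminate root lands between the other two; with the key it comes first, which is the dictionary order described above. The page names themselves can keep all three letters, since only the sort key changes.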

Guidelines

Following the discussions above, I have written the following, which could be put on Wiktionary:About_Arabic. Please review and discuss. I'm especially looking for opinions about the non-vocalization of page names; pointing out mistakes in the English text is also welcome, as English is not my native language!

Guidelines for Arabic entries:

Arabic roots: Each Arabic root has its own entry, such as خ د ر. The root entry should explain the meanings of the root and list links to the words that derive from the root. The {{ar-root-entry}} template should be used and Category:Arabic roots appended. Ideally the order for those words should be as follows: First the Form I verb (if it exists), followed by all the nouns, participles and other words deriving from it, in alphabetical order, then form II if it exists and so forth and so on. This follows the way most Arabic dictionaries are organized. Please use the inflection templates.

Arabic words: Page names are not vocalized, meaning, for example, that form I and form II will be on the same page. Initial hamzas should not be written in the page title, but a page with the hamza should be created, redirecting to the main page for the word. However, words within the page should all be written with all hamzas and proper diacritics showing vowelization. Each word should have a link to its root page (using {{ar-root-entry}}) under the Etymology section, if applicable (many loan words, for example, do not have roots).

Arabic nouns: Nouns should be presented with no case endings. Please include gender and number, along with gender and number inflections, where applicable.

Arabic verbs: Please use the {{ar-root-entry}} template. Links to words derived from the verb should be under ====Related terms====, and conjugation tables for the verb should be included on the page.

-- Beru7 18:35, 27 March 2009 (UTC)
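The page-name rule in the draft guidelines (unvocalized titles, no initial hamza) could be sketched as follows. This is a hedged Python illustration only: `page_title` is a hypothetical helper, it reflects the draft as written rather than any settled policy, and it leaves alif madda (آ) untouched.

```python
import re

# Arabic combining marks U+064B..U+0652: tanwin, short vowels, shadda, sukun.
HARAKAT = re.compile("[\u064B-\u0652]")

# Alif carrying a hamza above (U+0623) or below (U+0625) maps to bare alif.
INITIAL_HAMZA = {"\u0623": "\u0627", "\u0625": "\u0627"}


def page_title(vocalized_word):
    """Derive an unvocalized page name from a fully vocalized word."""
    bare = HARAKAT.sub("", vocalized_word)
    if bare and bare[0] in INITIAL_HAMZA:
        # Only the first letter is changed; medial hamzas are kept.
        bare = INITIAL_HAMZA[bare[0]] + bare[1:]
    return bare


# Form I (kataba) and form II (kattaba) collapse to the same page name.
form_i = page_title("\u0643\u064E\u062A\u064E\u0628\u064E")
form_ii = page_title("\u0643\u064E\u062A\u0651\u064E\u0628\u064E")
```

Under this sketch the redirect page mentioned in the guideline would be the same string with the initial hamza left in place.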

It’s pretty good. I made a few minor changes. I don’t know if you have noticed yet, but Arabic vowel points are a problem here. Wiktionary only allows simple vowel points, and if you put a double point, such as shadda-fatha, it won’t appear correctly. That’s because Wiktionary automatically changes the order to fatha-shadda when you save it. It is possible to have double points if you type them as numeric character references: كتبتُن&#x0651;&#x064E;, which makes كتبتُنَّ. Otherwise, you get كتبتُنَّ. Possibly this might be made easier by using templates.
Besides vocalization, there is the matter of initial hamza. Most words that have initial hamza are usually written without a hamza, and in order to make it easier to find a word by pasting it into the search box, I have usually put page names without initial hamza (e.g., اب). However, a lot of people think that if a word has initial hamza, strictly speaking, then it should be in the page name. I have not made an issue of this, and when someone makes a page with initial hamza, I leave it that way. But it really should be subject to a policy, one way or the other. --Stephen 01:59, 28 March 2009 (UTC)
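The shadda-fatha reordering described in this comment is what Unicode canonical ordering does to combining marks: fatha has a lower combining class than shadda, so normalization puts it first. The Python sketch below shows this, on the assumption (not confirmed in this discussion) that the wiki software normalizes saved text to NFC.

```python
import unicodedata

SHADDA = "\u0651"
FATHA = "\u064E"

# Noon followed by shadda, then fatha, in the order a user might type them.
typed = "\u0646" + SHADDA + FATHA

# NFC sorts adjacent combining marks by canonical combining class.
normalized = unicodedata.normalize("NFC", typed)

classes = (unicodedata.combining(FATHA), unicodedata.combining(SHADDA))
marks = [unicodedata.name(ch) for ch in normalized[1:]]
```

After normalization the marks come out as fatha then shadda, whatever order they were typed in. The two sequences are canonically equivalent, so any difference on screen comes from the font or renderer rather than from the stored text.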
I prefer to use alifs with hamza, if the word has it, first because that's the way they appear in dictionaries, and they are more commonly written with hamza on the web. Perhaps a redirect page could be used to enable searching for words without hamza in the search box. I would perhaps do the same for final alifs following tanween fatha, e.g. احيانا redirecting to أحياناً. Anatoli 14:34, 28 March 2009 (UTC)
Stephen, just like Anatoli, I think strong initial hamzas should be written, as they are in most books and newspapers. I understand your cut'n'paste point, but then very often ى is written for ي (or the reverse, as I recently saw in an article in el ahram!). Maybe we could use redirects as Anatoli suggests. However, Anatoli, tanwin is a vocalization mark and is usually not written (whereas hamza is a letter). I see no more reason to have a شكراً page than to have a كتّب page. Once again, this can be fixed with redirects, but I think that for consistency, it would be better if we kept شكرا. --Beru7 17:11, 28 March 2009 (UTC)
I've thought about it more and I guess the problem with strong initial hamzas (the ones that should actually be written) is that they do not appear on all keyboards. For example, the stickers on my keyboard show أ and آ, but إ is not written on the stickers I have, even though it is accessible through shift+ع. I've looked at pictures of Arabic keyboards, and most of them are the same. And something else: Arabic dictionaries show all words beginning with hamzas (strong أ or weak ا) under ا. Our categories will be displayed the same way if we do not include initial hamzas in page titles. Beru7 15:17, 2 April 2009 (UTC)
OK, Beru7, with tanwin fatha it's less consistent than with hamza. Also, the popular editors, Yamli and the Google transliterator, both consistently offer hamza and tanwin fatha, but the Google translator does not. I heard some Arabs cringe at missing tanwin on words like شكراً, but words without it are still too common and perhaps more common than with it. As for ى written for ي, this is the common Egyptian style, isn't it? I've got a book translated into Arabic in which ي is never used in the final position, making it confusing; the English Wikipedia mentions this feature of the Egyptian style of writing. The redirects could also be used, of course, if the word with the other letter doesn't exist. Anatoli 22:29, 28 March 2009 (UTC)
I would like to include something about transliteration in the guidelines. Right now, things are a bit messy, with many systems used, and sometimes even mixed. My own preference goes to the qalam system (http://en.wikipedia.org/wiki/Qalam), which I think is the best for English speakers. It is very easy to type, and gives a good idea of the pronunciation. However, there are two flaws, in my opinion. The first is the transliteration of ع: the symbol used, `, is not on all keyboards (not on mine at least!), and is not very easily distinguishable from the one used for hamza, '. It also seems to me that it does not provide a way of telling ث (th) from تْه (also th). Personally, I'd like to use 3 for ع, and I have nothing to propose for the "th" problem, which should be rare, though.
I'd also like some input on the conjugation templates I have created this week, please see كتب. Especially, I'm not happy about the presentation.
Lastly, I was wondering what to do with case inflections. Should they be specified for nouns? Adjectives? All words that actually have such inflections? And how should they be presented? Any ideas are welcome. --Beru7 17:10, 10 April 2009 (UTC)
A transliteration guideline is definitely needed, but I’m not sure about the qalam system. The qalam system: ' aa b t th j H kh d dh r z s sh S D T Z ` gh f q k l m n h w y. I don’t have strong feelings about aa/ee/ii/oo/uu for long vowels, and using uppercase H, S, D, T, Z is not too bad, but I don’t like the digraphs th/kh/dh/sh/gh. I am very much against using ` for ع, because, as you pointed out, it is not readily distinguishable from ' on a computer screen. I have been using ’ for hamza and ʕ for ʕain, but would not be opposed to 3. The transliteration system that I have been using is: ’ ā b t θ j ħ x d ð r z s š ṣ ḍ ṭ ẓ ʕ ğ f q k l m n h w y. My main concern with the qalam system is the double letters, which will lead to confusion with words such as مستسهل (mustashal).
I think the conjugation tables are looking very good, but there is too much separation. It would be better to have a single table instead of nine separate ones. I don’t know if it would be very difficult to do a single table...html (or whatever this markup language is called) is a mystery to me.
As for case inflection, I think nunation should be marked where appropriate: أختٌ, بنتٌ. It would also be nice if there were a simple and clear way to list all of the case inflections, but I don’t have any ideas about how it could be done. Maybe it will be too much. --Stephen 19:03, 10 April 2009 (UTC)
What about removing the ambiguity like this: mustas-hal, or like this: mustas°hal, which I think is nice because it looks like a sukun? --Beru7 20:41, 10 April 2009 (UTC)
I think mustas-hal would be less confusing. The ° could be read as a weak o. If we use sh, th, dh, etc., there is another -h combination that, though not used, would still probably be misunderstood. For instance, ازهر, azhar. If sh = š, then zh will likely be read as ž (ažar). --Stephen 23:49, 12 April 2009 (UTC)

Arabic

ROOT (then the three or sometimes four Arabic letters of the root)

Form (then the Pattern Number I-X and the Arabic فعل pattern)

Part of Speech

pronunciation

  1. Definition.

Part of Speech (a second one if appropriate)

pronunciation

  1. Definition.

References

Romanization proposal

The proposed romanization system for Arabic is based on the qalam system with a few modifications. Reasons for this choice are:

  1. Transliterations of characters which have no equivalent in Latin script are, where possible, the ones already in use in the English press, books and atlases (for example, ش gives sh, خ gives kh, etc.).
  2. The romanized result is easily typed on any Latin keyboard, because it does not use any special characters. This is important because the volume of transliterations that have to be typed is high, and Arabic entries already require typing in at least two different scripts.
  3. It is easily read on a computer screen.

The modifications to the qalam system are, mainly:

  1. Diacritics are always transliterated
  2. ع gives 3 (widely used on the internet for transliterating that letter)
  3. ة gives a(t)
  4. the character - is used to remove ambiguities such as between ش (gives sh) and سْه (which would give sh also). In the latter case, the correct transliteration is s-h (the ALA-LC system uses ′)
  5. 'alif madda transliterates to 'aa
Letter | Rom. | IPA | Notes
ا | aa | | When in initial position, the 'alif represents a weak initial hamza; it then transliterates to the short vowel it supports (a, i or u).
ب | b | b |
ت | t or t- | t | Use t- when transliterating ـتْهـ to avoid confusion with ث.
ث | th | θ |
ج | j | ʒ |
ح | H | ħ |
خ | kh | x |
د | d or d- | d | See ت for d- usage.
ذ | dh | ð |
ر | r | r |
ز | z | z |
س | s or s- | s | See ت for s- usage.
ش | sh | ʃ |
ص | S | |
ض | D | |
ط | T | |
ظ | Z | ðˤ |
ع | 3 | ʕ |
غ | gh | ɣ |
ف | f | f |
ق | q | q |
ك | k or k- | k | See ت for k- usage.
ل | l | l |
م | m | m |
ن | n | n |
ه | h | h |
و | w or uu | w |
ي | y or ii | j |
ء | ' | ʔ |
ة | a(t) | a or at | Isolated words should use a(t); if not isolated, a or at should be used.

Short vowels
ـَ | a | a |
ـُ | u | u |
ـِ | i | i |
ـً | an | |
ـٌ | un | |
ـٍ | in | |

Long vowels and diphthongs
ى ا | aa | |
آ | 'aa | ʔaː |
ـَو | aw | aw |
ـُو | uu | |
ـَي | ay | ay |
ـِي | ii | |

  • Hamzas are always written ' regardless of which letter they sit on
  • Orthographic و and ا occurring at the end of certain verbs are not transliterated
  • ال always gives il- regardless of elision and sun and moon letters rules
  • To transliterate shadda, the concerned consonant is written twice.
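As a rough check of how the table and rules above fit together, here is a Python sketch that romanizes fully vocalized words. It is not the proposal's reference implementation: the names are hypothetical, it covers only letters, vowel marks, shadda, the long vowels and the hyphen rule, it skips the ال rule and the untransliterated final و and ا, and it assumes a fatha is written before ة so that ة itself contributes only "(t)".

```python
def _table(first, romanizations):
    """Map consecutive code points starting at `first` to romanizations."""
    return {chr(first + k): r for k, r in enumerate(romanizations)}


LETTERS = {
    # U+0621..U+0627: hamza forms (always '), alif madda, then bare alif.
    **_table(0x0621, ["'", "'aa", "'", "'", "'", "'", ""]),
    # U+0628..U+063A: ba through ghain; ta marbuta sits at U+0629.
    **_table(0x0628, ["b", "(t)", "t", "th", "j", "H", "kh", "d", "dh", "r",
                      "z", "s", "sh", "S", "D", "T", "Z", "3", "gh"]),
    # U+0641..U+0648: fa through waw; U+064A: ya.
    **_table(0x0641, ["f", "q", "k", "l", "m", "n", "h", "w"]),
    "\u064A": "y",
}
VOWEL_MARKS = {"\u064B": "an", "\u064C": "un", "\u064D": "in",
               "\u064E": "a", "\u064F": "u", "\u0650": "i"}
SHADDA, SUKUN, HEH = "\u0651", "\u0652", "\u0647"
MARKS = set(VOWEL_MARKS) | {SHADDA, SUKUN}
# A short vowel followed by an unmarked alif, alif maqsura, waw or ya.
LONG_VOWELS = {("a", "\u0627"): "aa", ("a", "\u0649"): "aa",
               ("u", "\u0648"): "uu", ("i", "\u064A"): "ii"}
# Letters whose romanization plus a following h would read as a digraph.
HYPHEN_BEFORE_H = {"t", "d", "s", "k"}


def romanize(word):
    """Romanize one vocalized word; unknown characters pass through."""
    out = []
    i = 0
    while i < len(word):
        ch = word[i]
        j = i + 1
        while j < len(word) and word[j] in MARKS:
            j += 1
        marks = word[i + 1:j]  # marks on this letter, in any order
        if out and not marks and (out[-1], ch) in LONG_VOWELS:
            out[-1] = LONG_VOWELS[(out[-1], ch)]
        else:
            base = LETTERS.get(ch, ch)
            # Shadda: the consonant is written twice.
            out.append(base * 2 if SHADDA in marks else base)
            out.extend(VOWEL_MARKS[m] for m in marks if m in VOWEL_MARKS)
            if SUKUN in marks and base in HYPHEN_BEFORE_H and word[j:j + 1] == HEH:
                out.append("-")
        i = j
    return "".join(out)


# The ambiguous word discussed earlier on this page: s + sukun + h.
example = romanize("\u0645\u064F\u0633\u0652\u062A\u064E\u0633\u0652\u0647\u064E\u0644")
```

Marks are collected as a group per letter, so the result does not depend on whether shadda or the vowel is stored first.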
I think it would be much better to use s-h than s°h. Another possibility is s~h (mustas~hal). Otherwise, most of it seems okay. If I understand you correctly, you want to transliterate استلم as estálama, and استسلم as estáslama. I’m okay with trying this out, but I’m not sure what kind of response we’ll get. It’s a radical change. --Stephen 00:14, 13 April 2009 (UTC)
It's not a radical change, it's just a bad idea! اُكْتُب would give ektub, for example. It's not good. So I've changed that to be the same as in most other systems: weak hamzas transliterate to the vowel they carry. I've also changed the ° thing. And removed the dialectal variants, which are too many if you consider all the dialects that exist. I guess each one that has its own language code will have its own modifications to the transliteration system. Thanks for your comments --Beru7 16:13, 13 April 2009 (UTC)
I prefer the older method. I haven't created many Arabic entries but I've been adding some translations. Symbols can be copied from here: http://en.wikipedia.org/wiki/Romanization_of_Arabic. It's also consistent with Wikipedia, with the exception of "kh" for "x". If you insist on the change, I would change 3 to ` (backquote), e.g. al-`Irāq. Also, I would always romanise the definite article as "al-", unless it's a dialect and not use "e" and "o" at all, except for foreign words or dialects and show the pronunciation change for the sun letters. Hamzas could be omitted if at the beginning of a word. Anatoli 22:45, 15 April 2009 (UTC)
I too would have preferred to use the same system as in Wikipedia, but it's very inconvenient given the volume of transliterated text that has to be typed here: entries, all links within entries (think of all the broken plurals), examples, translations, and, if we were to scrupulously follow the Wiktionary transliteration policies, even all the conjugated verbs and various inflections. Copying and pasting symbols will take forever.
The ` / ع issue has been discussed above: both Stephen and I have agreed that it is not easily distinguished from ', which is standard for hamzas. And it's not easily typed, either, at least not on my French keyboard!
I agree about al-, it would be more consistent since the vowel is a fatha.
"e" and "o" are not used in the present version.
For the reason why sun and moon letters should not be transliterated, please refer to Wiktionary:Transliteration#Key_terms, last paragraph. It will also reduce the number of possible errors.
Lastly, I can think of no reason why strong initial hamzas should not be transliterated if all other diacritical marks are. What reason do you have in mind?
Thanks for your comments. --Beru7 05:34, 16 April 2009 (UTC)
You are basically suggesting the transliteration used in the Arabic chat alphabet? The issue with copying is not just for Arabic but also for Chinese, Russian and other languages whose transliteration requires diacritics. If you think you are going to stay an active Wiktionarian and be consistent, then I may agree, but you are new (I have started being active just recently as well) and we already have a lot of entries romanised with diacritics. What are your thoughts about this?
Omitting the transliteration of the initial hamza is common in textbooks and Wikipedia: as the initial glottal stop is always there (if the vowel is not elided), like in German, it doesn't have to be written.

Anatoli 06:37, 16 April 2009 (UTC)

Why not transliterate just one letter as it is in the chat alphabet if it's convenient, works, and is understood by most? The rest is the qalam system, which has been around for a long time.
Diacritics may be required for other languages, but if, for Arabic, we can do without any while using an established system (qalam), then why shouldn't we? It's not just for me, it'll be more convenient for everyone. We absolutely need a romanization standard. Right now there's none, and the result is a mess, with several systems used and sometimes even mixed. So we get to pick one now. It might just as well be one that suits our needs, which is why I picked this one: not because it suits my taste but, I insist, because it suits our needs.
About strong initial hamzas (shown by the ء mark), they indicate that the vowel they carry is never elided. So they are important. Remember, we are a dictionary and as such, our transliteration system should show as much about the original orthography as possible. --Beru7 07:07, 16 April 2009 (UTC)
OK then. Let's do it the way you suggest. The side benefit is that it would almost make it possible to convert the words to Arabic in the Yamli editor (http://www.yamli.com/arabic-keyboard/), etc. We are not adding word stresses, then? Although the transliteration may look as scientific, the number of Arabic entries and translations is more important to me. Even the basic Arabic vocabulary is not yet covered in Wiktionary.
I wouldn't spend too much time on all Arabic conjugations and declensions (if there are examples) but would provide the essentials: plural for nouns; present, maSdar and imperative forms for verbs, something that is always important for those who are interested in Arabic. Anatoli 23:03, 16 April 2009 (UTC)
Has this discussion stalled? The impact is that we haven't agreed on the standard of romanising Arabic. If the problem is 3 vs ` and -a(t) vs -a, I am sure we could come to some compromise. Anatoli 05:44, 12 May 2009 (UTC)

Typing tool

Here's a little toy I made this afternoon (http://www.enselme.com/beru/trans2arab.htm). Type in your transliterated text and it should give it back to you in Arabic script, fully vowelized. It uses exactly the same system as we do, but I had to make a few additions: add -an at the end of a word to get tanwin, as in shukraa-an, and aa~ to get alif maqsura, as in mashaa~. There are probably bugs left, so if you find any, let me know on my talk page! I hope this will help us get more vowelized entries! I don't know if this could be integrated in the wiki or not, but the code is free to use, reuse and modify... --Beru7 23:28, 18 April 2009 (UTC)

Looks interesting. How do you enter fatha tanwiin + alif? مرحباً Anatoli 00:13, 19 April 2009 (UTC)
Like this: marHabaa-an --Beru7 11:49, 19 April 2009 (UTC)
Thanks. Next questions: how do you enter لأن and جامعة, and the combination لإ? It looks promising if you get all the variations, and a short tutorial would be great. Anatoli 13:18, 19 April 2009 (UTC)
For جامعة, it would be jaami3a(t) and it works. For لأ it should probably be l'a but it's not working. Thanks for finding this one! And I'll try to make a prettier page later. --Beru7 13:33, 19 April 2009 (UTC)
Replying here, where the link is and so that other users can see. Thanks for the fix, Beru7. It looks better. Perhaps worth trying to enter some longer text for testing. Anatoli 22:56, 22 April 2009 (UTC)

Arabic languages

In some respects we consider Egyptian Arabic, et al., as dialects of Arabic:

Yet in other respects we consider them separate languages:

What's going on?—msh210 22:36, 22 April 2009 (UTC)[reply]

What's going on is that Arabic on en.wikt is a mess right now.
I wanted to raise the same issues so thanks for doing it ! I guess the problem is that there is enough difference between the dialects to have different language codes, but not enough differences to really have complete dictionaries for each, although dialect dictionaries do exist, but there is a lot of overlap with MSA dictionaries. So what should we do ? Several possibilities:
  1. Remove all dialectal words from Arabic sections. Cut and paste all words in common between msa and the various dialects. A lot of work, and a lot of redundancies.
  2. Remove all dialectal words from Arabic sections. Put them in the appropriate dialect sections. For words that are common to both a dialect and MSA: insert a link to the MSA section. This is what I have been trying with Egyptian Arabic lately. But autoformat doesn't seem to like it.
  3. Put everything in Arabic, marking dialectal words. This might get very confusing.
Personally I think 2 is the best option if it can be made to follow the general policies of the wiki. 1 would be my second choice, and 3 is probably not a good idea. Of course there might be other solutions.
I'd very much like to know how it is done in other languages which face similar situations. Beru7 23:21, 22 April 2009 (UTC)[reply]
Also, finding out what's dialectal and what's standard is hard. "Common usage" is not much of a criterion for Arabic, as common usage is colloquial, often considered slangy if you use it in writing. Dialectal words are often written as normal but pronounced differently (dialects may miss some sounds from classical Arabic) but some prefer to highlight the difference. So, it's a mix. I agree that perhaps marking words as dialects may be sufficient. The translations should focus on MSA. The border between MSA and dialects is not always clear. For example, some people consider loanwords as slang, do not want to include as standard Arabic but they are too common. The mess may continue as editors may be from different Arab countries where attitudes differ as to what is right and what is wrong, there could be some spelling/transliteration variations as well, especially for less formal words. Anatoli 00:21, 23 April 2009 (UTC)[reply]
I think common usage in MSA is what you will find in books and newspapers. That is the corpus that has been used to build modern Arabic dictionaries. That language has a name in Arabic, فُصْحى (fusHa). So that would go in "Arabic", which should really be "Modern Standard Arabic". I cannot speak for other dialects, but for Egyptians, the difference between فُصْحى and عامِّية ‎(3aammiyya(t)), dialect, is quite clear. There is a lot of overlap with MSA in terms of both grammar and vocabulary, but there are also significant differences, and not only of pronunciation. Sometimes the singular form of a noun will be the same but a different plural will be used. Conjugation is not the same. In Egyptian, the dual is barely used. Phonology is very different.
So native speakers are very aware of these differences and can switch from one to the other (diglossia), at least if they have been to school.
Upon reflection, I think we should really consider Arabic dialects as separate languages, and remove references to dialects in the Arabic sections. Beru7 12:53, 23 April 2009 (UTC)[reply]
Agree with the suggestion. My point about the mix is the so-called Formal Spoken Arabic, which has elements of both. The mix can be different, I beg to differ, especially in Egypt, where the dialect has much higher status and usage. This discussion is not about this, anyway. Otherwise, we start discussing, whether we spell أطيل or أتيل or we should avoid it and use فندق only. :) You may say, they are part of standard Arabic now but a purist from Saudi Arabia may not agree. Just an example, perhaps not a perfect one, forgive me, my Arabic is not very good but I read a lot about it. Can we say, if Hans Wehr dictionary or another solid dictionary uses it and a word is used in newspapers, then we can include it? Anatoli 13:09, 23 April 2009 (UTC)[reply]

Initial Hamzas

A few months ago, when we were discussing the guidelines for Arabic we came to the decision of not including strong initial hamzas in page titles. I have come to think that this decision is the wrong one. I have been reading a lot of Arabic texts in the past few months, and most show all, or nearly all, of those hamzas. Here are a few examples:

I think most other wiktionaries do not have the same policies, either, and that sometimes causes problems with interwiki links. Beru7 19:34, 22 June 2009 (UTC)[reply]

Words written with and without the initial hamza are both common, but the former method is recommended in many Arabic textbooks (it doesn't apply to elidable hamza). I prefer to write it in translations from English but, as we discussed before, the Arabic entries may exist without hamzas but there must be a redirect with it. Anatoli 20:26, 22 June 2009 (UTC)[reply]

Publishing the guidelines and cleaning up Wiktionary:About Arabic

I think it would be time to publish the guidelines above on Wiktionary:About Arabic, and generally clean that page up. I will do that and also change the hamza guidelines in a few days if nobody has any objection. Beru7 17:05, 23 June 2009 (UTC)[reply]

There were serious objections about these two, if I haven't missed something:
  1. ع gives 3 (widely used on the internet for transliterating that letter)

Me too, I prefer "`" as in qalam.

  1. ة gives a(t)

I prefer just "a", except for cases where it is pronounced - "at", ignoring the case endings. Anatoli 01:58, 24 June 2009 (UTC)[reply]

To me (who can't read Arabic, does not contribute to it but looks up entries quite often) the words transliterated with this new scheme are very confusing, e.g. مطعم. I liked Stephen's scientific scheme (which is close to w:DIN 31635 or w:ISO 233) seen in السلطة الوطنية الفلسطينية much better. I understand it's hard to type those characters but, hey, we don't avoid doing right things just 'cause they are difficult. After all, this is a solvable problem: you can download Microsoft's excellent tool and map your own keyboards with any desired characters. Alternatively, when Conrad implements his automatic transliteration tool into WT:EDIT, Arabic words could be automatically semi-transliterated with vowels being added manually. This said, I understand I can't make you all choose the harder path and I promise not to raise the devil in case you pass the proposal officially :) --Vahagn Petrosyan 12:38, 23 July 2009 (UTC)[reply]
Hi Vahagn! Actually this is not the scheme Stephen was using. Before the new system, Stephen was using IPA characters to transliterate Arabic. I guess the really big problem with our system is the usage of "3". It confuses many people. On the other hand it has become the standard for informal transliteration (called arabizi, عربيزي) and is used more and more (Microsoft Maren, Google Ta3reeb, Yamli etc.). The main obstacle in changing this has been the opponents of the "3", because they want to hear about nothing but "`", which is hardly distinguishable from "'", which is used to transliterate a completely different letter. But "ɛ" could be used, or "ʕ". I thought about "c" but since there is no Unicode for it, it is not going to play well with search nor Conrad's tools. By the way, I find the DIN system horribly confusing with t ṯ ṭ, h ḥ ḫ etc., and the complete specs are not free; they have to be bought from the DIN. On a final note, an interesting property of the current system is that it is machine-convertible to other systems, DIN for example. This might prove useful in the future. Beru7 15:03, 23 July 2009 (UTC)[reply]
I think "ʕ" is best for ayin. Other confusing things to me are aa instead of ā and other long vowels. --Vahagn Petrosyan 09:20, 24 July 2009 (UTC)[reply]
I have noticed that the symbol "3" does not work with the template {{ar-verb}} if the "3" is the first radical. It is treated as a numeral and gets moved to the left of the Arabic verb. —Stephen 15:47, 26 July 2009 (UTC)[reply]
I have added a call to {{LR}} before the transliteration and it works now. Beru7 17:23, 26 July 2009 (UTC)[reply]
Thanks. Now I see that the same problem exists with {{infl}} used for nouns and adjectives, as in عزل. —Stephen 20:09, 26 July 2009 (UTC)[reply]
I don't have the rights to modify that template but this fixes the problem: {{infl|ar|noun|sc=Arab|tr={{LR}}3azl|g=m}} —Beru7 20:19, 26 July 2009 (UTC)[reply]
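A note on why the "3" moves: the following is only a sketch in Python, not part of any template. It assumes that {{LR}} emits a left-to-right mark (U+200E), which is an assumption about the template rather than something confirmed in this thread. In the Unicode bidirectional algorithm a digit is a weak "European number", so a transliteration starting with "3" has no strong left-to-right character at its start and can be laid out as part of the preceding Arabic run.

```python
import unicodedata

LRM = "\u200e"  # LEFT-TO-RIGHT MARK; assumed here to be what {{LR}} emits


def bidi_classes(text):
    """Return the Unicode bidirectional class of each character in text."""
    return [unicodedata.bidirectional(ch) for ch in text]


# The digit is a weak type (EN), unlike the Latin letters (L) after it.
print(bidi_classes("3azl"))
# With a mark in front, the run starts with a strong left-to-right character.
print(bidi_classes(LRM + "3azl"))
# Arabic letters such as ayin have the strong right-to-left class AL.
print(unicodedata.bidirectional("\u0639"))
```

This only reports character classes; the actual reordering is done by the browser's bidi implementation.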

Hamzas in the lemmas

Hi! As everyone knows, short vowels are actually never indicated in everyday text you read in Arabic newspapers or see on TV programs, probably the Qur'an is the only occasion where they are marked. (Or am I wrong?) Anyways, I started recently Arabic (so, I know I probably should ask this somewhere else than here, sorryyy...), so I have a question: do they always vocalize the initial hamza alifs that have short vowels? In other words, do they write for example the word أخت "sister" more often as اخت? I'm asking this as an example to determine what's our policy or guideline with the lemmas. We have apparently articles that follow different rules:

...or then vice versa...

No, initial hamzas usually are not written. أخت is rare, اخت is usual. Years ago, we began by leaving the initial hamzas off, since that is how the words are usually written, but over time some editors have felt that, for a dictionary, it is important to put the hamzas where they belong. So some older entries may not have them, while newer entries probably do have them.
I don’t think we have reached a consensus on how to handle the various common ways to write a word...if we should use redirects, or make entries that explain that it’s an alternate form. Therefore, most of these alternate forms have so far just been ignored.
Of course, there are some words that are spelled the same way in Persian, except that Persian does not use the initial hamza. Such words cannot be redirected to the hamza spelling, so an "alternate form" entry would be required.
In general, then, if a hamza belongs on a word, even though it is commonly left off in normal texts, we are putting the hamza where it belongs. It’s one of several very tricky issues with Arabic, most of which have still not been seriously addressed. —Stephen (Talk) 04:03, 4 July 2012 (UTC)[reply]
In a modern running Arabic online text the hamza is usually present or very common, unlike optional diacritics. Also, Google translate and automatic conversion tools like Yamli usually support hamza as well. Just try typing "anta" in Yamli and see how often أنت is written on the web with a hamza. In the books I've got at home hamza is quite common but I also have books that don't use them often. No, we haven't reached consensus, so it's better to discuss how we are going to treat this issue before changing anything. Egyptian Arabic is more relaxed and uses hamza less often, same with ى in the final position, which is used when ي (y) is required, or ه (h) when ة is required. All searches should eliminate Persian and other Arabic based languages. As I mentioned, Egyptian Arabic is written in a less strict form. --Anatoli (обсудить) 04:21, 4 July 2012 (UTC)[reply]
Perhaps we could treat Arabic hamza similar to the Russian ё issue? Have redirects and alternative forms for words without hamza? At least, writing hamza is considered correct, even in a non-vocalised text. Many words without hamza may also be the spelling forms for Egyptian Arabic, Persian, Urdu, if other letters coincide. Note, words with elidable hamza are not spelled with hamza, even in cases where it has a phonetic value, e.g. ابن. Please check Arabic Wikipedia for letters أ‎ (ʔ)‎, إ‎ (ʔi)‎, ﺁ‎‎, ؤ‎‎ (ʔ)‎ and ئ‎ (ʔ)‎. They are quite common. --Anatoli (обсудить) 04:31, 4 July 2012 (UTC)[reply]

That should have been discussed on the talk pages of the words mentioned. However, in the case of the month April, it is pronounced in two ways by Egyptians (loanword vocabulary isn't standardized in Literary Arabic), either with an initial open vowel or with an initial high vowel. Because they are the same word, there is no need to have two separate pages. --Mahmudmasri (talk) 21:59, 17 November 2013 (UTC)[reply]

Please see my changes to أبريل and ابريل (see how I transliterated both as well). I made the former the main entry and the latter - an alternative form. Is this OK with you? Even if loanwords are not standardised, these are two possible spellings. --Anatoli (обсудить/вклад) 00:39, 18 November 2013 (UTC)[reply]
Yes, it is OK to have it mainly as any of the three: أبريل,‎ ابريل or إبريل. Yes, with an under hamza is also considered correct, but for the high vowel pronunciation for the first syllable [ɪ]. The only problem is that you wrote the Latin transliteration for ابريل without an initial half-ring (ʾ), but with it for أبريل; however, they are both pronounced with an initial glottal stop. For practicality and as I was expecting, I didn't want the half-ring (for the glottal stop) to be written initially, since it would be harder to use such strict transliteration. Do you still think it is practical to transliterate the initial glottal stop? --Mahmudmasri (talk) 01:45, 18 November 2013 (UTC)[reply]
My idea is (I think it's practical) to reflect the written hamza with ʾ (e.g. ʾabrīl) symbol and show nothing (e.g. abrīl) when it's not written (yes, ignoring how it's pronounced - initial "i" in "إذا" and "ابن" are pronounced identically). Marking hamzas is especially important to show when alif is not elidable. Perhaps (alternatively), all words starting with an alif, except for alif waṣl should be marked with ʾ.
Feel free to create إبريل based on ابريل with transliteration "ʾibrīl". What do you mean "harder to use"? BTW, if you use Firefox, I can give you a hint how to add any symbol quickly in a single tab (but not Arabic diacritics). --Anatoli (обсудить/вклад) 01:58, 18 November 2013 (UTC)[reply]
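The rule described above (write ʾ only where a hamza is actually written on the initial alif, and nothing for a bare alif) can be sketched in Python as follows. The function name is hypothetical, and alif madda and other hamza seats are deliberately left out because the comment above does not cover them.

```python
# Initial alif forms that carry a written hamza.
HAMZA_ON_ALIF = {"\u0623", "\u0625"}  # alif with hamza above, alif with hamza below

RIGHT_HALF_RING = "\u02be"  # the transliteration mark for a written hamza


def initial_hamza_mark(word):
    """Return the half ring if the word starts with a written hamza, else ''."""
    return RIGHT_HALF_RING if word[:1] in HAMZA_ON_ALIF else ""


# The two spellings of "April" discussed above, given as code points:
with_hamza = "\u0623\u0628\u0631\u064a\u0644"  # alif with hamza above + b r y l
bare_alif = "\u0627\u0628\u0631\u064a\u0644"   # bare alif + b r y l

print(initial_hamza_mark(with_hamza) + "abrīl")
print(initial_hamza_mark(bare_alif) + "abrīl")
```

Under the alternative mentioned above (marking every initial alif except alif waṣl), the set would instead need a list of words with elidable hamza, which cannot be derived from the spelling alone.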
I think what Mahmud is saying is that in ابريل, the initial hamza is not elidable even though it is not written. --WikiTiki89 02:16, 18 November 2013 (UTC)[reply]
Yes, I understand this (that's why I said "especially") but what do you suggest? We could add ʾ to all initial alifs, excluding words starting with همزة وصل, or only to those where hamza is marked, even if it creates a discrepancy with transliteration of alternative forms. --Anatoli (обсудить/вклад) 02:21, 18 November 2013 (UTC)[reply]
Please see أفريقيا with alternative forms. Transliterating all alternative forms is time-consuming but this is an example how I see the use of ʾ when transliterating alifs with or without hamza. --Anatoli (обсудить/вклад) 02:45, 18 November 2013 (UTC)[reply]

Arabic vocalization

For the record, you cannot add compound Arabic vocalizations on Wikipedia or Wiktionary in the usual way. When you save the page, the order of the two diacritics is automatically reversed, which is the incorrect order. If you type, for example, "shadda" + "fatha", then save, the order is changed to "fatha" + "shadda", which is wrong. Some Arabic fonts are able to display this as if it were entered correctly, but most Arabic fonts cannot.

This effect is known as the Hebrew/Arabic vowel/niqqud bug, an unwanted consequence of "normalization": http://bugzilla.wikimedia.org/show_bug.cgi?id=2399

There is a way that it can be done. You have to enter the XML character references in place of the actual vowels: Shadda+fatha = َّ • shadda+kasra = ِّ • shadda+dhamma = ُّ.

There is a template that will do this for you: {{ar-dia}}. Just indicate sha/shi/shu (= shadda+fatha/shadda+kasra/shadda+dhamma). For example, type كتب and then place {{ar-dia|sha}} after the ت to get: كتَّب. —Stephen (Talk) 22:01, 20 October 2012 (UTC)[reply]
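The reordering can be reproduced outside the wiki. The sketch below is in Python and assumes that the normalization in question is Unicode NFC; it shows that normalization sorts the two marks by their canonical combining class, and builds numeric character references for the typed order. The helper name is illustrative only, and whether the wiki expects hexadecimal or decimal references is not stated above.

```python
import unicodedata

SHADDA, FATHA, KASRA, DAMMA = "\u0651", "\u064e", "\u0650", "\u064f"


def char_refs(text):
    """Spell each character as a hexadecimal numeric character reference."""
    return "".join(f"&#x{ord(ch):X};" for ch in text)


typed = SHADDA + FATHA                           # the order a typist enters
normalized = unicodedata.normalize("NFC", typed)

# Shadda has a higher combining class than the short vowels,
# so canonical ordering moves it after them.
for mark in (SHADDA, FATHA, KASRA, DAMMA):
    print(unicodedata.name(mark), unicodedata.combining(mark))

print(normalized == FATHA + SHADDA)  # True: the pair has been swapped
print(char_refs(typed))              # references for the typed order
```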

Proposal to remove stress marks in transliterations

Arabic dialects differ on where to put the stress in a word. Neither Modern Standard Arabic nor Classical Arabic ever indicated stress even in vocalized texts and therefore have no standard stress. Therefore, speakers of MSA just borrow the stress placement rules from their respective native dialects. Also, misplacing stress is not a big problem in Arabic, as it is in some languages (such as Russian).

Therefore, I propose to remove stress marks in transliterations of Arabic.

Does anyone disagree? --WikiTiki89 01:45, 4 November 2013 (UTC)[reply]

Stress marks are occasional. I don't think they cause problem or they are very important either. They do help people who are not familiar with Arabic stress rules, especially when last syllable is long but not stressed, e.g. كبرى (kúbrā). In any case, we don't have current volunteers to actively work with Arabic, so whatever decision, it will not make difference. --Anatoli (обсудить/вклад) 02:03, 4 November 2013 (UTC)[reply]
Once we adopt a policy, then we can start slowly making changes. The problem we have now, is that I am afraid to make mass changes if there is no consensus, and consensus is indicated on the WT:AAR page. That is why I am bringing it up. --WikiTiki89 02:46, 4 November 2013 (UTC)[reply]
Do you see a problem with having them optional and reword WT:AAR (it says "stress on short vowels can be rendered")? No-one is forced to provide stress marks. --Anatoli (обсудить/вклад) 03:04, 4 November 2013 (UTC)[reply]
My proposal here is to remove all stress marks. The main reason, as stated above, being that speakers of different dialects put stress in different places in the same word (even when speaking MSA). Therefore they can be misleading. --WikiTiki89 03:07, 4 November 2013 (UTC)[reply]
I'm not aware of different stress patterns. I know that stress rules are easy and predictable. Are you referring to situations where words are stressed differently with full case ending and in pausa (case endings dropped), e.g. مكتبة - "maktábatun" vs "máktaba"? --Anatoli (обсудить/вклад) 03:18, 4 November 2013 (UTC)[reply]
Never mind. I found some confirmation (even though it's not very trustworthy) for your claim about dialects - I hardly studied any dialect, so I don't know stress patterns in dialects. No objections from me. --Anatoli (обсудить/вклад) 03:25, 4 November 2013 (UTC)[reply]
I was going to say that some speakers would say máktaba while others would say maktába regardless of whether they include the -tun. --WikiTiki89 03:28, 4 November 2013 (UTC)[reply]
I don’t want to make any strong objections, since I have stopped editing Arabic after editors who did not know what they were doing began doing severe damage to some Arabic pages. I will just remark that when students start learning Arabic, they want to know where to put the stress. Yes, stress can vary under different circumstances, but nevertheless we can advise people on a good standard placement of stress. Even though many transliterations do not have any stress marked, we should not remove correct stresses that exist or eventually will exist.
Again, as I don’t do the Arabic here anymore and don’t look at the Arabic pages, I don’t want to say much about the transliteration system. But a few years ago, some editor (I think it might have been User:Angr, but I’m not certain) insisted on having a strict and very difficult set of transliteration systems for Burmese, and the result was that nobody but him could enter Burmese words anymore. Even native Burmese could not add those transliterations well, and Burmese has been on the slow track ever since. The system you are proposing for Arabic is, in my opinion, very difficult and unnatural, especially on a computer screen. I can read it myself only with great difficulty, and I would never purchase a book that used this system. Arabic is virtually dead here because of bad decisions and actions by some editors, almost none of whom contribute to our Arabic effort, and I think this just puts the last nail in the coffin.
It’s only my opinion, I don’t edit Arabic here anymore, please do what you want with it. —Stephen (Talk) 04:59, 4 November 2013 (UTC)[reply]
Yes. As Stephen writes with no apparent sense of irony whatsoever, it is bad for the readers and editors, really for Wiktionary in general, when someone imposes a difficult, non-standard transliteration system. Michael Z. 2013-11-05 18:01 z
You both say that this scheme is "difficult", but I fail to see what makes you think so. Perhaps one of you could explain it. As I said below, the scheme is still up for discussion and I can't improve it if I don't know what's wrong with it. The reason I didn't use a standard scheme is because I think this one is easier to read, not to mention that it is currently the one we seem to be already using in a plurality of articles. Anyway, I don't care that much myself about what our standard scheme will be, as long as we have one single standard, so that we don't have every entry using a different scheme. --WikiTiki89 18:27, 5 November 2013 (UTC)[reply]
I’ll reply in the next section below, since that is about the transliteration. Michael Z. 2013-11-05 21:37 z
Is there anything in particular you don't like about the transliteration? We need to standardize it because it's worse to have a million different transliteration systems than to have a single bad one. What we standardize it to is still up for discussion. Once the transliteration is standardized, we can even use gadgets to display a reader's favorite system.
Regarding stress marks, the point I'm making here is that there is no "good standard placement of stress". And we still have an entire pronunciation section to deal with things like stress placement, which allows us to specify who places the stress where and not just where we think the stress should be.
I'm not trying to scare off Arabic editors, I'm trying to help them by cleaning up our pages, which can hardly be done if we have no policy. --WikiTiki89 13:46, 4 November 2013 (UTC)[reply]
  • I don't know of a standardized Arabic transliteration which indicates stress.
  • I saw that there was a common practice in Wiktionary to indicate stress by using the acute accent on vowels.
  • (Not to be confused with Arabic dialects which have different grammar and more distinct pronunciations)
    • I only added stress when most accents of Literary Arabic were pronounced the same, but avoided them or removed them when the stress wasn't the same.
    • Moroccan and Algerian accents of Literary Arabic don't have stress at all.
  • The standardized Arabic transliterations don't necessarily indicate the exact pronunciation, they are halfway between indicating Arabic spelling and the loose pronunciation. Additionally, final alef and alef maqsura are transliterated with ā (a-macron) in most transliterations.

If you want to remove stress, then for the previous reasons, if not, then we should only add it when the word is stressed the same by most Arabic speakers. But I feel that for practicality, we would stop indicating stress in transliteration.

For Anatoli:
مكتبة is pronounced maktába [mækˈtæbæ] or maktábatun [mækˈtæbæton] (with nunation) by Egyptians, while by the Levantine people as máktaba [ˈmaktaba] or máktabatun [ˈmaktabatʊn]. --Mahmudmasri (talk) 21:51, 17 November 2013 (UTC)[reply]

Standardizing transliteration characters

Currently WT:AAR provides many alternative transliterations for each letter. I propose we adopt the following scheme exclusively (there is nothing new here, just no more alternatives):

ء ʾ ب b ت t ث ṯ ج j ح ḥ خ x
د d ذ ḏ ر r ز z س s ش š ص ṣ
ض ḍ ط ṭ ظ ẓ ع ʿ غ ġ ف f ق q
ك k ل l م m ن n ه h و w ي y

Does anyone disagree? --WikiTiki89 01:45, 4 November 2013 (UTC)[reply]

They sit there because there was not enough effort to convert all instances of old transliteration to the new method. Your table is my preferred as well but there are too many old transliteration examples still. So, people will wonder what all those capital letters, double vowels, number "3", etc. mean. My preference would be to clean up all entries and translations first, then update but I would leave alternatives somewhere.
With semivowels و (w) and ي (y), it's not correct to simply transliterate them as "w" and "y", as they can be vowels "ū" and "ī". --Anatoli (обсудить/вклад) 02:03, 4 November 2013 (UTC)[reply]
You're thinking backwards. We cannot enforce a policy until it is on the policy page, otherwise no one would know about it. If you want the alternatives to remain for the sake of reference, then we can leave them in the table but mark them as deprecated. Regarding the semivowels, yes, I was only talking about the consonantal use of و and ي. --WikiTiki89 02:50, 4 November 2013 (UTC)[reply]
No objections, if deprecated letters are provided. This table would be incomplete without mentioning ة, ى (in the final positions) and vowels. --Anatoli (обсудить/вклад) 03:01, 4 November 2013 (UTC)[reply]
Isn’t the proposed transliteration table just a letter or two away from several standardized systems? Any reason why we can’t choose one of the established systems described in w:Romanization of Arabic? What possible advantage justifies creating an incompatible proprietary system? Michael Z. 2013-11-05 17:48 z

WikiTiki89, I don’t think this system is too difficult (by my remark in the section above I was referring to the irony that Stephen complains about non-standard romanization, when he is one of the defenders of a completely non-workable system for Russian romanization).

My concerns are the disadvantages of a novel Arabic romanization system for Wiktionary:

  1. If we amateurs create it, then it would constitute original research, likely lacking any firm functional or academic basis. It would be subject to the whims of individual editors.
  2. If we created it, then nothing prevents helpful editors from constantly “improving” it, thereby ensuring that most romanizations entered in the dictionary will remain out-of-date.
  3. We would waste effort developing such a system, debating the relative merits of individual features, when we should be working on the dictionary instead.
  4. If it is novel, then new readers and editors could never benefit from already being familiar with it.
  5. If it is non-standard and therefore incompatible, then readers cannot benefit from our romanizations in any other work. For example, they wouldn’t be able to use them in a professional publication or other work, which would likely require use of standard romanization methods.

Standardized systems have many advantages, as recommended by professional bodies. Some are listed at WT:TRANSLIT#Criteria for romanization systems. Michael Z. 2013-11-05 21:51 z

Also, readers and editors could benefit from familiarity with a system if it were compatible with those for other Arabic-script languages. I don’t know how feasible this is. Michael Z. 2013-11-05 22:03 z
It's funny you mention professional publications, because every linguistic scholarly work I've read on Arabic so far has used its own invented scheme and was not internally consistent (and I mean that hyperbolically). But otherwise you make some good points. Here's my rebuttal:
  1. Wiktionary allows original research. Also, this is not exactly research.
  2. As long as the "improvements" are indeed improvements, they are welcome.
  3. The system is already there. We don't need to spend any time or money on R&D.
  4. It is not so novel. We already use this system on the plurality of our pages and it is similar enough to many existing systems that it doesn't require any learning.
  5. Who's gonna use Wiktionary as a source in a professional publication? And if they do, there is nothing preventing them from substituting characters for more "standard" ones.
--WikiTiki89 23:37, 5 November 2013 (UTC)[reply]
  1. Wiktionary has a community standard threshold for verifying a term, usage, etc. Do any guidelines actually allow or encourage original research beyond that? But as you say, our romanization methods are based on personal preferences, not any research.
  2. I don’t know of any goals or criteria for success ever used in developing our original romanizations, so no one can actually claim that any changes were or were not improvements. They were just changes. If one considers stability to be a beneficial feature, then any unjustified change is harmful, because arbitrary changes make Wiktionary’s contents unpredictable, and therefore apparently not reliable.
  3. I’m not sure what you mean by the system is already there. The “system” on this page has already been there, ever-changing, since April 2009. You are changing it now, and editors would continue changing it ad infinitum. A system like ISO 233 or ALA–LC is “already there” in the sense that you mean, but wiki systems never are.
  4. If it is not the same as a standard system, then it requires not only learning, but possibly also unlearning old habits, and constant alertness to avoid mistakes in the minor differences. It also invites mistakes because an editor may not realize it is different from romanization everywhere else.
  5. If the point of Wiktionary is not to be a reference, then what is it?
By the way, w:Wikipedia:Manual of Style/Arabic uses ALA–LC romanization for Arabic[1] and Urdu.[2] It looks like that only has 5 significant differences from your proposal (ṯ → th, x → kh, ḏ → dh, š → sh, ġ → gh). Is there a good reason not to adopt ALA–LC here, to gain the benefit of compatibility with Wikipedia and world-wide English-language publications and library catalogues? Michael Z. 2013-11-06 04:37 z
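The five substitutions listed above are mechanical, which a short Python sketch can illustrate. It covers only those five characters and is not a full ALA–LC converter; the function name is hypothetical, and the other details of ALA–LC are not handled.

```python
import unicodedata

# Only the five differences listed above; everything else passes through.
DIGRAPHS = {
    "\u1e6f": "th",  # t with line below
    "x": "kh",
    "\u1e0f": "dh",  # d with line below
    "\u0161": "sh",  # s with caron
    "\u0121": "gh",  # g with dot above
}


def to_digraphs(translit):
    """Replace the five single letters with their digraph equivalents."""
    # Normalize first so that a base letter plus combining mark also matches.
    text = unicodedata.normalize("NFC", translit)
    return "".join(DIGRAPHS.get(ch, ch) for ch in text)


print(to_digraphs("\u0161ams"))
```

The reverse direction is not as simple, since a digraph such as "sh" could also stand for the two separate consonants s and h.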
It's w:Romanization_of_Arabic#Comparison_table -> w:Hans_Wehr_transliteration as in the dictionary everybody uses, except for two letters "ḵ" (everybody just preferred "x") for "خ" and "ḡ", which was "ġ" in the older versions for "غ". "ǧ" instead of "j" for "ج" was disliked by everybody and Hans Wehr changed it to "j" in the latest editions, which now matches our standard as well. We can update and use "ḡ" and "ḵ", which will match Hans Wehr totally. I don't see Wiktionary to catch up with Hans Wehr in the near future, so it would make much more sense to follow the standard of the most used dictionary. --Anatoli (обсудить/вклад) 04:55, 6 November 2013 (UTC)[reply]
Whatever we do, the result should go into Template:ar-root/tr as well, which automatically transliterates Arabic root consonants. BTW, w:Hans Wehr transliteration uses "ʼ" and "ʻ", which are visually harder to distinguish than ʾ ("ء") and ʿ ("ع").
Suggested transliteration of consonants (excluding dialect and loanword exceptions):
ء ʾ ب b ت t ث ṯ ج j ح ḥ خ ḵ
د d ذ ḏ ر r ز z س s ش š ص ṣ
ض ḍ ط ṭ ظ ẓ ع ʿ غ ḡ ف f ق q
ك k ل l م m ن n ه h و w ي y

--Anatoli (обсудить/вклад) 05:15, 6 November 2013 (UTC)[reply]
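For reference, the suggested consonant table can be written down as a lookup, which is roughly what an automatic root transliteration would need. This is a Python sketch only, not the code of Template:ar-root/tr; the function name and the hyphen separator are illustrative assumptions. It handles consonants only, so the vowel readings of و and ي mentioned earlier are out of scope.

```python
# The suggested table, keyed by code point. The two half rings are written
# as escapes so the look-alike marks cannot be mixed up.
CONSONANTS = {
    "\u0621": "\u02be",  # hamza -> right half ring
    "\u0628": "b", "\u062a": "t", "\u062b": "ṯ", "\u062c": "j",
    "\u062d": "ḥ", "\u062e": "ḵ", "\u062f": "d", "\u0630": "ḏ",
    "\u0631": "r", "\u0632": "z", "\u0633": "s", "\u0634": "š",
    "\u0635": "ṣ", "\u0636": "ḍ", "\u0637": "ṭ", "\u0638": "ẓ",
    "\u0639": "\u02bf",  # ayin -> left half ring
    "\u063a": "ḡ", "\u0641": "f", "\u0642": "q", "\u0643": "k",
    "\u0644": "l", "\u0645": "m", "\u0646": "n", "\u0647": "h",
    "\u0648": "w", "\u064a": "y",
}


def transliterate_root(root, sep="-"):
    """Transliterate a space-separated consonantal root, letter by letter."""
    # Unknown symbols are passed through unchanged rather than rejected.
    return sep.join(CONSONANTS.get(letter, letter) for letter in root.split())


# The root k-t-b, given as code points:
print(transliterate_root("\u0643 \u062a \u0628"))
```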

Well, why didn’t someone say it was based on a dictionary? I would encourage just picking an edition of Wehr and following it exactly. But anyway, this is a sensible approach, and I suggest citing Wehr and noting any differences from the published system. Sorry for the involved discussion.
On my machine, the apostrophes ʼ and ʻ are a bit clearer than the half rings ʾ and ʿ owing to their thicker round dot, at my preferred 16px font-size. At Wiktionary’s default 13px font-size, the half rings look rather like identical vertical ticks. Michael Z. 2013-11-06 05:28 z
Half rings ʾ and ʿ are used by DIN, ISO and SAS standards. The examples in the printed book look more like semicircles (may double check, I've got the dictionary at home) than apostrophes. I think the Wikipedia page may not use the right symbols. I can distinguish half rings better than apostrophes on my PC and iPad and they are closer in shape to IPA symbols: ʿ - ʕ, ʾ - ʔ. The similarity of the symbols made some users prefer IPA or number 3 (used in chat) for ʕ ("ع") in the past. --Anatoli (обсудить/вклад) 05:57, 6 November 2013 (UTC)[reply]
I am ok with those changes, since I never really liked using "x" and the shape of the diacritic on the "ḡ" doesn't make much difference to me. I will add these changes to the bottom editing toolbar for easier access to these characters. --WikiTiki89 14:45, 6 November 2013 (UTC)[reply]
I've checked my H. Wehr dictionary. "ع" is definitely romanised like ʿ, with "ء", I'm not so sure, it may be an apostrophe but it has quite a round tail, which makes it look more like ʾ (with a round ball on the top) rather than ʼ. I think this info could be requested and Wikipedia pages w:Romanization_of_Arabic#Comparison_table and w:Hans_Wehr_transliteration could be changed to use ʿ and ʾ. --Anatoli (обсудить/вклад) 00:27, 7 November 2013 (UTC)[reply]
The appearance of an apostrophe depends on the typeface design. See Google image search for a range. The half-ring varies somewhat, too, having clipped, ball, or pointed terminals. Michael Z. 2013-11-07 01:26 z
The Wikipedia page on the w:Hans_Wehr_transliteration uses ʼ for ء and ʻ for ع, which are neither apostrophes nor half-rings. Based on the amazon previews of the Hans Wehr dictionary, I am fairly certain that it uses the half-rings, but they are slanted since the transliterations are italicized: ʾarbaʿa. --WikiTiki89 01:54, 7 November 2013 (UTC)[reply]
Well U+02BC, the “modifier letter apostrophe” ( ʼ ), certainly is an apostrophe. Its difference is the technical coding so that it works like a letter and not like punctuation. For example, on my system, double-clicking a word selects the entire word, leaving out punctuation marks but including “modifier letter” characters:
punctuation: ‘aa ’aa “aa ”aa
modifier letters: ʼaa ʻaa ʽaa ʾaa ʿaa
The marks in Wehr’s transliterations (via Amazon) appear to have the stroke modulation and upper terminals of the apostrophe ( ʼ ) and “reversed comma” ( ʽ ), as opposed to the English quotation mark ( ʻ ). The very definition of a half-ring is that you wouldn’t see this bias referring to manuscript forms. There are entries where you can compare the italic transliterations to roman entry text, but the italics font has different letterforms from the roman anyway.
But that is kind of academic. As a matter of convention, Wiktionary can choose to use either apostrophe or ring typographical forms to represent these characters. Apostrophes are more accessible for text-entry. Readability seems to be mixed – the fact that both suffer on different machines (as we mentioned above) is another argument to increase Wiktionary’s font-size to the standard HTML default. Michael Z. 2013-11-07 17:16 z
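
The letter-versus-punctuation distinction described above can be checked from the Unicode character database. The following is a small Python sketch using only the standard library; the helper name `word_tokens` is made up for illustration, and the regular-expression word match is only an approximation of what double-click selection does on any given system.

```python
import re
import unicodedata

PUNCTUATION = "\u2018\u2019\u201c\u201d"              # ‘ ’ “ ”
MODIFIER_LETTERS = "\u02bc\u02bb\u02bd\u02be\u02bf"   # ʼ ʻ ʽ ʾ ʿ


def describe(ch: str) -> tuple:
    """Return (code point, general category, Unicode name) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.name(ch))


def word_tokens(text: str) -> list:
    """Split text into runs of word characters, as a word-selection stand-in."""
    return re.findall(r"\w+", text)


if __name__ == "__main__":
    for ch in PUNCTUATION + MODIFIER_LETTERS:
        print(*describe(ch))
    # Modifier letters stay inside the word; quotation marks split it.
    print(word_tokens("\u02bearba\u02bfa"))
    print(word_tokens("\u2019arba\u2018a"))
```

The modifier letters all report a letter category, while the quotation marks report punctuation categories, which is why a transliteration written with modifier letters is treated as a single word.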
Could we have a vote on standardizing the transliteration scheme instead of just going ahead and making changes? --Dijan (talk) 15:07, 8 November 2013 (UTC)[reply]
@Dijan Do you have specific concerns? The page has been neglected for a long time and it's not clear what the vote would achieve - voting for specific standard or specific letters/symbols. Transliteration of Arabic is not a simple topic and would require a lot of explaining. Can we resolve all differences here? --Anatoli (обсудить/вклад) 09:09, 9 November 2013 (UTC)[reply]
Well, voting on specific letters would be voting against standardization. I’d welcome any input from Dijan too.
But I would also like to see a vote called, because we shouldn’t be changing transliterations ad hoc. Currently there is no clear statement of the transliteration system’s goals, principles, or even the content of the proposal on the table. Questions about specific symbols remain inconclusive. There is little consideration for compatibility with systems for other Arabic-script languages, so perhaps the whole topic would benefit from the wider community’s perspective. Michael Z. 2013-11-10 15:11 z
My concern is that there is already something of a vote happening on this page, but it's being decided on by a handful of people. This page is for discussion of differences and opinions and I'm open to it. Yes, there has been neglect on part of contributors to the Arabic language and our policies regarding Arabic. As Stephen mentioned, it is precisely the rapid changes of the few that drove away the talented contributors. --Dijan (talk) 17:02, 11 November 2013 (UTC)[reply]
I'm not sure if standardisation always puts off contributors. A sole native speaker Mahmudmasri (talkcontribs) (not very productive, though) prefers standard transliteration and ZxxZxxZ (talkcontribs) (Persian native speaker but knows Arabic) also seems to prefer standards. In any case, if someone uses non-standard transliteration, it's not such a big deal. The entries can be corrected and users can be referred to this page. You still haven't said, which particular change is a concern, Dijan. --Anatoli (обсудить/вклад) 01:52, 14 November 2013 (UTC)[reply]
Which standard or standards are you referring to? Michael Z. 2013-11-14 15:35 z
Hans Wehr, why? In some cases, you can't tell for sure which one, as symbols coincide. In any case, it's obvious that when editors use "ẓuhr" (ظهر), "ʿayn" (عين), "iṯnān" (اثنان), "ġurfa" (or more up-to-date "ḡurfa") (غرفة) rather than "ZHuhr", "3ayn", "ithnaan" and "ghurfa" when transliterating, they are trying to use a standard transliteration. --Anatoli (обсудить/вклад) 22:44, 14 November 2013 (UTC)[reply]
  • Why Hans Wehr isn't the best scheme? Because it does not transliterate final ة when it is silent, however that is the common practice I noticed here in Wiktionary. It would only make conflicts if it were used along with the case ending -a, (which I don't prefer for simplification) because final silent ة comes after -a- (-ah). Hans Wehr also does not capitalize proper names nor does it capitalize words at the beginning of sentences. Therefore I feel that Hans Wehr might be easily misused along with capitalized letters or -a case ending.
  • Why DIN 31635 may not be the preferred scheme for some? Because it uses the symbol ǧ (g-caron) for ج rather than a letter with no diacritics. However, the letter ج is normally and standardly pronounced as [ɡ] in Egypt. DIN 31635 has an annoying feature of always transliterating initial glottal stops, which do not add up for etymology and the real life use of those rules aren't that strict by Arabic speakers.
  • Why ALA-LC may not be the preferred scheme for some? Because it uses digraphs, conflicting with instances of a consonant+h and it used for ى (alif maqsura) which erroneously indicates that the final syllable is stressed. However, it makes use of commonly used digraphs by Arabic speakers when writing their names in English (gh, kh, sh, th).

Conclusion: Hans Wehr or DIN 31635 are the most appropriate standards for having no digraphs. No matter what was chosen, I'm totally against using the non-standardized messy numbers (2, 3) and using capitalized letters for emphatic consonants. --Mahmudmasri (talk) 06:32, 16 November 2013 (UTC)[reply]

Re: "DIN 31635 has an annoying feature of always transliterating initial glottal stops...". It should be noted that Hans Wehr dictionary doesn't hamzate initial alifs, even if alif is not elidable (ألف (ʾalif or ʾalf) vs اسم (ism)). It's more precise and more common to write initial (including after the definite article الـ (al-)) alifs with hamza: أ (ʔ) and إ (ʔ) vs alif without hamza ا when initial alif is elidable or in the middle or final positions. The transliteration uses ʾ when hamza is written and nothing when it's not written, regardless of the pronunciation. There's no full agreement here whether we should use entries with hamza and alternative forms without. My preference is to use hamzas, as it is more educational for foreigners, common and stricter and have alternative forms entries without (this will help with etymologies, as Persian and other languages don't use initial hamza). I don't quite agree about "real life" - we're not talking about Egypt but majority of Arabic speaking countries. This feature of dropping hamza is similar to writing ى ("ʾalif maqṣūra") in the final position instead of ي (y). Mixing them up causes grief and problems for foreign learners. They are conceptually different and have different pronunciation, users shouldn't rely on transliteration alone. Entries and user examples here should use ى when it is pronounced "ā" and ي (y) in other cases. Again, "real life" is mainly applicable to Egypt, not majority of Arab countries. --Anatoli (обсудить/вклад) 02:27, 16 November 2013 (UTC)[reply]
From my limited experience with Arabic speakers online, I have noticed that the ones who write alifs without hamzah, when they should have a hamzah, also tend to write haa instead of taa marbuta. I understand that there are many historical reasons for this to be acceptable, but we are a dictionary and should use as many diacritics as it takes to make clear the morphology of a word. I agree that if the goal of our transliterations was to use them as borrowings in English then initial glottal stops would not need to be written and we would use all the capitalization rules of English, but that is not our goal. As a dictionary, we need to show the morphology of a word so that learners would be able to look up these words and know how to transform them, add affixes to them, or whatever else. Therefore the initial hamzahs are essential, while capitalization is not. --WikiTiki89 03:22, 16 November 2013 (UTC)[reply]
For Anatoli:
  1. I didn't specify the use of (Literary) Arabic in real life for Egyptians alone. Do I have to change my five year old username to something else and remove the word masri to stop people from assuming I am biased towards Egypt!? I won't :)
  2. Regarding the redundant initial glottal stop, I was only referring to transliteration, not the original Arabic spelling. I remember reading about Hans Wehr's transliteration that it drops the redundant initial glottal stops from transliteration, as ALA-LC does. The initial glottal stop in transliteration is not only redundant, but also distracting, since it is natural for anyone with limited knowledge of basic Arabic phonology that glottal stops come naturally as a syllable break between vowels unless one of them is elided which is reflected in transliteration: example Eid al-Adha, is pronounced in Classical Arabic ʿīd-u l-aḍḥā.   (ʿīd-u al-aḍḥā) →‎ ʿīd-u l-aḍḥā, which is also transliterated as ʿīd ul-aḍḥā. There's no hiatus in Arabic phonology, that is the same case for all modern spoken Arabic dialects and probably all other Afro-Asiatic languages, with the exception of Israeli Hebrew. The other reason for the redundant initial glottal stop distraction is that it is slower, for learners or people who want to quickly have an idea about how the words are loosely pronounced in Arabic, to distinguish and remember to pronounce an odd pronunciation other than that for ع at the beginning of the word. ʿ and ʾ appear very similar.
--Mahmudmasri (talk) 04:14, 16 November 2013 (UTC)[reply]
Mahmud, I don't have any prejudice against Egyptians, quite on the contrary :) but the Egyptian relaxed style of spelling is rather well-known, as it seems, specifically dropping initial hamza, replacing "ي" with "ى" and "ة" with "ه". I didn't even think about you being Egyptian when replied. I've learned it first from the Arabic language forum on Word Reference site, where one of the moderators is an Egyptian woman. The topic of spelling was often discussed. This small controversy is also described at Wikipedia. Most learners agree that the more strict spelling is better, from which you can easily make a more relaxed spelling. Well, hamza is a letter or a graphical symbol, even if it can be stand-alone or sit under or over other letters. And it has a transliteration symbol, e.g. ʾ. If a word is spelled without it, then hamza transliteration symbol can be omitted, otherwise it should be transliterated, IMO. So, عيد الأضحى should be transliterated as "ʿīd al-ʾaḍḥā" (with ʾ) but hamza-less "عيد الاضحى" as "ʿīd al-aḍḥā" (if we consistently ignore classical case endings), even if they're pronounced identically. --Anatoli (обсудить/вклад) 10:08, 16 November 2013 (UTC)[reply]
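
The rule proposed above (transliterate ʾ only when a hamza is actually written) depends on nothing more than the spelling, so it can be checked mechanically. Below is a minimal Python sketch; the set `HAMZA_FORMS` and the function `has_written_hamza` are hypothetical names, and the choice of which code points count as a written hamza (alif madda is left out, for instance) is an assumption of this sketch, not something settled in this discussion.

```python
# Code points treated here as a "written hamza" (an assumption of this sketch).
HAMZA_FORMS = {
    "\u0621",  # ء  standalone hamza
    "\u0623",  # أ  alif with hamza above
    "\u0624",  # ؤ  waw with hamza above
    "\u0625",  # إ  alif with hamza below
    "\u0626",  # ئ  ya with hamza above
}


def has_written_hamza(text: str) -> bool:
    """True if any hamza form from HAMZA_FORMS appears in the spelling."""
    return any(ch in HAMZA_FORMS for ch in text)


if __name__ == "__main__":
    for spelling in ("عيد الأضحى", "عيد الاضحى"):
        # Under the proposed rule, only a True result would license ʾ.
        print(spelling, has_written_hamza(spelling))
```

Note that this only detects the presence of a hamza in the spelling; where the ʾ goes in the transliteration still has to come from the vocalised word.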
The Egyptian use of hamza in Literary Arabic isn't different from any other people's use. However, when writing informally and in dialects, people don't always care to add hamza above or below alef. Arabic letters didn't have a final ي with two dots underneath. That was a west Levantine creation, more than a century ago. I can even show you modern printed Quran from Egypt having no dotted final ى. So, we don't drop the dots under ى in Egypt or Sudan (also sometimes in Yemen and Algeria), we just still write it as it has always been written for much more centuries. The words that are written with ى but pronounced with an open vowel are very few, the case is to pronounce it finally as /i/ with a few exceptions. The colloquial pronunciation for عيد الاضحى can be /-ladˤħa/, not /-eladˤħa/, and as an example, Tunisian Arabic doesn't even have a glottal stop, so /-alʔadˤħa/ pronunciation is only confined to Literary Arabic for them. See also w:Lakhdar Brahimi, it's not el-Akhdar or al-Akhdar. --Mahmudmasri (talk) 17:07, 16 November 2013 (UTC)[reply]
Those may be the dialectal pronunciations, but as a written dictionary, we should focus on the literary language and include pronunciation notes in the pronunciation section. --WikiTiki89 17:27, 16 November 2013 (UTC)[reply]
Mahmud. I won't argue, the spelling conventions are a bit irrelevant right now, what is important is the transliteration. As I said, we should use the strict spelling and possibly provide alternative spelling, like أنا and انا. I know these features are not exclusive to Egyptian and are very old but currently, many translations with final ى for ي (y) are tagged "Egyptian Arabic", which is not exclusive to Egyptian, of course. --Anatoli (обсудить/вклад) 21:35, 16 November 2013 (UTC)[reply]
Romanization should reflect the written language (not necessarily only “literary,” as we also document regional, informal, and other vernacular forms). If a word has different written forms, then we have form-of entries for them, with romanizations that reflect the differences. That romanization reflects pronunciation is secondary, because pronunciation varies, which is documented in our pronunciation section. Michael Z. 2013-11-16 18:14 z
It would be totally wrong to transliterate words written with the final ى as "-ā" when ي (y) is meant (pronounced as ī or y) or ه (h) (hāʾ) as "a" (or nothing) when ة (tāʾ marbūṭa) is meant in a more relaxed orthography. Lack of dots doesn't make them conceptually different, so غرفة can be spelled غرفه but it's still "ḡurfa" or عربي can be spelled عربى but it's still "ʿarabī" (like in Russian ё (jo), when it's written as е (je), it's still "ё" conceptually. The word ёлка (jólka) is commonly spelled "елка (jelka)" but it's still transliterated as "jólka", not "jélka"). Dictionaries, including Hans Wehr transliterate reflecting pronunciation, not just for cases above but for foreign and dialectal words. Anatoli (обсудить/вклад) 21:35, 16 November 2013 (UTC)[reply]
I see your point. Not a good example though. In transliterated Russian we typically see е → e/je/ye/ie and ё → ë/jë/yë/ië, not conceptually, but as a strict grapheme conversion. Michael Z. 2013-11-17 01:00 z
The example is absolutely up to the point. It may be that Roman letter "e" with any diacritics is still the same letter but in Russian, letter "ё" is still conceptually letter "ё", even if "two dots" are not written. It's a letter of the alphabet and when asked about a spelling, it's called "ё". The same way, Arabic "ي" is one letter, "ى" is another but when the latter is used instead of the former, it's still a "yāʾ", not "ʾalif maqṣūra". --Anatoli (обсудить/вклад) 03:42, 17 November 2013 (UTC)[reply]
Well it depends how you look at it. An alif maqsura is just a yaa that is pronounced like an alif. Just like Russian "ё" is just a "е" pronounced as /o/. In both cases the diacritics were added later as an afterthought. I do agree, though, that today they are seen as distinct and it makes sense for us to treat them that way. --WikiTiki89 04:53, 17 November 2013 (UTC)[reply]
Anatoli, your example is incorrect. The letter ё is generally not transliterated jo for Russian. A letter’s conceptual soul is irrelevant if the reader cannot determine what it is. This is so for English-language librarians, who romanize Russian ё as Latin ë, and apparently also for the Russian passport office, which is using e/ye. Michael Z. 2013-11-17 17:04 z
We are a dictionary not a library or passport office. Also, I don't see how Latin "ë" is different from "jo", the distinction is made in both cases. And this is all irrelevant anyway, because I don't see anyone suggesting that we transliterate alif maqsura as "ī" or "y". --WikiTiki89 20:17, 17 November 2013 (UTC)[reply]
If it’s irrelevant, then why did you bring it up? Latin ë is different from jo because the word is transliterated elka or ëlka based on how it is written in a text, not on how it is pronounced. When the “yo” is written as е, then it is transliterated e, not ë, and certainly not jo. Russian transliteration is performed on a graphical е or ё, not a conceptual ё.
If you see some other precedent in dictionaries, then I am interested in learning about it, but generally monolingual or bilingual dictionaries do not transliterate.
Without agreeing on any principles as a basis for transliteration, this is all just chit-chat about everyone’s favourite Latin character for a particular foreign character. Michael Z. 2013-11-17 22:31 z
I think I see what you are saying now. But it is still irrelevant because our entries always use ё when appropriate. Similarly, our main entries will always use ي when appropriate. The only time this is relevant is in alternative forms, which should transliterate based on what is implied. Keep in mind also that when Arabic text is fully vocalized, as is ours, then final ي and ى are differentiable based on the vowels, even when both are written as ى. So the transliterations are still following rules that do not require knowledge of the word (as long as the vowels are known). --WikiTiki89 00:44, 18 November 2013 (UTC)[reply]
The only time this is relevant is in alternative forms, which should transliterate based on what is implied – that is wrong. The purpose of transliteration is to represent the form. In Russian, for example, transliteration represents Cyrillic spelling via Latin letters. For someone who doesn’t read Cyrillic, the transliteration is the only way to understand the difference between an alternative form and the lemma form. our entries always use ё when appropriate – Except for the ones that don’t, like alternative-form entries. And except for our citations of real usage, which quote original texts. And except for any notes or other text that refer to actual usage. Michael Z. 2013-12-04 20:52 z
There is no benefit to a transliteration system that fully distinguishes every nuance of the original spelling. If someone is willing to learn such a system, they are better off learning the original alphabet. The purpose of transliteration here is to aid readers that don't care enough about the language to learn the alphabet, and occasionally to disambiguate what is not written in the original script. We have combined these two purposes together, because in almost every case they are not mutually exclusive. --WikiTiki89 21:04, 4 December 2013 (UTC)[reply]
We are getting off topic. We really need to have a wider discussion about the principles behind our transliteration systems in an appropriate place, because editors share mutually exclusive views.
There is no benefit to a transliteration system that fully distinguishes every nuance of the original spelling. – there is a benefit in representing the 33 letters of the Russian alphabet in such a way that two different spellings can be recognized as such. I would say this requires less nuance. There is a benefit in transliterating three related East Slavic languages in a compatible way, for readers that don’t know them, or that don’t know Cyrillic at all. There is a benefit in transliterating in a way that is used by hundreds of other publications.
to disambiguate what is not written in the original script – do you mean to represent pronunciation? We do have both pronunciation and transliteration. Michael Z. 2013-12-04 21:54 z


If we transliterate Arabic letter to letter, what's written, then we shouldn't have any transliteration at all and refer users to the alphabet page - Appendix:Arabic alphabet (it's missing diacritics and additional symbols).
  • Letters و and ي can be either consonants w/y or vowels ū/ī. In loanwords they can also represent u, o/i, e.
  • ى when used in place of ي is still conceptually ي and should be transliterated accordingly and that's what dictionaries do. (yes, it's the same with Russian ё/е)
  • غ can be used to transliterate /g/ into Arabic and can be either ḡ or g.
  • Dialects and loanwords don't follow the standard way of reading letters.
I won't list all situations, most of them have been addressed already. In any case, phonetic romanisation is in no conflict to standard dictionaries or textbooks. Hans Wehr romanises words phonetically, reflecting pronunciation, not spelling. The participants in Module_talk:ko-translit/testcases all agreed that we need transcription type of transliteration, not letter-to-letter, which is useless, like transliterating English words into Cyrillic or Arabic one to one.
Importantly, Arabic may not be always written with diacritics (as is usually the case in the real world) in user examples, see also's, synonyms, translations and even entries. So, if we omit unwritten vowels in transliterations, we get absolute rubbish!--Anatoli (обсудить/вклад) 22:22, 4 December 2013 (UTC)[reply]

The result of the discussion

There seems to be an agreement to use Hans Wehr transliteration for Literary Arabic when possible. So, let's do it. --Mahmudmasri (talk) 21:20, 17 November 2013 (UTC)[reply]

Yes, Mahmud, let's do it. Are you OK with having dotted final yāʾ, tāʾ marbūṭa and initial hamza terms as the main entries and dotless/hamza-less terms as alternatives? E.g. عربي - main entry, عربى - alternative form (currently Egyptian only)?
There remain some details about loanwords and dialects. (since the use of hamza in loanwords is not standardised, we can transliterate it if it's written and omit if it's not, very easy - the hamzated/hamza-less could have alternative forms, there is absolutely no problem with this - أفريقيا, افريقيا and إفريقيا can all have entries, one being the main entry, the others alternative)
Vowels (long and short in loanwords), dialects. The common practice, Hans Wehr transliterates loanwords the way they are pronounced by Arabs, not the way they are spelled. Long vowel letters often (but not always) pronounced short. Use of o, e, ō, ē, which are absent in Classical Arabic.
Consonants: as above, we need to mention exceptions for loanwords and dialects, which will not match the table. We also need to include Maghrebi special letters ڢ (f) and ٯ, which are not covered. We need to explain why some consonants are transliterated differently in loanwords or dialects, such as "ج" or "غ" as /g/. Include all dialectal and rarely used letters, such as Persian پ (p) or ڤ (v), etc.
Re: tanwīn. We usually don't transliterate classical case endings, you seem to have no issue with this but we can make exceptions for special templates, which show inflections and transliterate them: -un, -in, -an, etc.
For verbs we use full endings, i.e. كتب is "kataba", not "katab". There are existing templates and a module is being developed (very slowly). (You could join and help there!) --Anatoli (обсудить/вклад) 00:01, 18 November 2013 (UTC)[reply]
The maghrebi letters ڢ and ٯ are just their traditional way of writing ف and ق. They are the same letters, just dots are used in different ways to differentiate them. They should be transliterated identically. --WikiTiki89 00:44, 18 November 2013 (UTC)[reply]
Yes, I know, we just need to include them for consistency.
Another rule to include: we ignore assimilation of "الـ‎" before "sun letters", so الشمس is "al-šams", not "aš-šams". --Anatoli (обсудить/вклад) 02:16, 18 November 2013 (UTC)[reply]
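
To make the sun-letter rule concrete, here is a small Python sketch that builds both spellings from a transliterated stem. The function `with_article` and its `assimilate` flag are hypothetical names for illustration only; the list of sun letters is the traditional set written in the transliteration proposed on this page, and the default (no assimilation) is the behaviour described above.

```python
import unicodedata

# Traditional sun letters, written in the transliteration used on this page.
SUN_LETTERS = {
    unicodedata.normalize("NFC", letter)
    for letter in ["t", "ṯ", "d", "ḏ", "r", "z", "s", "š", "ṣ", "ḍ", "ṭ", "ẓ", "l", "n"]
}


def with_article(stem: str, assimilate: bool = False) -> str:
    """Prefix a transliterated stem with the definite article.

    With assimilate=False (the rule proposed here) the article is always
    "al-". With assimilate=True the l is replaced by the first consonant
    of the stem when that consonant is a sun letter.
    """
    stem = unicodedata.normalize("NFC", stem)  # keep letter + diacritic together
    first = stem[:1]
    if assimilate and first in SUN_LETTERS:
        return f"a{first}-{stem}"
    return f"al-{stem}"


if __name__ == "__main__":
    print(with_article("šams"))                   # proposed convention
    print(with_article("šams", assimilate=True))  # assimilated form, for comparison
    print(with_article("qamar", assimilate=True))
```

Normalising to NFC first keeps a base letter and its diacritic together as one character, so the first-letter check does not mistake "š" for a bare "s".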
Is it possible to transliterate Persian, Urdu and other Arabic-script languages in a compatible way? Michael Z. 2013-11-17 22:42 z
There is no point in making Persian similar to Arabic. Letters س‏, ث‏ and ص are all pronounced /s/ and letters ز‏, ذ‏ and ظ are all pronounced /z/, ع is usually ignored (like in European languages, Persian lacks the Arabic sound produced by ع but preserves Arabic spelling in loanwords). Vowel length is different, where Arabic uses ū and ī, Persian uses u and i.
Urdu is usually romanised similar to Hindi using symbols, like ś (instead of š) and special symbols for language specific letters, not based on the script but based on its relationship with the sister language Hindi. It's best to ask this question on appropriate pages, asking users who work with these languages, such as Dijan, ZxxZxxZ and others. These languages use different from Arabic standards (or non-standard but accepted Wiktionary way, which may include a number of standards). --Anatoli (обсудить/вклад) 00:01, 18 November 2013 (UTC)[reply]
  • Yes, the main entry as ي and ى as an alternative.
  • Yes, the main entry as ة and ه as an alternative.
  • Yes, for the alef, however, Africa is also alternatively pronounced ifriqya (the second vowel is either short or long). I even remember that the word Africa has its etymology for some tribe called ifri so maybe that's why it is alternatively with the إ.
  • The custom for transliterating vowels in loanwords, mentioned by Anatoli, is absolutely OK. In fact, I usually find mistakes in Wiktionary and correct them in that manner.
  • The traditional Maghrebi letters are rarely used nowadays, so no need to mention them, unless you are writing a page specifically about the letters. ٯ /q/ doesn't have a dot when written finally, but has the confusing dot above ڧ when written initially or medially. The ف /f/ was traditionally written with a dot underneath ڢ.
  • The ڤ /v/ is problematic, because it conflicts with ڨ /ɡ/ used in Tunisia and Algeria. They use ڥ instead for /v/. But, we would still have to provide both alternative forms if the word in question is known to northwest Africans. ڤ is the main entry and ڥ is the alternative form.
  • Anatoli's suggestion about the use regarding ج and غ or others for /ɡ/ is fine, in fact that was the custom I noticed in Wiktionary. Side note: Arabic transliterations which were initially intended to transliterate /ɡ/ in Arabic alphabet are very often distorted to have the pronunciation [ɣ~ʁ].
  • If the case ending was intended to be written, it's better to transliterate it with a dash to avoid confusion, example: katab-a, not kataba. This style is accepted in ALA-LC and DIN 31635. I'm not sure about Hans Wehr.
  • There is a popular transliteration I noticed for Persian (UniPers) which uses circumflexes instead of macrons, but some long vowels are transliterated with normal Latin letters without any diacritic. Unfortunately, it may be difficult to standardize transliterations for Arabic-based scripts, although we can make them as close as possible, but we would deviate from the known standards to transliterate each language.

--Mahmudmasri (talk) 02:21, 18 November 2013 (UTC)[reply]

Hans Wehr transliterates nouns in their pausal forms and verbs with full ending, with no hyphens (kataba, not katab-a).
We can handle Maghrebi letters later, they're minor issues.
Loanwords with ج and غ could be transliterated with alternative forms, e.g. "injilīzi, ingilīzi" or g, ḡ with various pronunciation, reflecting intended, real or regional accents (not sure if all terms with ج should have Egyptian "g", probably not, since this is expected, another issue is ق pronounced ʾ in dialects).
No issue with إفريقيا, see hamza topic above, we can have alternative forms and variant transliterations. --Anatoli (обсудить/вклад) 02:34, 18 November 2013 (UTC)[reply]
Speakers of dialects that normally pronounce ق as /ʔ/ still pronounce it as /q/ when speaking MSA, so ق is not an issue. Also, you forgot to mention the use of ك (or even گ) for /g/. --WikiTiki89 02:46, 18 November 2013 (UTC)[reply]
Yes, I haven't used all examples, just the main ones, the same for Persian/Urdu letters used in Arabic occasionally and dialectal letters, e.g. to render "č" in some dialects.
Please see أفريقيا for treatment of hamza, long/short vowels in loanwords (this discussion is split, there is a dedicated hamza topic above). --Anatoli (обсудить/вклад) 02:54, 18 November 2013 (UTC)[reply]
  • إنجليزي (A-N-Ǧ-L-Y-Z-Y) / إنكليزي (A-N-K-L-Y-Z-Y) should only be transliterated as ing(i)līzi since it is only pronounced with /ɡ/, not /(d)ʒ/ or /k/.
  • افريقيا (regardless of the first vowel), has its /q/ pronounced the same in dialects or approximated to [k].
  • I really didn't like the transliteration of the initial glottal stop. It's impracticably hideous. Look how many transliterations we have for أفريقيا!
  • In Egypt, it is acceptable to use ج (with 1 dot) for /(d)ʒ/. There are some loanwords with that consonant and are either written with چ (with 3 dots) or ج (with 1 dot).
  • Distorted loanwords that have acquired a new pronunciation should be transliterated to reflect that pronunciation, as for Ghana غانا ḡāna [ˈɣæːnæ] (Egyptian pronunciation) [ˈʁɑːnɐ] (Persian Gulf pronunciation), but it is pronounced with its original /ɡ/ by more educated people in Lebanon and northwest Africa.
  • The ending vowels
    1. Should be transliterated as long (ā, ī, ū) for words of Arabic origin, or non-Arabic words with acquired Arabic case endings.
    2. We shouldn't transliterate final vowels as long even if they are spelled with ا,‎ و or ي in plain loanwords. That rule should also apply when transliterating Arabic dialects. Therefore, أفريقيا should take a final plain-a without a macron.

--Mahmudmasri (talk) 03:49, 18 November 2013 (UTC)[reply]

  • إنجليزي - OK but some people wanted to mark /(d)ʒ/, perhaps in some areas it's pronounced so or by less literate people? It may be less obvious with less common words that "ج" should be "g" if the original foreign word has /g/.
  • افريقيا - no problem with that, we are aware that ق doesn't always change the pronunciation, notably القاهرة (al-qāhira, Cairo) (providing an example for the sake of any other participant), in any case, MSA is the target, its pronunciation takes priority over dialects.
  • غانا OK but as far as I know, غ is also used to render /g/, which should be transliterated accordingly or have comma-separated variants. Not sure if we need to discuss all cases, suffice to say that we agree to transliterate exceptions accordingly, even if individual letters don't match our table - the special notes are already there, we can expand them with short details.
  • Long vowels probably OK with me but I'll have to think about it. Final unstressed "ā" consistently loses length.
  • Why do you dislike the transliteration of the initial glottal stop? Is it because ʾ is missing on the keyboard? You don't have to transliterate alternative, derived forms, synonyms, etc. I only provided examples of how to transliterate the actual entries. Let me think about hamza problem. --Anatoli (обсудить/вклад) 04:08, 18 November 2013 (UTC)[reply]
  • إنجليزي is always pronounced with /ɡ/ even by less educated people.
  • The case for final vowels in unstressed syllables is the same for all other vowels, not just ā.
  • غ can be used to render /ɡ/ mainly in the Levant and if it is (still) pronounced /ɡ/, it should be transliterated with the normal g.
  • The problem isn't with keyboards! Initial glottal stops are redundant and don't add any etymological information, since transliterations' primary role is to ease etymological inspection and when the words are borrowed in other languages, the glottal stop rules are totally ignored. If someone wanted to know whether the Arabic word is spelled with an initial hamza, then he should check the Arabic spelling.
  • Literary Arabic rules are simple:
  1. initial a- u- are always written with a hamza above.
  2. Initial i- has two rules:
  • is written without any hamza as انتخاب intiḵāb "an election"
  • or with an under hamza in other words as إنتاج intāj "a production".
In loan words, you are free to choose one of the two initial i- rules. Some teachers advise not to use an initial under hamza for loanwords with initial i-, claiming that اللغات الأعجمية al-luḡāt al-aʿjamiyya don't have glottal stops. This is, however, a shallow, badly informed statement. Remember, إفريقيا is a loanword, but it is spelled with an under hamza for the i- pronunciation.

--Mahmudmasri (talk) 04:49, 18 November 2013 (UTC)[reply]

What about الـ (al-)? More importantly, the word الله (allāh), which has a hamzat al-waṣl? Are these the only two words, which are always written without a hamza over alif and start with an "a"? --Anatoli (обсудить/вклад) 05:03, 18 November 2013 (UTC)[reply]

If there are no objections from others, I'm OK to drop the requirements for initial hamza transliteration, this transliteration will match Hans Wehr but Hans Wehr doesn't use initial hamza in Arabic words but we do. --Anatoli (обсудить/вклад) 05:11, 18 November 2013 (UTC)[reply]

Yes, exactly, the only few exceptions are the definite article and allah, since both are reduced when pronounced after words ending with vowels, whether it is because of case endings or just nouns ending with vowels. But, you should notice that words with hamzat waṣl should be pronounced properly in Literary Arabic without a glottal stop, if preceded by a word ending with a vowel. Example: /al.in.ti.ˈxaːb/ vs /al.ʔin.ˈtaːɡ/. But, as I told you, in real-life use by all Arabic speakers, people confuse these things or pronounce a glottal stop in all these cases. --Mahmudmasri (talk) 05:15, 18 November 2013 (UTC)[reply]
No worries. Ha-ha, you used the Egyptian pronunciation: /al.ʔin.ˈtaːɡ/ (not that it matters much) ;) --Anatoli (обсудить/вклад) 05:24, 18 November 2013 (UTC)[reply]
Hamzat wasl can occur with any vowel. As far as I know, /i/ is the most common, /a/ is only in the definite article (the first part of الله is the definite article), and /u/ occurs only in the imperative of Form I verbs that have /u/ as the characteristic vowel of the imperfect (such as اُكْتُبْ). --WikiTiki89 13:59, 18 November 2013 (UTC)[reply]
You are right, I missed that imperative, as I was only thinking of the common usage in Literary Arabic. However, the word allāh is a typical example of an exception. It isn't treated as a definite article added to lāh, but as one proper noun whose initial vowel can be elided as I explained earlier. --Mahmudmasri (talk) 19:21, 18 November 2013 (UTC)[reply]
الله (allāh) is not an exception. It is ال (al-) + إله, with a semi-regular elision of the glottal stop (الإله was later re-introduced). The same thing happened with ال (al-) + إناس (ʔinās), which produced الناس (an-nās) (from which ناس (nās) is a back-formation). It also happened in multiple other cases but those are the only two I remember. --WikiTiki89 19:41, 18 November 2013 (UTC)[reply]
Now you are speaking about etymology, not how the words are used. الإله is used to mean "the deity (any generic god)", but الله means "God" (the god in Islam and Abrahamic religions). --Mahmudmasri (talk) 20:50, 18 November 2013 (UTC)[reply]
Either way, it acts like the definite article in every way in terms of elision and the alif is dropped when prefixed with لِـ (li-). --WikiTiki89 22:28, 18 November 2013 (UTC)[reply]
Thanks for the اُكْتُبْ example, Wikitiki89. That reminds me we'll need to make sure templates and modules don't add hamza on those forms.
I need to clarify re long unstressed vowels. So, does it mean that we need change final ī to i, ā to a, ū to u? It seems OK with ā and ū but nisba ending "-ī" (formerly - "-iyy") seems very unusual if we change it to "-i", e.g. عربي becomes "ʿarabi". --Anatoli (обсудить/вклад) 01:20, 19 November 2013 (UTC)[reply]
I think Mahmud was only referring to final long vowels in loanwords. --WikiTiki89 01:49, 19 November 2013 (UTC)[reply]
  • I have an impression that the common practice in Wiktionary gives a different impression about how Literary Arabic is used. The ending ʿarabiyy isn't false, but it's very unusual to hear someone pronounce it that way. The example عربي, with the template, {{ar-nisba}}, makes strong emphasis on the nunation, when in fact that over-use of nunation is Classical/poetic. Therefore, the line demonstrating the adjectives should be something like that:
  • عَرَبِي • (ʿarabī) m, dual masculine عربيان (ʿarabiyyān), feminine عَرَبِيَّة (ʿarabiyya)‎, dual feminine عربيتان (ʿarabiyyatān), oblique masculine plural عربيين (ʿarabiyyīn), masculine plural عَرَبِيُّون (ʿarabiyyūn)‎, feminine plural عَرَبِيَّات (ʿarabiyyāt)‎
  • Answering how to transliterate the ending ي in words like (ʿarabī), I suggested earlier in that discussion, to transliterate it with ī-macron, for the Arabic case ending, but if that final ي was a part of a loanword or a non-Arabic name, then we should transliterate it with normal-i. But, looks like that would introduce complications, so let's only transliterate all final vowels in unstressed syllables with un-diacriticized letters. That can also simplify the automation in {{ar-nisba}} to just add endings based on ʿarabi/-yyān/-yya/-yyatān/-yyīn/-yyūn/-yyāt.
  • By the way, the final /j/, as in the word رأي /raʔj/ (one syllable) should be transliterated as raʾy not raʾī, because the second gives an impression of another pronunciation /ˈra.ʔi(ː)/ (two syllables). However, the second pronunciation appears in colloquial pronunciation in the Levant.

--Mahmudmasri (talk) 05:22, 19 November 2013 (UTC)[reply]
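As an illustration of the simplification suggested above, here is a minimal sketch of how the nisba forms could be derived from a base ending in plain -i. The function name `nisba_forms` and the label strings are hypothetical and are not taken from {{ar-nisba}}; this only shows the string arithmetic behind ʿarabi/-yyān/-yya/-yyatān/-yyīn/-yyūn/-yyāt.

```python
# Hypothetical helper, not the actual {{ar-nisba}} implementation.
# Endings follow the list ʿarabi/-yyān/-yya/-yyatān/-yyīn/-yyūn/-yyāt.
NISBA_ENDINGS = {
    "masculine": "",
    "dual masculine": "yyān",
    "feminine": "yya",
    "dual feminine": "yyatān",
    "oblique masculine plural": "yyīn",
    "masculine plural": "yyūn",
    "feminine plural": "yyāt",
}


def nisba_forms(base: str) -> dict[str, str]:
    """Return the transliterated forms for a base that ends in plain -i."""
    if not base.endswith("i"):
        raise ValueError("expected a transliterated base ending in -i")
    # Each form is simply the base plus its ending.
    return {label: base + ending for label, ending in NISBA_ENDINGS.items()}


forms = nisba_forms("ʿarabi")
for label, form in forms.items():
    print(f"{label}: {form}")
```

Under the competing proposal (transliterating what is written, ʿarabiyy), the same table would work with a base ending in -iyy and endings without the leading yy, so the choice of scheme does not really depend on ease of automation.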

I disagree. Our transliteration shouldn't worry about how templates will be written. With Lua, we can now create more complicated templates such as {{ar-prep-auto}}. As for ʿarabiyy vs. ʿarabī, our transliterations should not accommodate too much for modern pronunciation. We should try to transliterate what is written, and for nisba, that is ʿarabiyy(un). Our pronunciation sections, or even this page itself (Appendix:About Arabic) can cover the fact that it is pronounced simply as /ʕarabi(ː)/ by most speakers. --WikiTiki89 05:33, 19 November 2013 (UTC)[reply]
  • Simplifying for the automation would be a coincidence, not the primary reason, because I know that templates can do a lot more complicated conversions. I don't want my reasoning be distorted. No accommodation here!
  • Kasra plus ي don't give iyy, but give ī. I noticed a confusion which assumed that this case should be iyy.
  • Kasra plus ي plus šadda give iyy.
  • Here comes the question again: Should we demonstrate some fantastic language which would only exist in Wiktionary in that case, or should we be realistic in our transliterations? Apparently, popular standardized transliterations go for the ending ī or i.
  • Is it really a pure transliteration or halfway between a transliteration and a transcription? Can there be a scheme to reflect both the original Arabic spelling and the loose pronunciation? All the popular standardized Arabic transliterations fail to reflect the original spelling in many cases and sometimes fail to reflect the loose pronunciation, so you shouldn't worry too much about the need to reflect the original Arabic spelling, since it is practically impossible. The final ى (alef maqsura) is transliterated as medial or final alef: ā. ALA-LC transliterates it with an acute accent, á, mistakenly suggesting that its syllable is stressed. The Spanish Arabists School does a better job by transliterating it as à. Silent ة is either not transliterated in some schemes or transliterated in others as final ه by h, but when pronounced as /t/, it is transliterated as t, conflicting with ت. ISO 233 transliterates it as ẗ all the time, and ISO 233-2 transliterates it all the time as ŧ, not distinguishing its silent from its /t/ pronunciation. However, almost all schemes, including the aforementioned, have many exceptions, like when Arabic letters are used to render different consonants from what they normally do (ج ك غ ق), additionally when normal Arabic letters may or may not be diacriticized to render other consonants from their normal values (چ گ ڠ ڨ ڤ ڥ) vs (ج ك غ ق ف).

--Mahmudmasri (talk) 22:25, 20 November 2013 (UTC)[reply]

I did not mean to distort your reasoning, that is just the way I understood it. I don't know who was confusing kasra + yaa with kasra + yaa + shadda. I would support showing ة with the optional case ending, for example كَسْرَةٌ would be kasra(tun), but I think that something like kasraẗ looks really bad. I agree that loanwords will always be exceptions no matter what scheme we come up with. I would also support transliterating alif maqsura as something else, such as "à". --WikiTiki89 00:22, 21 November 2013 (UTC)[reply]
It's impossible to re-transliterate perfectly back into English, so I would stick to Hans Wehr, i.e. ة as "a" (i.e. nothing, because it's preceded by fatḥa) or e, i in Levantine dialects. (For declension templates, we could use full endings) and ى (alif maqṣūra) as ā. --Anatoli (обсудить/вклад) 22:34, 4 December 2013 (UTC)[reply]

Prefixes and Suffixes[edit]

I propose we use the ـ character to indicate prefixes and suffixes. For example, move the entry ال (al-, the) to الـ (al--, the) and ني (me) to ـني (me).

Does anyone disagree? --WikiTiki89 01:45, 4 November 2013 (UTC)[reply]

I disagree. "ـ" (tatweel) could be used in the display, like this: الـ (al--, the), similar to the way ḥarikāt (vowel points) are added to the headword but not part of the entry name. --Anatoli (обсудить/вклад) 02:03, 4 November 2013 (UTC)[reply]
Could you maybe give a reason for disagreeing? We use some sort of horizontal line symbol (-, ־) in the entry name for every other language. --WikiTiki89 02:11, 4 November 2013 (UTC)[reply]
Tatweel or kashida is only used as the elongation symbol, giving only a different visual effect, for readability, it has no grammatical, lexical, punctuation or other value. There's no tradition in any Arabic dictionary to use ـ, unlike "-" (hyphens) in prefixes or suffixes in European languages, ـ is never used so. Even in the header, ـ in الـ (al--) is used to show it can be attached to nouns and romanised as "al-". --Anatoli (обсудить/вклад) 02:19, 4 November 2013 (UTC)[reply]
I suppose I am just too used to the other languages indicating prefixes and suffixes. To me ال (al-) looks really weird, since the prefix is never actually found with the final form of the ل. --WikiTiki89 03:53, 4 November 2013 (UTC)[reply]
Arabic dictionaries, notably w:Hans Wehr, don't use tatweel in the article headers.
I've sent you an email. You've got my support in this work. Don't worry about Mzajac, he's not going to work with Arabic, anyway but will surely stir enough trouble with negative comments to discourage any participation. I'll probably refrain from further comments here. --Anatoli (обсудить/вклад) 22:21, 5 November 2013 (UTC)[reply]
Ok, I will withdraw the tatweel for affixes proposal. And I never worry about negative comments. One thing you have to remember is that the transliteration system is not nearly as much for editors of Arabic as it is for readers. The readers' opinions should always be more important than the editors' and just because Michael Rabbit doesn't edit Arabic, doesn't mean he doesn't read our entries. --WikiTiki89 23:23, 5 November 2013 (UTC)[reply]
That's the idea, to make it useful for learners. The chat version of transliteration was added when an Arabic learner (Beru7) was active. Since he left, I have been correcting it to the standard. A few letters come from different standards and there always was a conflict regarding certain letters, like "j" is preferred by most over "ǧ" and "x" over "ḵ" or "ḫ" (Mahmudmasri must have changed it back to "ḫ"). Otherwise the current standard (or practice) is very close to the system of Hans Wehr, the best-known Arabic dictionary. Actually, most users have no choice but to learn that system, there's no other dictionary as comprehensive as this for English speakers.--Anatoli (обсудить/вклад) 23:37, 5 November 2013 (UTC)[reply]
Actually, the 4th edition of Hans Wehr has changed "ǧ" to "j", "ḫ" to "ḵ" and "ġ" to "ḡ". I welcome "ǧ" to "j" change and the other two can be discussed. Note that dialect transliteration and foreign words differ from expected. ج (j) is pronounced as "g" not only in Egyptian but in many loanwords, which came into MSA via Egyptian. --Anatoli (обсудить/вклад) 23:45, 5 November 2013 (UTC)[reply]
I like the first two changes ("ǧ" to "j", "ḫ" to "ḵ"). --Z 14:46, 10 November 2013 (UTC)[reply]

To Anatoli: I know that you may prefer x, since it is not a diacriticized letter and also looks like a letter with the same or similar pronunciation in each of Cyrillic, Greek and IPA. However, I was sticking to published standard transliterations, not made-up ones, because if we each did what we see fit, why would your changes be better than what I myself see as more fitting for the real pronunciation by Arabic speakers?

To WikiTiki89: No, I definitely disagree with using a substandard elongation to separate prefixes or suffixes. Hebrew spelling is similar, but they are never separated, however, we can separate them by dashes in transliteration. You also have to notice that when using elongation, it makes it difficult to search for words through Wiktionary, since we must write the words with the exact same elongation chosen by the editor, or else the search engine would consider them totally different words. If you wrote in the search الكتاب, you will never find in the results for the word if it existed as الـكتاب. --Mahmudmasri (talk) 23:03, 15 November 2013 (UTC)[reply]
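The search concern above can be made concrete with a short sketch. It assumes nothing about how the Wiktionary search engine actually works; it only shows that the two spellings are different strings unless the tatweel (U+0640) is removed first. The helper names `strip_tatweel` and `same_word` are hypothetical.

```python
TATWEEL = "\u0640"  # ARABIC TATWEEL (kashida), the elongation character


def strip_tatweel(text: str) -> str:
    """Remove every tatweel; it only changes the visual length of a word."""
    return text.replace(TATWEEL, "")


def same_word(a: str, b: str) -> bool:
    """Compare two spellings while ignoring any elongation."""
    return strip_tatweel(a) == strip_tatweel(b)


plain = "\u0627\u0644\u0643\u062a\u0627\u0628"            # الكتاب
elongated = "\u0627\u0644\u0640\u0643\u062a\u0627\u0628"  # الـكتاب

print(plain == elongated)           # a naive comparison treats them as different words
print(same_word(plain, elongated))  # they match once the tatweel is ignored
```

A search index that did not normalize in this way would behave as described: a query typed without the elongation would miss a page title stored with it.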

I did not mean to connect the prefix or suffix to the word with a tatweel, but to move articles about the prefixes or suffixes themselves to a page with a tatweel. For example, in English we do prefixes and suffixes like this: un-, -ly. I was proposing that for Arabic we would move the ال to الـ and كم to ـكم. But Anatoli already convinced me that it is not a good idea because most Arabic dictionaries don't do this. --WikiTiki89 23:15, 15 November 2013 (UTC)[reply]
OK, I also agree to leave them. When we write these separate prefixes and suffixes by hand, we have the option to use the elongation, but it is not a must. --Mahmudmasri (talk) 23:31, 15 November 2013 (UTC)[reply]

I think al- should show assimilation and elision[edit]

@Atitarev @Wikitiki89 I disagree with the current statement about Romanization that "ال always gives al- regardless of elision and sun and moon letters rules". In any case I've largely been ignoring the part about elision when editing transliterations (because of not realizing until just recently that this was the convention), and I think that assimilation to sun letters should also be shown. Hans Wehr's dictionary, which we largely follow the conventions of, definitely shows assimilation (an-nūr not al-nūr). For beginners, I don't think it much helps to have transliterations like ṣabāh al-nūr or bi-smi llāhi al-raḥīmi al-raḥmāni that don't reflect pronunciation very well (vs. ṣabāh an-nūr or bi-smi llāhi r-raḥīmi r-raḥmāni). Benwing (talk) 23:29, 2 November 2014 (UTC)[reply]

I don't have a strong opinion about this. Both ways are used. I don't mind if you implement assimilation of "l" and the vowel elision. Interesting that in dialects where ج is pronounced as /ʒ/, not /dʒ/, the letter is a Sun letter, so assimilation happens in الجَزِيرَة (al-jazīra). --Anatoli T. (обсудить/вклад) 23:45, 2 November 2014 (UTC)[reply]
I think it should not be indicated for the reason Anatoli gave. It's a pronunciation detail, not an orthographic detail. —CodeCat 23:47, 2 November 2014 (UTC)[reply]
The Korean transliteration module uses a lot of assimilations, which is standard and described by an authority. As I mentioned, both ways are possible. Qur'an, which is more likely to have transliterations, is transliterated either way - with or without assimilations and elisions. Elisions may be even more important to get the right number of syllables and for rhyming. As for my example, it's fine to focus on MSA, Classical or Qur'anic Arabic, despite the fact that modern realizations of standard Arabic differ by regions. /dʒ/ is a prescribed and classical pronunciation of ج (j) (although it may not be original). --Anatoli T. (обсудить/вклад) 00:01, 3 November 2014 (UTC)[reply]
Elision can also be marked with the existing diacritic ٱ (hamzatu l-waṣli) but it's seldom used and causes some issues with display, e.g. ٱلله (llāh). Also the kasra on للهِ causes the ligature لله to be displayed as separate letter, compare with الله (allāh). --Anatoli T. (обсудить/вклад) 00:30, 3 November 2014 (UTC)[reply]
In response to CodeCat, writing assimilation is an orthographic feature, because it's indicated in vocalized Arabic. Words like an-nūr are written اَلنُّور in fully vocalized texts, with a shadda diacritic over the nūn (n) indicating its pronunciation as a geminate consonant, and no diacritic over the lām (l) indicating its silence. A word like al-būr without assimilation is written اَلْبُور when fully vocalized, with a sukūn over the lām (l) indicating no vowel, and no shadda over the bā' (b). Writing assimilation is the same as not writing the trailing silent alif in the 3rd-plural verb ending or in the accusative ending -a, and not writing the trailing silent wāw in the name عَمْرو (ʿamr) -- in these cases, the vocalized Arabic again attempts to indicate the silence of these letters. We also follow pronunciation in the writing of ة (tā' marbūṭa), which gets written either as nothing or as t depending on pronunciation, and in writing صلوة as ṣalāh following pronunciation even though it has a wāw in it, meaning an orthographic transliteration would be something like ṣalwa. This is consistent with the practice of Russian, where e.g. the genitive is written -ovo rather than -ogo.
@Atitarev: The assimilation of ج is a dialect feature that is (at least for many dialects) optional and less found in borrowed MSA words, so I suspect the tendency is not to pronounce it as assimilated when speaking MSA. Words like al-jazīra would of course be written without assimilation, following normative pronunciation and the spelling of vocalized Arabic texts. Benwing (talk) 01:59, 3 November 2014 (UTC)[reply]
Very good example with full vocalization - اَلنُّور, I didn't think of it. --Anatoli T. (обсудить/вклад) 03:38, 3 November 2014 (UTC)[reply]
Support for the reasons given by Benwing (which I believe I have myself expressed in the past). --WikiTiki89 03:39, 3 November 2014 (UTC)[reply]
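For reference, the assimilation rule supported in this section can be sketched as below. This is only an illustration working on transliterated nouns, not the code of any existing module; the name `add_article` is hypothetical, and the set of sun letters assumes the single-character transliteration used on this page, with j treated as a moon letter as in normative MSA.

```python
import unicodedata

# Sun letters in transliteration: t ṯ d ḏ r z s š ṣ ḍ ṭ ẓ l n
# (written with escapes so the characters are unambiguous and precomposed).
SUN_LETTERS = set("t\u1e6fd\u1e0frzs\u0161\u1e63\u1e0d\u1e6d\u1e93ln")


def add_article(noun: str) -> str:
    """Prefix the definite article to a transliterated noun, showing assimilation."""
    noun = unicodedata.normalize("NFC", noun)  # make letters with diacritics one code point
    if not noun:
        raise ValueError("empty noun")
    first = noun[0]
    if first in SUN_LETTERS:
        # The l assimilates to the following consonant: an-nūr, not al-nūr.
        return f"a{first}-{noun}"
    return f"al-{noun}"


for word in ["n\u016br", "b\u016br", "jaz\u012bra"]:  # nūr, būr, jazīra
    print(add_article(word))
```

Elision of the article's vowel after a preceding vowel (as in bi-smi llāhi r-raḥmāni) is not handled here; it would need the previous word as additional input.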

Some changes[edit]

@Benwing Hi, I've added some stuff about ʾiʿrāb endings un/in/an, additional, rare letters - incomplete, pls. check. What's the alternative form of ق used in Maghrebi Arabic? --Anatoli T. (обсудить/вклад) 03:55, 5 November 2014 (UTC)[reply]

RFM discussion: February 2014–January 2015[edit]

The following discussion has been moved from Wiktionary:Requests for moves, mergers and splits (permalink).

This discussion is no longer live and is left here as an archive. Please do not modify this conversation, but feel free to discuss its conclusions.


Merge North Levantine Arabic ("apc"), South Levantine Arabic ("ajp"), and Syrian Arabic ("sem-syr") into Levantine Arabic

See also: User_talk:Stephen_G._Brown#Merging_Levantine_Arabic_dialects_into_one_language

There is absolutely no reason to have these as separate languages. Yes, it's true that there are some pronunciation differences between various dialects, but the vocabulary is largely the same (except that the Lebanese dialect uses a lot of French borrowings and the Israeli dialects use a lot of Hebrew borrowings). The grammar is identical. But the most compelling reason is that the divisions we have are completely arbitrary and there is much more variation within these regions than there is between them (not to mention that Syrian Arabic is no more than a sub-dialect of Northern Levantine, seemingly leaving Northern Levantine to refer to just Lebanese). --WikiTiki89 22:41, 19 February 2014 (UTC)[reply]

See also Category talk:Syrian Arabic language. --WikiTiki89 23:15, 19 February 2014 (UTC)[reply]
I don't know enough about it to say whether apc and ajp should be merged, but sem-syr should definitely be abolished and merged into apc. —Aɴɢʀ (talk) 16:08, 20 February 2014 (UTC)[reply]
I was actually very surprised to find that there are two separate ISO codes for North and South Levantine. They are about as different as Bostonian vs New York English. --WikiTiki89 22:54, 20 February 2014 (UTC)[reply]
I wonder what the code sem-syr was intended to represent. There is more variation between one region of Syria and the next than between one region of Syria and a neighbouring region of Iraq or Lebanon. One textbook (Sociolinguistics / Soziolinguistik, 2006, ISBN-13 978-3-11-0184181) sums it up by saying "In terms of dialectal variation, three major Arabic dialect groups are represented within Syria. In the north eastern region (bordering Iraq) dialects of the Mesopotamian group are spoken, and the rest of the Syrian desert in the east is of Najdi Peninsular Arabic type. In the rest of the country varieties of North and South Levantine Arabic are spoken, which can be distinguished from one another by a number of features. In the south western corner (bordering Jordan), a type of southern Levantine (commonly known as Haurani) is spoken..." WP suggests that "Levantine Arabic" is likewise composed of rather more than two mutually intelligible dialects, making our binary split curious. - -sche (discuss) 01:18, 21 February 2014 (UTC)[reply]
North and South are probably the most distinctive difference in the Levantine dialects mostly due to the pronunciation of the long ā. In the North, it's realized as [æː~ɛː~eː] in unemphatic contexts and [ɑː~ɒː~ɔː] in emphatic contexts, while in the South, it is realized as [aː] in unemphatic contexts and [ɑː~aː] in emphatic contexts. There are also some minor vocabulary differences such as the North preferring شو (šū) for "what?" and the South preferring إيش (ʾēš). But these hardly constitute enough of a difference to treat them as separate languages. And as I mentioned before, there are more differences within these regions than there are between them, such as regional realizations of /q/. Mutual intelligibility, however, is a tricky criterion, because if we went by that we'd have to merge all dialects of Arabic (Levantine, Gulf, Egyptian) except for Maghrebi and many small isolated ones. --WikiTiki89 01:45, 21 February 2014 (UTC)[reply]
Heck, if we went by that, we'd have to merge Bokmål and Nynorsk back into Norwegian, and Dutch Low German and German Low German back into Low German, and who knows what other mergers that would probably be a good idea. —Aɴɢʀ (talk) 09:33, 21 February 2014 (UTC)[reply]
Yeah it turns out that for a dictionary spelling is more important than intelligibility, which is one reason that I think Mandarin and Cantonese should be re-merged. But of course it makes it difficult for languages with no standard such as most Arabic dialects. --WikiTiki89 18:33, 21 February 2014 (UTC)[reply]
I agree with the merge. The difference between Syrian and Lebanese dialects is more political than linguistic. In fact, by our criteria, a lot of dialectal Arabic words and forms should go under "Arabic", without any dialect distinction, perhaps with a qualifier "colloquial", "regional", "East Arabic", "Gulf", etc. Some words that we mark as Egyptian Arabic are often used in many East Arabic dialects or are just too colloquial to be considered classical or standard, especially many borrowings, also there are some relaxed spelling forms, more typical for Egypt and Sudan but absolutely not exclusive to these regions. I also disagree with inclusion of Egyptian or other Arabic entries/translations, which are only pronounced differently from the standard. Considering how few written dialectal contributions we have, we could merge a few dialects, excluding Maghrebi. Note that dictionaries, such as Hans Wehr include dialectal words, just marking them by a region/regions. I am away in Dubai, London and Brunei, only contributing a little at the moment. --Anatoli (обсудить/вклад) 19:10, 21 February 2014 (UTC)[reply]
I was also thinking that we could merge all of Arabic together, but that would be a much larger decision that would need a vote and would require figuring out how to handle major differences in grammar, such as the fact that most dialects conjugate verbs differently from MSA. Merging smaller closely related dialect groups together is a much easier task and will make it easier to add entries for them. The reason I have refrained from adding Levantine entries is because I can't be bothered to add everything three times. I would suggest that we also merge all Maghrebi dialects into Maghrebi and all Gulf dialects into Gulf, but I personally don't know as much about these dialects to feel comfortable proposing this myself. --WikiTiki89 20:05, 21 February 2014 (UTC)[reply]
Yes, more merges may require a vote, same as with merging Sinitic varieties. Grammatical differences in Arabic dialects may be marked with qualifiers as well but they can be grouped, so that we maintain fewer dialects, which are almost identical grammatically and lexically. Perhaps such distinction should become important ONLY if we are really going to have e.g. separate conjugation tables for Egyptian, Levantine, etc. verbs (or other inflections), e.g. so that users can find that "bitidrisu"/"byudursu" ("they write") are different in Egyptian and Levantine Arabic. --Anatoli (обсудить/вклад) 21:47, 21 February 2014 (UTC)[reply]
Another consideration. Arabic dialects are mainly spoken, not written with some exceptions. Are forms like ايمتى رحيرجع؟ (Levantine) or حيرجع امتى؟ (Egyptian) "When will he return?" attestable? Standard Arabic inflections can be confirmed by citations. Is it true for dialects? --Anatoli (обсудить/вклад) 22:09, 21 February 2014 (UTC)[reply]
Sources describing dialectal grammar use romanisation for obvious reasons. Dialects use vocalisations extremely seldom. Such inflection tables would not be very useful if they used Arabic diacritics, as e.g. there are no special symbols for "e" and "o" and Egyptians read "q" as a glottal stop and jiim as /g/, so these examples would be written as something like "eemtaa raH-yirja3" / "Ha-yirga3 imta", obviously not attestable forms. Hopefully it makes sense. --Anatoli (обсудить/вклад) 22:27, 21 February 2014 (UTC)[reply]
For Egyptian, they are probably attestable. For Levantine it might be harder, but hopefully still doable for the most common words. Also, in Levantine the future marker رح is usually written as a separate word (thus: ايمتى رح يرجع؟, ēmtā raḥ yirjaʿ). The difficulty of vowel marking will have to be resolved with transliterations, that is our only choice as far as I can see (IPA would be too specific when we don't want it to be). --WikiTiki89 23:07, 21 February 2014 (UTC)[reply]
Re the comment about Nynorsk, etc: I don't want to stray too far off topic, but I do think we should merge Nynorsk and Bokmål. And iff we merged Nynorsk and Bokmål, I would not oppose merging Dutch Low Saxon and German Low German, though I wouldn't support it, either. Dutch Low Saxon and German Low German already represent mergers of a large number of lects that SIL / Ethnologue / ISO had deemed distinct enough to grant codes to: Dutch Low Saxon is Achterhoeks (act), Drents (drt), Gronings (gos), Sallands (sdz), Stellingwerfs (stl), Twents (twd), Veluws (vel); German Low German is Westphalian (wep); frs could mean either East Frisian the Low German lect or East Frisian the Frisian lect (SIL refuses to clarify which one it was intended to mean, and I have strong but as of yet unconfirmed suspicions that it's used both ways around here); and Plautdietsch (pdt) ... well, exists. - -sche (discuss) 23:33, 21 February 2014 (UTC)[reply]
Before we can do this (assuming there will be at least a couple more people who weigh in and agree), we will need to decide on a language code. Should we use "apc", "ajp", or a new one altogether (such as "sem-lev")? Since I don't understand what "apc" and "ajp" actually stand for, I can't say whether it would be appropriate to use one of them for the entirety of Levantine (the only sort of guess I have is that "ajp" could be "Arabic Jordanian Palestinian", but I can't even make a guess for "apc"). --WikiTiki89 20:05, 21 February 2014 (UTC)[reply]
More than likely, apc stands for "Arabic" + "all the descriptive combinations are taken by other languages so we'll use pc". Chuck Entz (talk) 20:47, 21 February 2014 (UTC)[reply]
"Arabic, Phoenician Coast"? "Aarabic, Lepanese and Cyrian"? —Aɴɢʀ (talk) 22:15, 21 February 2014 (UTC)[reply]
A Perfect Circle? Wait, wrong APC...
It's probably what Chuck says. In any case, even if apc and ajp do stand for something, we can still use either one to represent point, if we want to... for "Antillean Creole", we combined the ISO's gcf ("Guadeloupean Creole French") and acf ("Saint Lucian Creole French", but presumably abbreviating "Antillean Creole French") into gcf (even though, if you think about what the codes stand for as abbreviations, it might have made more sense to merge them into acf). - -sche (discuss) 22:30, 21 February 2014 (UTC)[reply]
I think the Arabic dialects that are mutually intelligible with standard Arabic should be handled the way the Romani lects are currently handled, viz. "Only the macrolanguage is allowed an L2 header [==Romani==], but the subdivisions are allowed nested lines in translations tables." In fact, if you read the discussion about Romani, you'll note that I was under the impression that Arabic was already handled that way! If we adopt such an approach, then merging North and South Levantine seems unnecessary, especially if they have such differences as Wikitiki describes ("the Lebanese dialect uses a lot of French borrowings and the Israeli dialects use a lot of Hebrew borrowings"). That's because I think it's easier (especially with Conrad's trans-adder) to add, and also to look at as a reader, separate lines for apc and ajp, vs one line with several entries each with its own qualifier. Whereas, if we want to grant all code-having Arabic dialects their own L2s, then it would make more sense to merge apc and ajp, since I think it's easier to add {{context}} tags to sense-lines than it is to create two separate L2 sections with almost identical content. So, I think it would be best if we developed an idea of what our general policy on Arabic dialects should be before acting here... (No matter what our general policy is, I think it makes sense to abolish sem-syr, converting each of its entries into whichever of apc vs ajp it happens to be.) - -sche (discuss) 22:54, 21 February 2014 (UTC)[reply]
What you may be overlooking is that French borrowings in Lebanese do not carry over to Syrian, and Hebrew borrowings in Israel do not generally carry over to the Palestinian territories, let alone to Jordan. Also, North Levantine is spoken in the North of Israel and uses the same Hebrew borrowings that are used in the South of Israel in South Levantine. Merging all dialects with MSA will be a big process, mostly due to our large number of Egyptian Arabic entries. Merging Levantine together is quick and easy and not much is lost in doing so even if we later decide to merge it with MSA. --WikiTiki89 23:07, 21 February 2014 (UTC)[reply]
I think it's possible and advisable to merge ALL Arabic varieties into Arabic, using e. g. {{context|Egypt|Sudan|lang=ar}} labels and regional categories, such as Category:Lebanese Arabic. Even infrequent inflection tables could have regional context labels. As I mentioned, Arabic dictionaries, which include dialectal forms just label them with Morocco, Syria, etc. (my preference and we have a precedent in Hans Wehr) or Levantine, Gulf. Apart from named dialects there are differences related to specific countries. I'd prefer labels "Lebanon", "Syria" to "North Levantine" or "South Levantine", even though there are differences within countries across dialects - we can combine labels such as {{context|Hejazi|Saudi|lang=ar}}. The majority of words are identical for all dialects, despite differences in pronunciation, there's no need to duplicate information. Words that are different in dialects are quite common but their percentage is small and many words are shared by various dialects, they are often considered more colloquial, not dialectal, especially numerous borrowings, frowned at by Arabic purists but still used quite often. In any case, keeping separate dialect dictionaries is impractical, any dialect also includes the majority of Arabic words. Besides, we have kept Albanian as one language without splitting into Tosk and Gheg, English is not split into American, British, etc., we merged Serbo-Croatian varieties, and I think we can also merge Norwegian and Chinese. (Merging Arabic varieties is a much simpler task than merging Chinese varieties and makes even more sense). --Anatoli (обсудить/вклад) 07:08, 27 February 2014 (UTC)[reply]
The differences between Arabic dialects are similar to the differences between German dialects such as Swiss German. Swiss German is considered a dialect but it is different enough that it is not mutually intelligible with Standard German. The difference between Arabic and German, however, is that in Arabic, the spelling of individual words is largely identical across dialects, which makes it practical to have them merged. But that is not the only factor to consider. If we merge all of Arabic together, when providing usage examples, we would have to label each usage to specify which dialect it uses. The other solution would be to only use MSA for usage examples, but this would further de-emphasize dialects and make it difficult for people trying to learn them. There are already very few resources online (and even offline) for Arabic dialects (with the possible exception of Egyptian). I think that it would be beneficial to treat the dialects separately from MSA, even if we don't plan on duplicating all of the content. I only plan to add the most distinctive Levantine words as entries; I don't plan on duplicating all of the Arabic content we already have. Finally, as I've already said before, the discussion of merging all of Arabic together is entirely separate from the discussion here. The discussion here is something we can do right now, without a lot of debate and without a vote. Merging all of Arabic together is a long term project. If you want to start a discussion about it, do it at the Beer Parlour, but it does not prevent us from merging Levantine in the meantime. --WikiTiki89 07:26, 27 February 2014 (UTC)[reply]
I know it's a diversion from your original topic but if the idea of merging dialects were met positively, then your request to merge two forms of Levantine would be solved as well. I don't think the problem with the lack of emphasis and resources for dialects will be solved any time soon, including Egyptian Arabic (which is slightly more available than others) and having them under one header is not a problem for adding regional user examples, if editors are aware that we only have "Arabic". I'll consider a new discussion in BP when I'm back home. I agree it would require a vote and a more thorough analysis. --Anatoli (обсудить/вклад) 07:45, 27 February 2014 (UTC)[reply]
I understand that if the idea of merging all of the dialects were met positively that would solve this problem as well, but that would take longer. And I don't expect that the lack of emphasis or resources would be solved any time soon, but I'd rather that Wiktionary be part of the solution and not part of the problem. As for usage examples, how would a reader know which dialect is being used? Should usage examples have context labels as well? --WikiTiki89 07:56, 27 February 2014 (UTC)[reply]
Not labels but a dialect name in brackets. I understand why most dialects are not emphasised by Arabs and why we shouldn't do it either or should promote written dialectal forms, only some of which are attestable. There's no definite right and wrong in spelling, pronunciation and transliteration and by definition, no standard. That's why Arabs mix standard with dialect spellings when writing in dialects. There is a so-called "spoken MSA" or "educated colloquial Arabic" (there are textbooks available, I have two books) where some common dialectal forms are mixed with MSA to produce a new variety. It obviously differs a bit regionally. --Anatoli (обсудить/вклад) 09:04, 27 February 2014 (UTC)[reply]
But spelling is not everything, there is also morphology, syntax, agreement, vocabulary choice and other aspects of grammar that are evident even when the spellings are the same as MSA. For example, بَدِّي أَشْرَب قَهْوَة and بَحِبّ بُيُوت صُغَار, in which every word is spelled the same as in MSA, yet the sentences still differ from the MSA أُرِيدُ أَنْ أَشْرَبَ قَهْوَةً and أُحِبُّ بُيُوتًا صَغِيرَةً. The question you still haven't answered is: What is wrong with merging Levantine together right now, even if we will later merge all Arabic dialects? --WikiTiki89 19:45, 27 February 2014 (UTC)[reply]
The known differences you listed are comparable with Serbo-Croatian and Albanian variations, some regional English slang. It would make sense to keep dialectal phrasebooks as separate subsections of the Arabic phrasebook but it's a word dictionary. Do you possess Hans Wehr dictionary? It has a lot of dialectal words/regionalisms but examples are only provided for MSA. As I said before, usexes can be used on the MSA entries and specific dialectal words can be labeled/categorised accordingly. There's nothing wrong with merging Levantine now, I have already expressed my support for this. --Anatoli (обсудить/вклад) 23:54, 27 February 2014 (UTC)[reply]
Ok, I see your point. No, I don't have a Hans Wehr, but I am thinking about getting one (since I am not a professional translator or anything remotely close to one, buying expensive dictionaries is not something I do for every language I happen to be interested in; in fact the only physical dictionary at all that I personally own is the Even-Shoshan Dictionary of Hebrew; however, due to the scarcity of online Arabic dictionaries, and the inconvenience of a PDF dictionary, I am considering buying a Hans Wehr). And thanks for your answer. I'm looking forward to the official merging-all-of-Arabic debate. --WikiTiki89 02:17, 28 February 2014 (UTC)[reply]
(replying to Wikitiki's comment of 23:07, 21 February 2014): That's a good point. I suppose any linguistically sound treatment of Arabic dialects is going to divide them — whether into separate code-having L2s, or simply separate {{qualifier}}-tagged varieties — along lines too different from SIL's for it to be worth retaining ajp and apc no matter what we do. Alright, merge all three codes. (I note for posterity that ajp and apc are only used by a dozen entries each, anyway, and sem-syr is not used at all in the main namespace.)
Now, to address the question of whether the unified "Levantine" should use one of the existing codes or get an exceptional code... precedent, both here (see gcf) and at the ISO (see e.g. their merge of tlw into weo), seems to be to use one of the existing codes. - -sche (discuss) 03:03, 28 February 2014 (UTC)[reply]

Since Wikitiki89 requested my comment, I would say let's remove national-based categorization of dialects and let's separate them by the ISO 639-3 categorization of dialects. It categorizes two Levantine dialects, the northern and the southern. They are close, but separate enough. They are not one and they are not comparable to Boston-New York accents. If you listen to a speaker from Gaza, you can tell the difference, in many words and to some extent in grammar, from how a speaker from Lebanon would talk.

A separate note is that Arabic dialects are hardly mutually intelligible, even when their spoken ranges are geographically close. They only become intelligible under certain circumstances: when both people are literate in Literary Arabic, which affects both dialects; when they have accustomed themselves to each other's dialect by listening to many songs and conversations in it; or when they try to use a simplified, straightforward version of their dialects. --Mahmudmasri (talk) 12:35, 3 March 2014 (UTC)[reply]

Subdialects of Levantine
@Mahmudmasri The reason Gazan Arabic is so different from Lebanese is because, as you can see on the map at right, it is at the very bottom of the continuum. What the map calls "Palestinian" is much closer to Lebanese than Gazan (what the map calls "Outer southern") is. If anything, it should be the "Outer southern" that should be broken off as a separate language, but even that would be a longshot. Anyway, I agree that the only reason many of the dialects are mutually intelligible is because of knowledge of MSA or mutual knowledge of the other's dialect. However, all variants of Levantine Arabic are mutually intelligible with each other. --WikiTiki89 16:40, 3 March 2014 (UTC)[reply]
I'm not sure to what extent the map is precise; however, the dialect spoken by urban west Jordanians is not as close to Lebanese as that of northern Israel and the West Bank is, even though in both cases the variants are spoken in nearby regions. I'm also not sure that the Gaza speech would be as mutually intelligible with the Aleppo speech as you assume. --Mahmudmasri (talk) 02:10, 4 March 2014 (UTC)[reply]
@Mahmudmasri You have valid points about differences in dialects. It doesn't mean all Arabic dialects can't have the same L2 header. Lack of templates makes it harder to add dialectal contexts, e.g. شو (šū) may get an entry:
==Arabic==

===Pronoun===
{{ar-pron|tr=šū}}

# {{context|interrogative|Palestine|lang=ar}} [[what]]?
...and be categorised under Category:Levantine Arabic. Arabic dictionaries that include dialectal words have a way of doing it; I don't see why we cannot do it as well. Please look at the Chinese discussion: Wiktionary:Beer_parlour/2014/March#A_new_format_for_Chinese_entries_.28multisyllables.29, it may give you some ideas. --Anatoli (обсудить/вклад) 03:47, 14 March 2014 (UTC)[reply]
No, please, don't just sum up all dialects under ==Arabic==. You'd better have a section ==[North/South] Levantine Arabic==. The /ʃuː/ is North Levantine, not south, but the Palestinian dialects are not one. The Gaza speech is also considered Palestinian, but they say /ʔeːʃ/. --Mahmudmasri (talk) 18:08, 16 March 2014 (UTC)[reply]
To clarify, and I know this from asking people from various parts of the Levant, North Levantine uses only /ʃuː/, while South Levantine uses both /ʃuː/ and /(ʔ)eːʃ/ interchangeably. --WikiTiki89 18:44, 16 March 2014 (UTC)[reply]
Obviously, because there are no clear boundaries where the dialect area ends. However, I never heard someone from Gaza normally saying /ʃuː/, unless he consciously tries to use the word. --Mahmudmasri (talk) 13:43, 17 March 2014 (UTC)[reply]
Maybe movies are not the best source, but I watched a Gazan movie a long time ago (before I knew much about Arabic) and I remember hearing /ʃuː/. And just recently I watched a Palestinian movie where the same characters used both /ʃuː/ and /(ʔ)eːʃ/ interchangeably. Additionally, I have asked around, and Palestinians and Jordanians seem to say that they use whichever one sounds better in the sentence. --WikiTiki89 16:11, 17 March 2014 (UTC)[reply]
I've deleted so-called "Syrian Arabic", judging there to be support above for that. (I repeat my comment that there are 3+ varieties of Arabic spoken in Syria.) Merging the other dialects seems to not have the support of the one native speaker of Arabic who has commented. - -sche (discuss) 02:14, 26 January 2015 (UTC)[reply]


Planning on modifying gender of plural nouns to be either m-p or f-p rather than just p

@CodeCat, Atitarev, Wikitiki89, Mahmudmasri: Currently, we usually mark the gender of non-human plurals to be just p rather than m-p or f-p. The logic here is that non-human plurals take feminine-singular agreement, so the gender of the singular isn't visible any more. In fact, someone (I forget who) suggested marking such plurals as feminine-singular gender for this same reason, although this seemed too strange to me and others. However, it turns out that it isn't entirely true that the gender of the singular is unimportant. In particular, numbers have separate masculine and feminine forms, and when a number from 3-10 precedes the plural noun, the number is declined according to the gender of the singular, even though the number itself appears in the plural and associated adjectives have feminine-singular agreement. Hence, these nouns really do have inherent gender in the plural, despite the fact that the gender gets overridden by the animacy in most cases. For this reason, I am planning on marking all plurals for gender regardless of animacy. Just a heads up. Benwing (talk) 05:20, 21 June 2015 (UTC)[reply]

Arabic is one of the languages which always have a gender for words, whether the words are for humans or non-living things. --Mahmudmasri (talk) 01:46, 22 June 2015 (UTC)[reply]
No objection to the change, as long as the gender for plurals is not mandatory (simple "p" is still allowed when uncertain). Although there is gender for non-living plural objects, it's not always easy to determine with 100% certainty not just the singular gender but the plurality as well, since plurals of non-living things behave like grammatical feminine singulars and/or may have a singular meaning when translated into English. Such terms can also be pluralia tantum, where there may not be a singular form with a gender, as far as I can tell. Some examples were given in previous discussions.
There are such cases in other languages as well, e.g. Russian has well-developed grammatical genders but pluralia tantum are not always straightforward, e.g. "месячные" is most likely masculine plural but it would be hard to prove it with certainty. --Anatoli T. (обсудить/вклад) 10:15, 22 June 2015 (UTC)[reply]
@Atitarev I agree, there is no reason not to allow just "p". Benwing (talk) 08:37, 23 June 2015 (UTC)[reply]
The best solution would be to make use of the animacies provided in the gender module. Thus, we would have m pl inan, m pl anim, f pl inan, and f pl anim. We should also mark the animacy for all singular nouns that have plurals. --WikiTiki89 16:06, 22 June 2015 (UTC)[reply]
@Wikitiki89 I think this is a good idea and I will implement it. However, I notice that Module:gender and number has a "personal" category for animacy, and I think it might be better to use "personal" and "non-personal" rather than "animate" vs. "inanimate", since feminine-singular rather than plural agreement occurs with animals. However, to do this properly we need a "non-personal" category for animacy, and for some reason it isn't currently implemented. Can someone add a line like this at line 27 of Module:gender and number (I can't edit it)?
codes["np"] = '<abbr title="non-personal">npers</abbr>'
Benwing (talk) 08:37, 23 June 2015 (UTC)[reply]
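As a rough illustration of the idea in this thread, here is a minimal Python sketch (not the actual Lua of Module:gender and number) of how a hyphenated specification such as f-p-np might be expanded into display abbreviations. Only the "np" entry is taken from the line quoted above; the other table entries and the function name expand_gender_spec are assumptions made purely for illustration.

```python
# Hypothetical Python analogue of a code table like the one in
# Module:gender and number. Only the "np" entry comes from the
# discussion; every other entry is an illustrative assumption.
CODES = {
    "m": '<abbr title="masculine">m</abbr>',
    "f": '<abbr title="feminine">f</abbr>',
    "p": '<abbr title="plural">pl</abbr>',
    "pr": '<abbr title="personal">pers</abbr>',
    "np": '<abbr title="non-personal">npers</abbr>',
}


def expand_gender_spec(spec: str) -> str:
    """Expand a spec such as "f-p-np" into space-separated abbreviations."""
    parts = spec.split("-")
    unknown = [part for part in parts if part not in CODES]
    if unknown:
        raise ValueError(f"unknown gender/number code(s): {unknown}")
    return " ".join(CODES[part] for part in parts)


if __name__ == "__main__":
    # A feminine plural non-personal noun.
    print(expand_gender_spec("f-p-np"))
```

A term that can be read as either personal or non-personal would simply carry two such specifications, one per reading.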
Done. I don't really like "npers" as the abbreviation, but I can't think of anything better. Also, are there any plural-only nouns that are countable and thus can take a number? I would like to see an example. --WikiTiki89 16:53, 23 June 2015 (UTC)[reply]
I agree that npers isn't the best; looks rather similar to pers. I considered nonpers but that might be a bit long. In any case, we can always change it later. As for plural-only countable nouns, I find examples with نَاس (nās, people) e.g. on the page [3], the sentence المهدي كان معه أربعة ناس أو خمسة أو ستة أو واحد فالله ناصره i.e. "The Mahdi had four people or five or six (etc.)". Benwing (talk) 09:40, 24 June 2015 (UTC)[reply]
OK, I implemented it in Module:ar-headword but I'm now thinking I'm not so sure it's a good idea after all. The first place I tried to implement it was أُمّ (ʔumm) and plural أُمَّهَات (ʔummahāt). Problem is that the term means not only "mother" but also "origin" and "source", and if we distinguish personal from non-personal then we either have to give this term and its plural a gender of both f-pr and f-np (or f-p-pr and f-p-np for the plural), or we have to split into two entries, one for the "mother" defn and one for "origin" and "source". If we split into two entries we'd want to split the plural into two entries as well, which seems a bit needless to me as the personalness is entirely determinable from the semantics. If we give the entry two genders then we'd potentially need to support up to four genders in the case where a term can be either masculine or feminine. Benwing (talk) 10:06, 24 June 2015 (UTC)[reply]
Here are some examples of how we deal with this for Russian: график (grafik), рыба (ryba), конёк (konjók). It's not very elegant, but it is necessary to convey all the necessary information. --WikiTiki89 14:15, 24 June 2015 (UTC)[reply]
I wonder if animacy is more important for Russian because it directly affects the conjugation rather than just the agreement. Also, are there any exceptions where something semantically inanimate is considered animate or vice-versa? I see the case of "hobby" treated as animate but it looks like the actual meaning is "hobby horse" or something. Benwing (talk) 23:04, 26 June 2015 (UTC)[reply]

Adding pronunciation to Arabic entries

@Atitarev, Wikitiki89, Mahmudmasri, Stephen G. Brown, Aperiarcam, ZxxZxxZ I am thinking about writing a module, similar to Module:ru-pron, to automatically generate MSA pronunciation from Arabic text and optional translit. I think it should be possible. However, there's a problem in that there's more than one way to pronounce MSA, generally dependent on one's native language. So I'm thinking there should be a "phonemic" version in // that avoids indicating stress or the exact pronunciation of /a/, and then one or more "phonetic" versions that do indicate such nuances. IMO the phonetic versions should be at least a "normative" version and an "Egyptian" version. I'm not exactly sure what should go into the normative version, but I assume it will have at least the following features:

  • Pronunciation of /dʒ/ as [dʒ]
  • Pronunciation of /q/ as [q]
  • The "Classical" stress rule: rightmost heavy syllable is stressed (where "heavy" means long vowel or vowel followed by two consonants, but excludes an absolutely final long vowel); otherwise leftmost syllable, but never more than three syllables from the end; additionally, form VII and VIII verbs are never stressed on the first syllable, with stress shifting one syllable to the right if it would be
  • Unstressed long vowels are not reduced
  • Rule regarding allophones of /a/: ????? (possibly that emphatic [ɑ] is only triggered by an immediately adjacent emphatic consonant, otherwise [æ]; or just leave it as [a] in all cases)
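The stress rule in the list above can be sketched roughly as follows. This is only a minimal Python illustration under stated assumptions: the input is a single word in a one-character-per-sound transliteration (vowels a i u ā ī ū, everything else counted as a consonant), the limit of three syllables from the end is applied to heavy syllables as well, and the exception for form VII and VIII verbs is left out because it needs morphological information. The function name is hypothetical.

```python
import unicodedata

SHORT_VOWELS = set("aiu")
LONG_VOWELS = set("\u0101\u012b\u016b")  # ā ī ū (precomposed)
VOWELS = SHORT_VOWELS | LONG_VOWELS


def stressed_vowel_index(word: str) -> int:
    """Return the 0-based index, counting vowels from the left, of the stressed syllable."""
    word = unicodedata.normalize("NFC", word)
    nuclei = [i for i, ch in enumerate(word) if ch in VOWELS]
    if not nuclei:
        raise ValueError("word contains no vowel")
    count = len(nuclei)

    def is_heavy(k: int) -> bool:
        pos = nuclei[k]
        if word[pos] in LONG_VOWELS:
            # A long vowel is heavy unless it is absolutely word-final.
            return pos != len(word) - 1
        # A short vowel is heavy only if two or more consonants follow it
        # before the next vowel (or before the end of the word).
        next_pos = nuclei[k + 1] if k + 1 < count else len(word)
        return next_pos - pos - 1 >= 2

    heavy = [k for k in range(count) if is_heavy(k)]
    # Rightmost heavy syllable, otherwise the leftmost syllable.
    candidate = heavy[-1] if heavy else 0
    # Assumption: stress is never more than three syllables from the end.
    return max(candidate, count - 3)


if __name__ == "__main__":
    for w in ["madrasa", "q\u0101hira", "kataba", "kit\u0101b"]:
        print(w, stressed_vowel_index(w))
```

Under these assumptions madrasa and qāhira both come out with stress on the first syllable, which agrees with the mádrasa and qā́hira examples given for this rule further down.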

I think that the Egyptian pronunciation of MSA has the following features, but User:Mahmudmasri needs to help; this is largely based on my knowledge of colloquial Egyptian Arabic and extrapolation to MSA pronunciation:

  • Pronunciation of /dʒ/ as [g]
  • Pronunciation of /q/ as [q] (I think? Or as [ʔ]?)
  • Stress following the same rules as colloquial Egyptian Arabic (e.g. madrása rather than mádrasa, qāhíra rather than qā́hira; stress shifts right from a heavy syllable onto a non-final light syllable)
  • Emphasis works as follows:
    • non-emphatic allophone of (short and long) /a/ is [æ]; emphatic allophone is [ɑ]; final -a is not special
    • emphasis, when it occurs, spreads throughout the phonological word
    • emphasis is triggered by ṭ ṣ ḍ ẓ ḷ q but not by ḥ ʕ x ḡ
    • r triggers emphasis if followed by a ā u ū, or if not followed by a vowel and preceded by a ā u ū; but does not trigger emphasis in words like bārid (followed by i) or kabīr (not followed by vowel, preceded by ī); possibly there are additional complications involving r and emphasis
    • when emphasis is triggered, it pharyngealizes all consonants in the word, notably t s d z l r (is this really true?)
  • Unstressed long vowels are shortened (is this true?)
  • Short i and u are pronounced [ɪ ʊ] except absolutely word-finally, where they are [i u] (I imagine User:Mahmudmasri will assert that short i and u are pronounced [e o] when not absolutely word-final, but this is not in accordance with my sources on colloquial Egyptian, nor based on what I hear when listening to it, so I'd like to see an academic source that confirms [e o]; in my hearing, these vowels are lax, not tense, with the i sounding somewhat like English pit; the only time I recall hearing a sound like [e] is before /j/)
  • What about ay and aw? Are they diphthongized or do they become pure [eː] and [oː]?
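The tentative emphasis rules listed above could be encoded along the following lines. This is a minimal Python sketch, not a confirmed description of Egyptian pronunciation: it takes the triggers and the conditions on r exactly as stated in the list, assumes a one-character-per-sound transliteration, reads "preceded by" and "followed by" as referring to the immediately adjacent character, and treats emphasis as spreading over the whole word. The function names are hypothetical, and the pharyngealization of consonants is not modelled.

```python
import unicodedata

# Triggers as listed above: ṭ ṣ ḍ ẓ ḷ q (but not ḥ ʕ x ḡ).
EMPHATIC_TRIGGERS = set("\u1e6d\u1e63\u1e0d\u1e93\u1e37q")
BACK_VOWELS = set("a\u0101u\u016b")           # a ā u ū
ALL_VOWELS = BACK_VOWELS | set("i\u012b")     # plus i ī


def word_is_emphatic(word: str) -> bool:
    """Return True if any segment triggers (word-wide) emphasis."""
    word = unicodedata.normalize("NFC", word)
    for i, ch in enumerate(word):
        if ch in EMPHATIC_TRIGGERS:
            return True
        if ch == "r":
            nxt = word[i + 1] if i + 1 < len(word) else ""
            prev = word[i - 1] if i > 0 else ""
            # r followed by a ā u ū triggers emphasis.
            if nxt in BACK_VOWELS:
                return True
            # r not followed by a vowel triggers emphasis only when
            # immediately preceded by a ā u ū (an assumption, see above).
            if nxt not in ALL_VOWELS and prev in BACK_VOWELS:
                return True
    return False


def allophone_of_a(word: str) -> str:
    """Return the allophone of /a/ predicted for the whole word."""
    return "\u0251" if word_is_emphatic(word) else "\u00e6"  # ɑ or æ


if __name__ == "__main__":
    for w in ["b\u0101rid", "kab\u012br", "rajul", "\u1e6d\u0101lib"]:
        print(w, allophone_of_a(w))
```

With these rules bārid and kabīr stay non-emphatic, as in the examples given in the list, while a word containing ṭ or an r before a is treated as emphatic throughout.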

Unfortunately I have little info on most other pronunciations. I can guess the following features for Moroccan, based on my knowledge of colloquial Moroccan Arabic:

  • Pronunciation of /dʒ/ as [ʒ]
  • Pronunciation of /q/ as [q]
  • There may not be a clearly accented syllable (is this true?)
  • Pronunciation of non-emphatic /t/ as affricated [tˢ]
  • Limited emphasis-spreading, maybe only as far as the first long vowel in either direction
  • Emphasis affects not only /a/ (which becomes [æ] vs. [ɑ]) but also short and long /u/ and /i/, which lower to [o] and [e] (or perhaps [ɔ] and [ɛ]) when emphatic
  • Final -a (at least in the feminine ending) might have a special pronunciation (e.g. always [ɑ] or [ɐ])
  • Unstressed long vowels might not be shortened, except word-finally

For Levantine Arabic, I can guess, with little certainty:

  • Pronunciation of /dʒ/ as [ʒ]
  • Pronunciation of /q/ as ??? (either [q] or [ʔ])
  • Stress is similar to the "Classical" stress rule (mádrasa, qā́hira)
  • Emphasis spreading is blocked by /j/ and /ʃ/

Any comments/help would be greatly appreciated. Benwing (talk) 09:52, 7 August 2015 (UTC)[reply]

Wouldn't it be too hard trying to cater for various dialects? Even the MSA pronunciation is not fully documented but it's the most regular, dialects are too unpredictable and spelling is haywire. It would be good to set the most common word stress as well, based on the form without ʾiʿrāb, at least where it's known and uncontroversial. --Anatoli T. (обсудить/вклад) 11:29, 7 August 2015 (UTC)[reply]
The problem is that there isn't a single MSA pronunciation, but it varies depending on the native speaker's dialect. I'm not trying to cater for the colloquial speech but rather the way that MSA is pronounced, and unfortunately that varies. So an Egyptian when speaking MSA will put the stress as madrása, a Syrian as mádrasa, etc. If I can get just a couple MSA pronunciations (e.g. Egyptian and Levantine, or Egyptian and Moroccan, or Egyptian/Levantine/Moroccan) that would be enough probably. Benwing (talk) 11:37, 7 August 2015 (UTC)[reply]
Take a look at this article for the rules of stress in "generic MSA". To answer some of your uncertainties:
  • Egyptian and Levantine MSA both pronounce /q/ as [q].
  • Stress in Levantine MSA differs from the "generic MSA" in some cases. For example, with ʾiʿrāb, مدرسة is stressed mádrasatun, while in the "generic MSA", the stress would shift forward to madrásatun.
  • In most varieties of MSA, unstressed long vowels are shortened to some degree. In some but not all variants (not sure exactly which), they become as short as the short vowels, but even in the others the shortening is noticeable. We could call them half-long and maybe use the IPA symbol [aˑ] (not sure if that's the correct use of this symbol).
  • It is true that Moroccan colloquial varieties do not have phonemic stress (I guess being influenced by Berber languages) and I think that more or less carries over to their MSA.
  • An interesting thing I read about emphasis spreading in Moroccan colloquial dialects is that the consonants are affected (like you mentioned for Egyptian), so that /s/ and /sˤ/ are indistinguishable, but despite that, /t/ and /tˤ/ are still distinguishable only because the former still affricates.
This task is not going to be easy. --WikiTiki89 14:53, 7 August 2015 (UTC)[reply]
@Wikitiki89 Thanks. I've definitely attacked hard problems before -- Module:ar-verb is running at 3,943 lines and Module:ar-nominals is 2,655 lines (but probably more complex than Module:ar-verb despite this). Compare Module:ru-pron at 435 lines; I can't imagine a simple version that, say, just does Egyptian and "generic MSA" would be much longer. But I need to gather more info before attacking this problem; hopefully User:Mahmudmasri will help. Benwing (talk) 16:45, 7 August 2015 (UTC)[reply]

Too much to comment on. I may elaborate more later. There is no classical stress. Are there any sources describing the stress in Classical Arabic? If so, it's still unimportant, since stress is the trait most affected by native dialects. The short vowels have no solid standard: when pronouncing MSA they tend to be less likely [e, o], but even in spoken Egyptian Arabic, all are possible [e o, ɪ ʊ, i u]. [ɪ ʊ] are the more usual vowels in words which have another stressed syllable with a high vowel. Think of it similarly to the case of [æ~ɑ], for example: Egyptian Arabic 'give me': [ɪdˈdiːni, idˈdiːni], 'watch him/it': [ʃʊˈfuː(h), ʃuˈfuː(h)]; MSA-E 'by it/him': [ˈbihi] only. --Mahmudmasri (talk) 02:34, 8 August 2015 (UTC)[reply]

@Mahmudmasri Thanks for your comments. Anything you can add about e.g. Moroccan or Levantine pronunciation of MSA would be great. How exactly does stress and emphasis spreading in Levantine MSA work for example? Benwing (talk) 05:25, 9 August 2015 (UTC)[reply]

This is a great idea, though it seems like a beast of a project to take on. A few comments/ questions (all on Egyptian):

  1. How would you handle nisbas? Technically in pause-form MSA they are supposed to be stressed on the ultima, since the yaa is consonantal, but my experience is they are generally stressed wherever the word from which they are derived is stressed.
  2. /q/ is pronounced in Egyptian MSA, but sometimes I do hear it dropped (e.g. تقريبا ~ tæʔ'rībæn)
  3. ay -> /e:/ fairly consistently.
  4. aw may be retained in very careful speech, but I think it generally goes to /o:/
  5. non-emphatic /a:/ goes to /æ:/, but very frequently I have heard it shift all the way to /e:/ or even /eɪ/; not sure if there is any rule regarding this or if it's just a variant among speakers.
  6. the colloquial Egyptian shift of /ṯ/ -> /s/ often happens in MSA as well.

I think the real problem is we can't draw a line. "Egyptian MSA" is MSA with Egyptian dialect sneaking in. It's hard to prescribe a pronunciation when the borderline between "Egyptian" and "Egyptian MSA" is so fluid. Egyptian grammatical constructions may also find their way into speech that is meant to be MSA. Aperiarcam (talk) 00:02, 10 August 2015 (UTC)[reply]

I'm willing to pick a pronunciation that represents a typical formal pronunciation rather than trying to account for all the variants. In reality there's a continuum between pure MSA and pure dialect rather than discrete entities, but it's still possible to identify a fairly pure MSA -- e.g. how would a newscaster reading from a teleprompter typically pronounce things? As for nisba, we already treat it in the "informal" form as if it ended in unstressed rather than stressed -iyy. Benwing (talk) 01:05, 10 August 2015 (UTC)[reply]
Think of something like the "newscaster" accent in General American, which unifies what are in reality many discrete pronunciations -- or RP in England. Benwing (talk) 01:07, 10 August 2015 (UTC)[reply]

Numerals

I created a table of numerals from 1-10 at Appendix:Arabic numerals (which till now was a redirect to Appendix:Roman numerals). I am not sure what the numerals like رُبَاع (rubāʕ) are actually called; I wrote "multiplicative" as a placeholder. There are a lot of numerals to add, and the page should also explain syntactic characteristics of numeral-containing phrases. — Eru·tuon 22:34, 23 December 2016 (UTC)[reply]

category 'terms with common plural', 'Usage notes', etc.

Hi, it would enrich the dictionary to add the categories of 'terms with common plural', as is كِبَار or 'terms with usage notes' as here. Thank you in advance. --Backinstadiums (talk) 17:32, 4 February 2017 (UTC)[reply]

'broken plural triptote in ـٍ (-in)'

(moved from Talk:يد) Hi, its declension shows 'broken plural triptote in ـٍ ‎(-in)‎' which is completely wrong. I'd like to know what alternative name can be used, for it's not a triptote. Thanks in advance. --Backinstadiums (talk) 15:08, 6 February 2017 (UTC)[reply]

@Benwing2: He's right; the declension classification is misleading. Maybe we should use the words "nunated" and "non-nunated" or something like that. Also, are you sure it's أَيَادٍ (ʔayādin) and not أَيَادِي (ʔayādī) in the indefinite nominative? And are you sure it's not أَيَادِيَ (ʔayādiya) in the indefinite genitive? --WikiTiki89 16:11, 6 February 2017 (UTC)[reply]
@Backinstadiums, Wikitiki89 Wikitiki: Did you mean something else in your second question? أَيَادِيَ (ʔayādiya) is what's currently shown. I'm pretty sure the forms أَيَادٍ (ʔayādin) and أَيَادِيَ (ʔayādiya) are both correct; I remember looking these up in my grammar book. The terms "triptote" and "diptote" here are used historically; before various sound changes and analogical changes, these were indeed triptotes and diptotes respectively. I'm fine with using other terms, though. "Nunated" and "non-nunated" are potentially misleading as well because the "non-nunated" forms are nunated in the nominative and genitive. Benwing2 (talk) 02:20, 7 February 2017 (UTC)[reply]

Convenience template for entering Arabic

I created the template {{subst:chars}} some time ago, which uses Module:typing-aids to convert various ASCII shortcuts to Unicode characters. It includes a set of shortcuts for Arabic: for instance, you can type {{subst:chars|ar|kitaabuN}} to get كِتَابٌ. It can also transform into a linking template: you can type {{subst:chars|m|ar|kitaab}} to get كِتَاب (kitāb). Not sure if anyone will find this useful (I myself use the Windows Syrian keyboard when typing Arabic), but thought I would let people know. If you have any suggestions for improvements, post them on Module talk:typing-aids. (It occurs to me I need to add a shortcut for alif wasl.) — Eru·tuon 08:22, 23 February 2017 (UTC)[reply]
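To give a feel for how this kind of shortcut conversion can work, here is a minimal Python sketch of a greedy longest-match replacer. It is not the code of Module:typing-aids: the shortcut table below is hypothetical and contains only enough entries to reproduce the kitaabuN and kitaab examples above, and the function name ascii_to_arabic is likewise an assumption.

```python
# Hypothetical shortcut table; the real Module:typing-aids table may
# differ. Values are written as escapes to avoid bidi display issues.
SHORTCUTS = {
    "aa": "\u064e\u0627",  # fatha + alif
    "uN": "\u064c",        # dammatan
    "a": "\u064e",         # fatha
    "i": "\u0650",         # kasra
    "u": "\u064f",         # damma
    "k": "\u0643",         # kaf
    "t": "\u062a",         # ta
    "b": "\u0628",         # ba
}

# Try longer shortcuts first so that "aa" wins over "a".
_KEYS = sorted(SHORTCUTS, key=len, reverse=True)


def ascii_to_arabic(text: str) -> str:
    """Replace ASCII shortcuts with Arabic characters, longest match first."""
    out = []
    i = 0
    while i < len(text):
        for key in _KEYS:
            if text.startswith(key, i):
                out.append(SHORTCUTS[key])
                i += len(key)
                break
        else:
            # No shortcut matches here: copy the character unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


if __name__ == "__main__":
    print(ascii_to_arabic("kitaabuN"))
```

Matching the longest shortcut first is what keeps a sequence such as aa from being read as two separate short vowels.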

Oh, that's useful. Thanks! --Anatoli T. (обсудить/вклад) 11:45, 23 February 2017 (UTC)[reply]
Definitely useful. Thanks again. Benwing2 (talk) 07:09, 24 February 2017 (UTC)[reply]

Patterns

I started a conversation in the Beer parlour about documenting Arabic patterns. @Wikitiki89 thinks it would be best to put them in the Appendix namespace, since not all patterns have a clear meaning, and dividing patterns between main and Appendix namespace would be messy.

So recently I started Appendix:Arabic patterns. The first pattern entries I created have no diacritics in their titles, like Arabic entries in the main namespace. But that is inconsistent with Appendix:Hebrew patterns, where the diacritics are included in the titles. I think I will recreate the pages, this time with diacritics, unless there is disagreement on that point. — Eru·tuon 07:07, 10 March 2017 (UTC)[reply]

@Erutuon: 'Testcases' seem o.k. What is the next step? I think the easiest implementation would be a user-friendly interface where different morphological parameters can be selected to create customized lists of terms, instead of pre-arranged permanent ones. --Backinstadiums (talk) 13:18, 16 April 2017 (UTC)[reply]

Tanwin question

I'm confused about "do not use the spellings ـاً or ـىً with the diacritic over the last letter". It seems to be a Wikipedia standard to write مرحبًا instead of مرحباً but I can't seem to find the justification for this and a native Arabic speaker thinks it's wrong. I'm not proposing a change just wondering what the reasoning is behind this. If someone could provide a reference I will add it to the page. Radixcc (talk) 03:09, 7 July 2017 (UTC)[reply]

I'm pretty sure most fully vocalized texts do it our way, while the other way is more common in texts that are mostly unvocalized. So while مرحباً is more common than مرحبًا, I'm pretty sure that مَرْحَبًا is more common than مَرْحَباً. Since we write fully vocalized, we put the fathataan on its consonant rather than on the final alif. However, in direct quotations of other texts, we should copy however it appears in the original. --WikiTiki89 03:29, 7 July 2017 (UTC)[reply]
Arabic nunations (-un, -an, -in) are considered vowels just like fatḥah, kasrah, and ḍammah, and they can only be carried by consonants, never vowels of prolongation. Persian, on the other hand, does put nunation over the final letter, even if that's an alif.
Arabic nunation is carried by the final consonant, so: كبيرةٌ, ملكةٌ, أختٌ, بنتٌ, أمٌ, عجوزٌ, أتانٌ, أيهًا, قمرًا, كساءً, etc.
Looking through my old books, I found this explanation:
“Nunation—In one class of nouns the final vowel, which is the case ending, is written twice to indicate the pronunciation un, an, in respectively. With ‘u’ the upper sign is usually reversed, or ٌ (un) is used instead.”
So nunation is a doubling of the final vowel, and the final vowel is borne by the final consonant, never by alif of prolongation. —Stephen (Talk) 22:35, 25 July 2017 (UTC)[reply]
@Stephen G. Brown: Except that that's not true in reality. In modern Arabic non-fully-vocalized printed texts and handwriting, when the fatḥatān is written, it is usually written on the final alif, e.g. دائماً. --WikiTiki89 23:06, 25 July 2017 (UTC)[reply]
As far as I'm concerned, it is a misspelling. It came about because, until only about fifteen years ago or so, almost no Arab could type Arabic, and everybody wrote Arabic in longhand. To add diacritics to handwritten words, the word is written out in full, then the diacritics are added last, and the position of the handwritten diacritics is not critical. They can be placed over, before, or after a letter. This meant that few Arabs knew the rules, and only a few professionally trained calligraphers and typesetters knew the rules. So today, now that almost every Arab can type Arabic, they still type the nunation last just as they did in handwriting, except that now they are making an error. If anyone wishes to consider the typing of these self-taught beginner typists correct, it's okay with me. I consider it incorrect. —Stephen (Talk) 23:53, 25 July 2017 (UTC)
You of all people should understand that we work based off of attestation and not prescriptive rules. Calling it a misspelling only goes as far as to explain its origin, but nowadays it is the most prevalent form. --WikiTiki89 23:59, 25 July 2017 (UTC)[reply]
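For anyone who wants to check fully vocalized headwords against the convention discussed in this thread (fatḥatān on the consonant, before the final alif), the reordering can be sketched in Python. This is only an illustration and not code from any Wiktionary module; the function name `move_fathatan_before_alif` is made up here, and, as noted above, direct quotations should be left as they appear in the original.

```python
# Code points are written as escapes so the order of letter and diacritic is unambiguous.
FATHATAN = "\u064B"       # ARABIC FATHATAN
ALIF = "\u0627"           # ARABIC LETTER ALEF
ALIF_MAQSURA = "\u0649"   # ARABIC LETTER ALEF MAKSURA


def move_fathatan_before_alif(text: str) -> str:
    """Hypothetical helper: turn alif + fathatan into fathatan + alif.

    The same swap is applied to alif maqsura. Text that already follows
    the convention is returned unchanged.
    """
    for seat in (ALIF, ALIF_MAQSURA):
        text = text.replace(seat + FATHATAN, FATHATAN + seat)
    return text
```

Because the function only swaps two adjacent code points, running it twice gives the same result as running it once.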

Where are the hamza seat rules?

I'm having trouble finding clear hamza seat rules, because sources seem to disagree on what the rules are. Arabic Wikipedia has this [4] but it seems to be lacking references. Does Wiktionary or Wikipedia have a standard for this? I also want to update Wikipedia:Hamza to match whatever the standard is here. — Radixcc 📞 20:54, 25 July 2017 (UTC)

We include whichever forms are attestable. That way, we don't have to "decide" what the rules are. In other words, if multiple hamza seats are possible to spell a particular word, we include all attested possibilities. --WikiTiki89 20:56, 25 July 2017 (UTC)[reply]
Here are the hamza rules that I learned long ago:
“Hamza—The sign ء (ʔ) is usually written with one of the three consonants alif, w, or y, which is called its bearer. y, when written with hamza, always loses its dots. Hamza always has alif at the beginning of a word and, after the vowel ‘a’, at the end.
“After a long vowel it has no bearer except in the sequence ɑːʔi when it usually has ‘y’: أَبْنَائِهِ (ʔabnāʔihi)
“After sukun it may be written over a line connecting two letters: مَسْـُٔـولٌ
“To find out how to write hamza—Pronounce the word as if the hamza were not there, write the result, and add hamza. Take the word fuʔɑːdun. Without hamza it becomes fuwɑːdun, which is the correct way to write it, فُوَادٌ (fuwādun), then add hamza فُؤَادٌ (fuʔādun). The plural of this word is ʔɑfʔidɑtun; without hamza it becomes ʔɑfiːdɑtun. This is أَفِيدَةٌ (ʔafīdatun), remove the dots from the ‘y’ and add hamza, and remove ‘i’ one step to the left, أَفْئِدَةٌ (ʔafʔidatun) is the right spelling.”
The above rules are taken from my old book. —Stephen (Talk) 23:37, 25 July 2017 (UTC)[reply]
But again, those are the Classical rules, not the modern rules. In modern typography and handwriting, the hamza never appears over a connecting line, and such cases are modified in various ways, depending on the word. --WikiTiki89 00:03, 26 July 2017 (UTC)[reply]
Yes, you already said that. No need to keep repeating it, I already know what you think. I do not accept it, as I explained above, and no need to repeat myself since it's all laid out above. —Stephen (Talk) 00:20, 26 July 2017 (UTC)[reply]
Ok, the best modern rules I can find are in A Reference Grammar of Modern Standard Arabic. They seem to match dictionary spellings best. I'll pass on the Wiktionary policy to anyone who asks. — Radixcc 📞 01:30, 26 July 2017 (UTC)
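As a rough illustration of why sources disagree, here is a Python sketch of one commonly taught simplification for the seat of a word-medial hamza: look at the vowels on either side and let kasra outrank ḍamma, which outranks fatḥa. This is an assumption made for the example; it is neither the rule set quoted from the old book above nor Wiktionary policy (which, as stated above, is to include whatever is attested), and the function name is invented.

```python
SEAT_YA = "\u0626"     # hamza on ya (dotless seat)
SEAT_WAW = "\u0624"    # hamza on waw
SEAT_ALIF = "\u0623"   # hamza on alif
BARE = "\u0621"        # hamza with no seat


def medial_hamza_seat(before: str, after: str) -> str:
    """Pick a seat from the neighbouring vowels (simplified sketch).

    `before` and `after` are "a", "i", "u", or "" for sukun / no vowel.
    Long vowels after the hamza count as their short counterpart here.
    """
    neighbours = {before, after}
    if "i" in neighbours:
        return SEAT_YA
    if "u" in neighbours:
        return SEAT_WAW
    if "a" in neighbours:
        return SEAT_ALIF
    return BARE
```

For fuʔādun (u before, a after) the sketch picks the waw seat, and for ʔafʔidatun (sukun before, i after) it picks the ya seat, agreeing with the spellings quoted above. For masʔūlun it picks the waw seat, whereas the book quoted above writes the hamza over a connecting line, which is exactly the Classical versus modern difference raised in this thread.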

Arabic dialects: pronunciation and more

For those who are interested, I have started a discussion at User talk:Wyang#Arabic on Wiktionary for how we can potentially copy the success of Chinese infrastructure for Arabic. (Don't worry, I am not proposing a merger of the Arabic codes at this time, nor that we try to provide pronunciation for the various regional flavours of MSA.) I have also created WT:About Arabic/Egyptian, which could use attention from a fluent speaker. —Μετάknowledgediscuss/deeds 05:09, 10 September 2017 (UTC)[reply]

Such sentences as "From the root" should be changed to "Morphologically from the root", because Wiktionary:ETY#Surface etymologies requires surface etymologies (as opposed to historical developments) to be presented as such. The word "Morphologically" is used in the entry astrology referred to as an example in the section linked above.

92.184.96.107 11:54, 16 September 2017 (UTC)[reply]

Reference templates

@Palaestrator verborum I think you should consider moving the description of dictionaries/reference templates to a dedicated page, and only keep the most important ones here. --Per utramque cavernam (talk) 13:34, 13 December 2017 (UTC)[reply]

@Per utramque cavernam I also thought so, but I have not the slightest notion of where to put it – which title the page would have and what kind of page it would be. I know of no guiding model. Palaestrator verborum (loquier) 13:43, 13 December 2017 (UTC)[reply]
It's a Wiktionary-specific topic, so the page should be in the Wiktionary namespace. How about titling it something like "Arabic references" or "References used for Arabic"? — Eru·tuon 20:46, 13 December 2017 (UTC)[reply]
Sounds good: Wiktionary:About Arabic/Arabic references? Wiktionary:About Arabic/References used for Arabic is likewise good, except that it is longer without conveying more.
Can it be made better by being converted into a table? You surely know better how to do this, @Erutuon. The text I have written around the references isn’t even that important. Palaestrator verborum (loquier) 05:00, 14 December 2017 (UTC)[reply]
Wiktionary:Reference templates has to be considered. Something has to be done in the course of cleaning up Category:Reference templates linking such listing pages closer to Category:Reference templates by language. @Erutuon Palaestrator verborum (loquier) 16:11, 15 December 2017 (UTC)[reply]
@Palaestrator verborum: Only sometimes do I have ideas for better presentation. I think the text describing the reference works might be helpful; I would suggest leaving it because you've put work into it. In this case it is a little hard to see at a glance which text pertains to which reference works. But I don't know if a table can fit the text you've written. — Eru·tuon 07:56, 17 December 2017 (UTC)[reply]

Duals in headword line

@Atitarev, Benwing2, Fay Freak: 95.187.111.170 has been adding dual forms to noun headword lines after 151.255.51.127 (maybe the same person) made the necessary changes to Module:headword. Is this desirable? I assumed they were omitted from headwords before because they are generally predictable. — Eru·tuon 00:37, 31 January 2019 (UTC)[reply]

@Erutuon Duals are entirely predictable so there's no point in including them. I'd personally revert all the changes. Benwing2 (talk) 02:29, 31 January 2019 (UTC)[reply]
Sometimes perhaps one could ask if the dual is in yāni or in wāni or ʾāni (or with added syllable as in أَخَوَات (ʔaḵawāt) for أَخ (ʔaḵ)) – usually wāni, as Classical Arabic grammars tell you, which by the way is not put correctly by the module for feminine duals of color/defective adjectives in the declension tables so one has to specify the feminine dual in words like أَدْهَم (ʔadham) in Category:Arabic color/defect adjectives without need. For consonantal endings it is just easily āni. @Benwing2 Fay Freak (talk) 07:56, 31 January 2019 (UTC)
My preference is for inflections to be either in the headword line or in the inflection section, but not both. Since it's now possible for inflection tables to display a minimal number of forms even in their collapsed state (like on guolli, kala), showing them in the headword line isn't necessary anymore. —Rua (mew) 14:16, 9 March 2019 (UTC)[reply]

Should multiple-word terms be categorized as from roots?

I see 95.187.62.161 (talk), our old Meccan acquaintance that rarely utters an English word, adding for example Category:Arabic terms belonging to the root و ح د to الْوِلَايَات الْمُتَّحِدَة الْأَمْرِيكِيَّة (al-wilāyāt al-muttaḥida(t) al-ʔamrīkiyya, United States of America), or Category:Arabic terms belonging to the root و ل ي and Category:Arabic terms belonging to the root و ح د to وَاجِهَة بَرْمَجَة تَطْبِيقَات (wājiha(t) barmaja(t) taṭbīqāt). Is this desirable? I don’t think so. Seems to me one of these quirks one gets when one hasn’t got the prowess to make any significant additions to the dictionary. Fay Freak (talk) 14:03, 9 March 2019 (UTC)[reply]

I'm against it, for what it's worth. Enough to categorize the entries for the constituent words. — Eru·tuon 19:53, 16 August 2019 (UTC)[reply]

Lane’s Lexicon

March 27, 2019: The text version of Lane’s Lexicon is going through active proofreading with the help of the Alpheios project. All changes are being tracked through GitHub and can be viewed at: https://github.com/naveedulislam/lan ; https://lanelexicon.com/updates/

--Backinstadiums (talk) 14:57, 15 August 2019 (UTC)[reply]

Bad transliteration

If anyone wants to fix some transliterations, see the list here: User:Erutuon/bad Arabic transliteration. — Eru·tuon 19:54, 16 August 2019 (UTC)[reply]

Errors in the automated transliteration

There are a few errors in the automated Arabic transliteration system. Particularly, the suffix ـِيّ is transliterated as “iyy”, when it should be “īy”. For example, the entry for يهود lists “yahūdiyy” for the singulative form يَهُودِيّ, which is obviously erroneous and should be “yahūdīy”. The name عَلِيّ is transliterated as “ʿaliyy” instead of “ʿalīy”. The word إِسْلَامِيّة is pronounced as ʾislāmīyah, not ʾislāmiyyah. The suffix ـِيّ is inherently contracted, and when spelled out is ـِييْ, which again is pronounced “īy”. The first of the two Yāʾs is not sākin (i.e. does not have a sukūn). The second Yāʾ is only sākin when pronounced in pausa; otherwise the sukūn is replaced with a vowel diacritic marking its grammatical case. The current transliteration pronounces it as if it were ـِيْيْ, with both Yāʾs sākin. The pronunciations of “iyy” (as in ʾislāmiyyah) or “ī” (as in yahūdī) are purely colloquial.

The final Tāʾ Marbūtah is also pronounced in formal Arabic as “h” and is not silent. For example, the words Surah (سورة) and Ayah (آيَة). It is only silent in colloquial speech, just like the Yāʾ mentioned above (as in yahūdī instead of yahūdīy). The entire reason a Tāʾ Marbūtah is written as a hāʾ with two dots above it is because it is pronounced as a hāʾ unless diacritics determine it is pronounced as a regular Tāʾ. Similarly, a final Hamzah (ء) is also silent in colloquial speech, such as دُعَاء (duʿāʾ) being pronounced erroneously as دُعَا (duʿā) in colloquial speech, or كَرْبَلَاء (karbalāʾ) being pronounced as كَرْبَلَا (karbalā).

There is also a lack of proper support for Classical Arabic spellings, such as for ʾAlif al-Waṣl (ٱ) and ـَوٰ (ā). For example, the definite article ٱلـ is transliterated as “l-” instead of “al-”, the word صَلَوٰة (ṣalāh) is erroneously transliterated as “ṣalawā”, the word زَكَوٰة (zakāh) is transliterated as “zakawā”, and the word حَيَوٰة (ḥayāh) is transliterated as “ḥayawā”. — LissanX (talk) 22:15, 11 October 2019 (UTC), last updated 23:10, 11 October 2019 (UTC)

The computer can’t know that صَلَوٰة isn’t ṣalawā. Also these cases are rare and also non-existent in Modern Standard Arabic. It’s just you who focusses on it.
tāʾ marbūṭa is silent in many dialects. This transcription has become widespread in scholarship because of Egyptian use. Of course they are pronounced if full classical endings are used. What is pronounced else is not wholly relevant for the transcription. We know what is meant if we read such spellings; if however (h) is used then it looks like a root letter, therefore I am against it.
iyy is also a choice. Easier for the computer: The computer thinks that ي + shadda = a geminate, yy. And also a transliteration. Though I agree that it is better to have -īy and ـوّ to be -ūw etc. Fay Freak (talk) 00:15, 12 October 2019 (UTC)
I don’t know much about how the system is programmed, but couldn’t the coding algorithm be tweaked so that ‘ā’ is only transliterated when it is either a Fathah followed by a plain ʾAlif (i.e. ـَا) or a letter with a Fathah and ʾAlif Khanjarīyah together (i.e. ـَٰ), so that a letter that has only an ʾAlif Khanjarīyah (i.e. وٰ) is transliterated as the letter becoming silent and only the vowel ‘ā’ being displayed, as opposed to the letter not being silent? — LissanX (talk) 16:26, 14 October 2019 (UTC)
@LissanX: That would require updating many pages and imposing a new policy – there are many cases currently in various Wiktionary pages where dagger alif alone on a consonant is meant to indicate a consonant and vowel, and the transliteration module has worked off that assumption. — Eru·tuon 17:05, 14 October 2019 (UTC)[reply]
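To make the “iyy” versus “īy” point in this thread concrete, here is a small Python sketch of a post-processing step on an already generated transliteration. It is not how Module:ar-translit works; the function name is invented, and the blanket substitution is an assumption for illustration only (it rewrites every “iyy” and “uww”, which is the reading proposed above, not current practice).

```python
def prefer_long_vowel_glide(translit: str) -> str:
    """Hypothetical post-processor: iyy -> īy and uww -> ūw.

    The automated output treats a shadda on ي or و as a plain geminate;
    this rewrites the result as long vowel plus glide instead.
    """
    return translit.replace("iyy", "\u012By").replace("uww", "\u016Bw")
```

Applied to the examples from the first post, "yahūdiyy" becomes "yahūdīy" and "ʿaliyy" becomes "ʿalīy"; strings without those sequences pass through unchanged.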

Dialects: historic word-final long vowels

Is there any opposition to the idea that, for Arabic dialects, word-final ا و ي ى should not be transliterated using macrons...? I'm not sure about e.g. Najdi Arabic or Bedouin varieties, but none of Hijazi/Gulf/Levantine/Egyptian preserve these as long vowels. Ever. Either phonemically or phonetically.

For example, compare Levantine كتبوا (katabu < *katabū) to كتبوه (katabū < *katabūh), and note that transcribing the former as katabū (in blind compliance with diachronics) instead indicates the latter word... same with anything else that was historically long-vowel-final. This bugs me, but I'm not sure if it'd be OK to just start 'fixing' examples of this on sight. M. I. Wright (talk) 07:59, 25 October 2019 (UTC)[reply]

@M. I. Wright, Fay Freak: Don't change them manually. They are phonemically long and the macrons are automated, but it would be OK to change Module:ar-pronunciation so that they are pronounced short. E.g. عَلَى (ʕalā) is transliterated as "ʿalā" but the IPA should be /ʕa.la/ (without the length mark ː). I just noticed that أَنَا (ʔanā) has two transliterations in the entry: ʾana and ʾanā, as if it only applies to this word. Macrons indicate that a long vowel is written, even if it is pronounced short. --Anatoli T. (обсудить/вклад) 12:21, 25 October 2019 (UTC)
My comment actually refers to (standard) Arabic (MSA) entries. I hardly edit dialects and not so concerned personally if you remove final macrons there. --Anatoli T. (обсудить/вклад) 12:25, 25 October 2019 (UTC)[reply]
Yeah, no question that the macron ought to be there for transliterations of MSA! In the contemporary dialects, historic word-final short vowels are gone and historic word-final long vowels are shortened in their place (see above), so ا و ي etc. are phonemically short at the end of a word and all, and that's the source of my annoyance here. Good to know that fixing examples of this isn't going to mess anything up. (If it matters, note that that's with the exception of a few single-syllable words like شي شو ما مو جا هي هو, all present in various dialects, which are free to alternate between long/short when pronounced.)
Would be nice if dialect-transliteration were automated too, wouldn't it? I have a draft of a Levantine IPA-generator which I intend to expand (and, hopefully, eventually use to replace the manually-entered transliterations we've currently got), but I have little to no knowledge of other varieties' phonologies. Sucks. M. I. Wright (talk) 04:55, 26 October 2019 (UTC)[reply]
If you make a transliteration module for a dialect, it can only be sourced from transliterations, not from the Arabic script, which is occasionally employed for MSA when there are missing native symbols, e.g. at بلجيكا. The current module Module:ar-pronunciation could be expanded or copied to cater for dialects with some modifications. Check at WT:GP, maybe someone will suggest how to do it. --Anatoli T. (обсудить/вклад) 05:16, 26 October 2019 (UTC)[reply]
This is Wiktionary, where Arabic transliterations are meant to be inaccurate. I recommend transliterating your edits correctly the way they actually sound until some degenerate comes and undoes it to a standard, incorrect version which exist nowhere except this website. — LissanX (talk) 01:58, 26 October 2019 (UTC)[reply]
So, your insulting tone is supposed to help your case somehow? Most dictionaries don't care about Arabic transliterations at all and if they do, it matches what we do here, which we do consistently and the methods are described. --Anatoli T. (обсудить/вклад) 03:47, 26 October 2019 (UTC)[reply]
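The dialect proposal at the top of this thread (no macron on a historically long word-final vowel, with single-syllable words left alone) could be sketched in Python roughly as follows. The function name and the vowel inventory are assumptions for the example; this is not part of Module:ar-pronunciation or of any existing dialect module.

```python
# Long vowels and their short counterparts (escapes for ā, ī, ū).
SHORTEN = {"\u0101": "a", "\u012B": "i", "\u016B": "u"}
# Characters counted as vowels when estimating the number of syllables.
VOWELS = set("aiueo") | set(SHORTEN) | {"\u0113", "\u014D"}  # plus ē, ō


def shorten_final_long_vowel(word: str) -> str:
    """Hypothetical helper: drop the macron from a word-final long vowel.

    Words with a single vowel are returned unchanged, since the thread
    notes that such words may alternate between long and short.
    """
    if not word or word[-1] not in SHORTEN:
        return word
    if sum(ch in VOWELS for ch in word) <= 1:
        return word
    return word[:-1] + SHORTEN[word[-1]]
```

With this rule, katabū comes out as katabu while katabūh is untouched, so the two Levantine forms contrasted above stay distinct.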

What is a ظرف called in English?

That is, what is it to be called when discussing Arabic in English? It seems that ظروف are both prepositions and adverbs if we slot them into English-traditional classifications — and indeed, the page for ظرف refers to it as an "adverb", but an example like بَعْدَ (baʕda) is defined under "Preposition". Is there a single term we can use for both? And may the specific subtypes (ظرف زمان ظرف مكان) be translated as "temporal" and "spatial"? M. I. Wright (talk) 03:07, 18 November 2019 (UTC)[reply]

@M. I. Wright: If I'm not mistaken ظَرْف (ẓarf) has the meaning "a point in time", as in ظَرْفُ الزَّمَانِ (ẓarfu z-zamāni), it's not the actual adverb. Just like the English word "adverb" is actually a noun. An example of such an adverb would be لَيْلًا (laylan, at night). --Anatoli T. (обсудить/вклад) 03:59, 18 November 2019 (UTC)[reply]
You could use "adverbials of time" when describing such adverbs in English formed from nouns in the accusative indefinite form. --Anatoli T. (обсудить/вклад) 04:02, 18 November 2019 (UTC)[reply]

Definite forms

Hello, would it make any sense to have either a separate page/entry or a redirection for definite nouns (either manually or automatically)? For example, البيت. This is sheer laziness on my part: when I use a link to Wiktionary from apps such as LingQ or ClozeMaster, I then have to tinker with the article in my search. I feel that having functional links to the definite forms would make consulting the dictionary more efficient. Cyril.vereb (talk) 14:18, 27 December 2020 (UTC)

No Arabic dictionary does this, because anyone with even the most rudimentary introduction to Arabic will be able to identify the definite article. There are several other particles that attach to the following word, and it doesn't make sense for us to include all of them. Some basic knowledge of a language will always be necessary in order to use a dictionary (just as if one sees something written in Arabic, it is necessary to know Arabic script in order to look up what the words mean). —Μετάknowledgediscuss/deeds 06:34, 28 December 2020 (UTC)[reply]

What is the transliteration standard used by Wiktionary?

I've tried to figure it out but I can't seem to find whether the transliteration guidelines here match some standard or are Wiktionary's own invention. If it's an invention by and for Wiktionary, what is the rationale for not using any of the multiple standards that already exist? Dgse87 (talk) 20:08, 9 May 2022 (UTC)

@Dgse87: You are constructing a false opposition. Even if Wiktionary employs an “own invention”, it is eclectic and positively combines aspects of multiple systems, and maybe the neutrality to the best of them is the very rationale for the mix. Note that since Wiktionary covers all languages, the system for Arabic tries to have some compatibility with other Semitic tongues and not confuse readers jumping between different languages they look up. This may be one reason for the use of ḵ instead of ḫ, as well as ḡ instead of ġ for Northwest Semitic begedkefet languages, giving some symmetry between stops and fricatives. j is supposedly preferred by English speakers and therefore we do not use ǧ, and otherwise we have exactly DMG transcription as well as those generally used in Arabic philology and Semitist works.
I don’t remember the original reasoning, I got habituated to it from my start half a decade ago (now having increased our Arabic coverage by the size of a smaller complete dictionary, almost always with automatic transcription), but overall the system is not fanciful and rather utterly traditional.
For Proto-Semitic there is less tradition and more recent findings than for the individual languages, as well as more potential controversy of its soothfast gestalt, and it has to target a more specialist audience, hence, and because of it having a superset of phonemes, the differences.
Ethiopian Semitic has another tradition, with relatively little deviation ever contrived. You see how one could become confused when considering all … Fay Freak (talk) 00:15, 10 May 2022 (UTC)[reply]
@Fay Freak: It seems a long answer for a rather vague question. The only point I can take from the complaint is: why are we not using multiple transliterations? It is quite obviously based on the Hans Wehr dictionary, with more accurate handling of hamza and adjectival endings. --Anatoli T. (обсудить/вклад) 00:24, 10 May 2022 (UTC)
@Atitarev: You tend to stress Hans Wehr, but his dictionary has employed what was the standard in the field at his time, and still is, and got codified in DIN 31635 – just as many reasonable de facto standards have become DIN norms; me knowing the de facto standards of course implies I have been down the philology rabbithole more than the casual observer of Arabic is. On the other hand I got the sixth edition in my hands which has been typeset anew and due to technological developments in the last forty years now has more distinctions in the Arabic script and transcriptions, writing أن and إن (ʔin) and transcribing the half rings – we have been ahead of time. Recension of the 2020 reedition (it will probably last for our lifetimes). Fay Freak (talk) 00:48, 10 May 2022 (UTC)
@Fay Freak: Thanks for the link to the standard. Yes, I knew it was based on an existing standard (which the dictionary has adopted) but didn't bother looking. At the time of choosing the standard, HW was a good example where the standard was actually heavily used and we could always refer and quote. Good to see the improvements in the newer versions! I wish the dialect editors followed the same standards we have for MSA but, as you know, they don't always do so.
I would oppose using multiple transliterations together, unless a module can automate them and display them somewhere inside the pronunciation section but not in translations, "derived terms", etc. --Anatoli T. (обсудить/вклад) 00:58, 10 May 2022 (UTC)
I work on Mandarin transliteration schemes all the time. There is one major scheme and four minor ones, followed by the remnants of pre-modern transliteration schemes which are in a gray zone as to where they came from. Do Arabic transliteration schemes have this kind of situation, or are all the transliterations mixed in together and their origins unclear? --Geographyinitiative (talk) 01:02, 10 May 2022 (UTC)[reply]
@Geographyinitiative: Since there are only three vowels plus length distinction and no tones (like in Proto-Semitic), there is no wiggle room. The questions are largely about how to represent by a single graph, instead of di-, tri- or tetragraph. ج (j) /d͡ʒ/ from Proto-Semitic g you would of course rather not have dsch like that sound is vulgarly transcribed into German. When Oriental philology became stricter, in the 19th century, one took over the convenient Czech/Slovene/Serbo-Croatian-like orthography, which one could typeset in Austro-Hungarian print-offices and perhaps even type on Austro-Hungarian typewriters, although the leading Orientalist journals of course typeset Arabic, Hebrew and Syriac letters (it was more difficult for the cuneiform of the Akkadian language discovered then). And one had to assign additional diacritic marks for phonemes or distinctions not occurring in European languages, hence a few differences. Those rings ʿ and ˒ are the most annoying parts since these small hooks do not correspond to the relevance of there being full phonemes behind them; dialect editors on Wiktionary have decided to use ʕ and ʔ to make them at least more conspicuous. But you see that in the end the choices are always similar, just with varying minor trade-offs due to considering what restriction and similarity exactly you are catering to. Now of course vulgar transcriptions are much more manifold, like disregarding lacking phonemic status people write o and e and employ pleongraphs (like few study Slavistics before writing the name of the Ukrainian president), but there is one scholarly system contrasting with Sinologist ex nihilo systems more arbitrarily assigning letters and diacritics for regularly essential parts of the languages. OP’s “multiple standards” is cap, as if just formulated to be more inflammatory than the situation is.
(Unclear, or too hard to resolve, origins hardly exist in a subject which recently fielded in books, and we may explain its fourfold reasons; but maybe you were also asking about the vulgar transcriptions as seen in Europeans writing toponyms, which I have elucidated as well.) Fay Freak (talk) 01:55, 10 May 2022 (UTC)[reply]
Sorry if I came across as rude; I was just genuinely curious. I'm not claiming any of the other standards out there are better, and I'm in no way competent enough to even have an educated opinion on the matter. I just found it interesting that the standard used here on Wiktionary didn't match my textbook nor any of the other DIN or ISO standards listed on this wiki-page (Romanization of Arabic) and wanted to see if there was a reason for this. Your answer gives several good points as to why the system used here exists, thank you :) Dgse87 (talk) 08:20, 10 May 2022 (UTC)
@Dgse87: Transliterations in textbooks are there so you just can become comfortable. I have various textbooks and dictionaries, which use various transliterations or just completely rely on native Arabic vocalisations. The more standard or high level the textbooks are, the more likely they will use something similar to DIN 31635 (but they don't have to). It does not matter so much if e.g. خ is rendered as "ḵ", "x", "ẖ", "ḫ", "kh" or "5" for learning purposes, as long as your source is consistent with it. The (semi-)automated transliteration at Wiktionary makes sure that it's consistent. In cases where automated transliteration fails (some corner cases, missing vocalisation, irregular pronunciations or dialects), the standard, preferred method should be used as in WT:AAR. --Anatoli T. (обсудить/вклад) 09:11, 10 May 2022 (UTC)

Change proposal to the "I want to write اِشْتِقَاقَات" section.

Since we now have the {{ar-rootbox}} template, I'd like to propose the following change to this section:

Substitute the text about roots starting after:

Which produces the following display:
: From Proto-Semitic *paʔr-.

to the end of the section, with the following text:


Roots

  • Roots are not etymologies, but a handy native tool to group words semantically or look them up in paper dictionaries. For this reason, we should not use the phrase "From the root..." in the etymology section. Instead, we use the {{ar-rootbox}} template. This template creates a box on the right that contains a link to the root entry (See for instance مَكَان (makān)). This template has the following syntax:
{{ar-rootbox|ك و ن}}
If you want to show something in the etymology section, you can use specific Arabic templates as found in Category:Arabic etymology templates to mark derivations by classical prefixes and suffixes. In cases of a specific template missing you can fall back to {{prefix}} and {{suffix}}.
  • If you need to link a root in running text, we can use the template {{ar-root}}. It supports the following two syntaxes:
The root of {{m|ar|مَكَان}} is {{ar-root|ك و ن}}.
The root of {{m|ar|مَكَان}} is {{ar-root|ك|و|ن}}.
Both result in:
The root of مَكَان (makān) is ك و ن (k-w-n).
If you use this template outside of words which belong to the root, you are supposed to give |nocat=1, because else the page gets categorized as in Category:Arabic terms belonging to the root ك و ن.
If you want to add a root to a page but don't want it to show, you can use |notext=1.

Or something similar. I don't seem to have rights to change it myself. Thanks. — Sartma 𒁾𒁉𒊭 𒌑𒊑𒀉𒁲 12:41, 27 October 2022 (UTC)[reply]

@Sartma: I have succeeded in fitting in something similar in twenty minutes. I believe that you do have the rights, you only need to be autoconfirmed but are even autopatrolling. Fay Freak (talk) 17:12, 27 October 2022 (UTC)[reply]
@Fay Freak: How weird... Now I could change it, but when I tried earlier I was told the page was locked and I couldn't edit it... Anyway, thanks for implementing the change! — Sartma 𒁾𒁉𒊭 𒌑𒊑𒀉𒁲 20:44, 27 October 2022 (UTC)[reply]
@Sartma: Indeed, now I lacked the permission too, “you do not have permission to edit this page”; but then I noticed I was somehow not logged in even in spite of entering the edit page in response to your ping (something logged me out?). This is in my experience a usual occurrence across wikis, e.g. me not being logged in on English Wikipedia or German Wikisource though theoretically there is “unified login” across Wikimedia, but this is like the first time I notice it on Wiktionary, always logging in from Wiktionary, apparently hitting this specific page on this specific day, perhaps only us Europeans. Maybe privacy officer @Chuck Entz fathoms what is going on there, probably some notorious database issue where there is a race condition of the page rather loading faster than awaiting the response of whether my browser is logged in at perhaps a different second-level domain (which is surely at least a data safety hazard since it incites people to edit with their IP addresses, as the warnings about this cannot be flashy enough when you do not expect to be not logged in, especially the longer the day or night has been) and there should be a phabricator topic somewhere upon this. Fay Freak (talk) 21:09, 27 October 2022 (UTC)
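For bot or script authors, the two {{ar-root}} syntaxes and the |nocat=1 and |notext=1 parameters described in the proposal above can be assembled with a few lines of Python. This is a sketch only: the helper `ar_root_call` and its keyword names are invented for the example, it merely builds the wikitext string, and it assumes the parameters behave as described in the proposal.

```python
def ar_root_call(radicals, nocat=False, notext=False, piped=False):
    """Hypothetical helper: build an {{ar-root}} call from root letters.

    radicals: list of root letters, e.g. the three letters of k-w-n.
    piped:    False gives the space-separated syntax, True the
              pipe-separated one; the proposal says both are accepted.
    """
    separator = "|" if piped else " "
    parts = ["ar-root", separator.join(radicals)]
    if nocat:
        parts.append("nocat=1")   # do not add the root category
    if notext:
        parts.append("notext=1")  # categorize without displaying text
    return "{{" + "|".join(parts) + "}}"
```

Passing the letters ك, و, ن returns the space-separated call shown in the proposal; `piped=True` returns the pipe-separated variant, and `nocat=True` appends |nocat=1 for use outside entries that belong to the root.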

The gender of numbers 3–10

I see that the تسعة and عشرة pages claim they are feminine, but the other numbers ending in tāʔ marbūṭa claim they're masculine. Should we change all the others this way (so that ثلاثة = f and ثلاث = m), which is the case for the Proto-Semitic counterparts, while mentioning the number gender polarity? Or should we edit these two numbers so that the gender of the number matches the gender of the noun? Zhnka (talk) 19:03, 4 May 2023 (UTC)

@Zhnka: Hi. Many Arabic learners fall into that trap. The Arabic numerals are kind of reverse when it comes to genders.
  1. ثَلَاثَة أَوْلَاد (ṯalāṯat ʔawlād, “three boys”) - ثَلَاثَة (ṯalāṯa) is used with أَوْلَاد (ʔawlād, boys)
  2. ثَلَاث بَنَات (ṯalāṯ banāt, “three girls”) - ثَلَاث (ṯalāṯ) is used with بَنَات (banāt, girls)
Anatoli T. (обсудить/вклад) 00:32, 5 May 2023 (UTC)[reply]
@Zhnka: Sorry, I may have misunderstood you. You want to fix those entries for consistency? Anatoli T. (обсудить/вклад) 00:33, 5 May 2023 (UTC)
In my opinion, ثَلَاثَة (ṯalāṯa) and ثَلَاث (ṯalāṯ) are correct, the others (3 to 10) should be done the same way. @Fenakhay: what do you think? Anatoli T. (обсудить/вклад) 00:36, 5 May 2023 (UTC)[reply]
@Atitarev Yes I agree. These numerals are grammatically feminine and all of the numeral entries 3 to 10 should reflect that like these two. عُثمان (talk) 10:11, 5 May 2023 (UTC)[reply]
@عُثمان: I’ll go with the community. The usage notes suffice. ثَلَاثَة (ṯalāṯa) is used with masculine nouns but it’s feminine grammatically. Anatoli T. (обсудить/вклад) 03:53, 6 May 2023 (UTC)[reply]
@Zhnka, Atitarev: I think we should change the others to match تسعة since that's how they are described in Arabic grammar. I don't know where the reversed genders come from. Are there any papers that claim so? — Fenakhay (حيطي · مساهماتي) 01:13, 5 May 2023 (UTC)[reply]
@Fenakhay, @Zhnka: The reversal may not be grammatical but lexical (usage), as in my first post. Does that affect the gender of numerals, or are usage notes sufficient? Anatoli T. (обсудить/вклад) 01:16, 5 May 2023 (UTC)[reply]
@Atitarev: IMO the usage notes cover that already. — Fenakhay (حيطي · مساهماتي) 01:21, 5 May 2023 (UTC)[reply]
@Zhnka, Atitarev: I've corrected 3-10. — Fenakhay (حيطي · مساهماتي) 01:49, 5 May 2023 (UTC)[reply]
@Fenakhay, Zhnka, Atitarev: The inconsistencies were brought about by unilateral changes by @عُثمان; I confronted him and looked into the grammar at Template talk:ar-numeral#The citation form of Arabic numerals is feminine, but this template takes masculine as the citation form, without taking further action. Fay Freak (talk) 09:25, 5 May 2023 (UTC)[reply]
@Fay Freak That is disingenuous but I think the conversation you linked speaks for itself عُثمان (talk) 10:12, 5 May 2023 (UTC)[reply]
@عُثمان: If somebody claims that something speaks for itself, nothing has been said except that he lacked the ingenuity to deal with any argument. Fay Freak (talk) 11:42, 5 May 2023 (UTC)[reply]

How are the numerals ‘grammatically feminine’? Is خليفة ‘grammatically feminine’? What is gender if not agreement? 109.184.98.226 00:07, 17 March 2024 (UTC)[reply]

Note that Hebrew שלושה etc. are glossed as masculine, as is Syriac ܬܠܬܐ, Geez ሠለስቱ (śälästu) and Akkadian šalāšat. 109.184.88.220 09:53, 6 April 2024 (UTC)[reply]

Writing of ال

Should I write the diacritics above alif (اَل or ٱل) in the definite article when I write example sentences or quotes? Zhnka (talk) 08:02, 5 May 2023 (UTC)[reply]

Usually, you only write ٱ and leave it vowelless otherwise. — Fenakhay (حيطي · مساهماتي) 11:49, 5 May 2023 (UTC)[reply]
@Zhnka, @Fenakhay: You don't always have to use ٱ, as in this case: بِالْعَرَبِيَّة (bi-l-ʕarabiyya), where the module knows that the alif is silent. However, رَأَيْتُ ابْنَهُ fails (though it shouldn't), whereas رَأَيْتُ ٱبْنَهُ (raʔaytu bnahu) works. When using {{ux|ar}} or {{uxi|ar}}, you can also use |subst=, e.g. |subst=ابْنَهُ//ٱبْنَهُ in رَأَيْتُ ابْنَهُ (raʔaytu bnahu, “I saw his son”). Anatoli T. (обсудить/вклад) 01:01, 6 May 2023 (UTC)[reply]
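For anyone unsure which of the two alifs they have typed, the following is a minimal Python sketch that prints the Unicode names of ا and ٱ and swaps a leading plain alif for alif wasla. The helper name `with_wasla` is hypothetical, and this is only an illustration of the character difference; it does not reproduce how the Wiktionary transliteration module or |subst= works internally.

```python
import unicodedata

ALIF = "\u0627"        # plain alif, as typed on most keyboards
ALIF_WASLA = "\u0671"  # alif with wasla, marks a silent (elidable) alif


def with_wasla(word: str) -> str:
    """Hypothetical helper: replace a leading plain alif with alif wasla."""
    if word.startswith(ALIF):
        return ALIF_WASLA + word[len(ALIF):]
    return word


# The word from the example above, written with escapes to avoid
# copy-paste ambiguity: alif, beh, sukun, noon, fatha, heh, damma.
ibnahu = "\u0627\u0628\u0652\u0646\u064E\u0647\u064F"

print(unicodedata.name(ALIF))
print(unicodedata.name(ALIF_WASLA))
print(with_wasla(ibnahu) == ALIF_WASLA + ibnahu[1:])
```

Only the first character changes; the vowel marks and the rest of the word are left as typed.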

Letter ی U+06CC

@Fenakhay, @Fay Freak: Hi. The project page doesn't mention letter ی U+06CC with a label (Egypt, Sudan) as opposed to ى U+0649.

I thought Egyptians use ى U+0649 in the final position because ی U+06CC is not on Arabic keyboards.

BTW, I have two Egyptian Arabic textbooks by Egyptians (in English), which use ي (y) in any position. Anatoli T. (обсудить/вклад) 03:47, 18 March 2024 (UTC)[reply]

It’s all true. Especially the part about the keyboard layouts. As observed, ى U+06CC and ي were used concurrently in any position in the past, but in the 21st century the distinction is standard even in Egypt.
For languages which ceased to be written in Arabic before the digital age, or which have never been written but would be written in Arabic, and thus also for the earlier usage of Arabic, any dictionary entry on one of the three pages describing said general usage is debatable, as from print and handwriting you can’t reliably tell which encoding it corresponds to. Fay Freak (talk) 12:09, 18 March 2024 (UTC)[reply]
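Since the three letters in this thread look nearly identical in many fonts, a quick way to check which one a given string contains is to print its code point and Unicode name. This is a minimal Python sketch using only the standard library; the helper name `describe` is made up for illustration.

```python
import unicodedata

# The three look-alike letters discussed in this thread, written as
# escapes so that the code point is unambiguous regardless of font.
LETTERS = ["\u0649", "\u06CC", "\u064A"]


def describe(ch: str) -> str:
    """Return 'U+XXXX NAME' for a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


for ch in LETTERS:
    print(describe(ch))
```

Running `describe` on the final letter of a word copied from an entry shows which encoding that entry actually uses, which print and handwriting cannot tell you.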