Wiktionary talk:About Arabic

See also Appendix:Arabic script

Arabic words—organizing by root—proposal

The English-Arabic dictionary section has the potential to be a very useful section for English speaking students of Arabic. However, the fact that virtually all Arabic words are based on a three (very rarely 4) letter root, with standard prefixes, suffixes and infixes, presents unique problems for organizing an English dictionary of Arabic. Simply organizing the dictionary alphabetically would be unwieldy and difficult to use; when looking up an Arabic word, one typically identifies the 3 root letters, then the "form" of the verb it is associated with (there are 10 common forms) and looks up the entry alphabetically by the root letters, to find the definition.

The advantage here is that all related words are grouped together instead of being spread throughout the dictionary. Also, if a dictionary were not organized by root, most words would begin with one of three letters: the equivalent of "Y", "M" or a glottal stop.

I propose the following variation, then, to the standard Wiktionary word page, for Arabic words:

Word This would be the entire word, which could still be searched for directly, without deciphering the root letters, for instance منظمة

Arabic Language, as per wiki normal

ROOT in the above example, this would be ن ظ م without the prefix "m" letter and the suffixed "a" sound.

PATTERN NUMBER The above word is form II, or as Arab dictionaries describe it: wazn فعّل

Part of Speech —Here I don't know whether it makes sense to use English grammar terminology, which only loosely describes Arabic grammar functions: I suggest it would be helpful to also include the Arabic grammar terms (masdar, etc.).

pronunciation 'munáthama' Definition. 'organization' References etc...

--Jackbrown 13:51, 4 January 2006 (UTC)
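
As a rough illustration of the proposal above, the sketch below stores each word with its root and form and then groups words by root, the way a root-ordered dictionary would. The class and function names (`ArabicEntry`, `group_by_root`) are hypothetical and are not part of any Wiktionary tooling; the second sample word is the عبد entry mentioned later on this page, with its form left unset because the discussion does not give one.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ArabicEntry:
    """One word page as proposed: word, root, form, definition (hypothetical model)."""
    word: str                  # the full word, still searchable directly
    root: Tuple[str, ...]      # root letters, e.g. ("ن", "ظ", "م")
    form: Optional[str]        # verb form / pattern number, if known
    definition: str


def group_by_root(entries):
    """Return a dict mapping a spaced-out root (e.g. 'ن ظ م') to its words."""
    index = defaultdict(list)
    for entry in entries:
        index[" ".join(entry.root)].append(entry.word)
    return dict(index)


entries = [
    # Form II is the value given in the proposal above.
    ArabicEntry("منظمة", ("ن", "ظ", "م"), "II", "organization"),
    ArabicEntry("عبد", ("ع", "ب", "د"), None, "slave"),
]

root_index = group_by_root(entries)
for root, words in root_index.items():
    print(root, "->", ", ".join(words))
```

The word itself stays the primary key, so direct search still works; the root index is an extra view built on top of it.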

To make this proposal simpler to implement, I’ve added sample code at Category talk:Arabic language (the talk page of the Arabic language category), which can be copied from the edit view and pasted into new word definition pages. All and sundry should feel free to improve both on this idea and the sample page; I really meant it as a proposal to be discussed. --Jackbrown 12:17, 6 January 2006 (UTC)

Arabic verb form templates

Please comment. For a usage example see خ د ر.

Arabic root entry

Use template {{ar-root-entry}} to get the root along with a transcription into IPA and a second transcription system.

Template {{Arab}}

The article should have a short section expounding the use of {{Arab}} for proper display of Arabic script. --EncycloPetey 21:23, 3 February 2007 (UTC)

Arabic words - organizing by root - proposal (Transferred from: Category talk:Arabic language)

The English-Arabic dictionary section has the potential to be a very useful section for English-speaking students of Arabic. However, the fact that virtually all Arabic words are based on a three (very rarely 4) letter root, with standard prefixes, suffixes and infixes, presents unique problems for organizing an English dictionary of Arabic. Simply organizing the dictionary alphabetically would be unwieldy and difficult to use; when looking up an Arabic word, one typically identifies the 3 root letters, then the "form" of the verb it is associated with (there are 10 common forms) and looks up the entry alphabetically by the root letters, to find the definition.

The advantage here is that all related words are grouped together instead of being spread throughout the dictionary. Also, if a dictionary is not organized by root, most words would begin with one of three letters: the equivalent of "Y", "M" or a glottal stop.

I propose the following variation, then, to the standard Wiktionary word page, for Arabic words:

Word This would be the entire word, which could still be searched for directly, without deciphering the root letters, for instance منظمة

Arabic Language, as per wiki normal

ROOT in the above example, this would be ن ظ م without the prefix "m" letter and the suffixed "a" sound.

PATTERN NUMBER The above word is form II, or as Arab dictionaries describe it: wazn فعّل

Part of Speech -- here I don't know whether it makes sense to use English grammar terminology, which only loosely describes Arabic grammar functions, or whether it would make more sense to also include the Arabic grammar terms (masdar, etc.).

pronunciation 'munathama' Definition. 'organization' References etc...

The advantages of this minor variation on the normal definition page will be obvious to students of Arabic, I think.

>>So my first question, then, is the following: on Wiktionary, how do we go about imposing a fairly radical change in organization of one part of the dictionary? In other words, getting people to include two extra indexes (root and pattern) with the Arabic words they enter...

>>And second: are we really supposed to hand code every word definition, then rewrite two or three other pages (the front end of the Arabic-English dictionary, etc) to link to every word we enter? Or is there some slightly more automated process for entering and linking to word definitions? --Jackbrown 13:51, 4 January 2006 (UTC)

Sample Code here - you can "edit" this page (use the tab at the top of the page, not the ones below), then cut and paste the code below to make a template for your new word definitions, filling in the appropriate headlines -

Actually I agree with Jack about this way of organizing; it is the best way to give English-speaking students a feel for the Arabic language, its great capacity for derivation, and the relationships between related words. We first have to make a good list of Arabic words arranged by first letter; these words should then also be categorized according to their root. Maybe later we can make two indexes: one by first letter and the other by root. --Chaos 12:13, 11 January 2006 (UTC)

I suggest doing both (the root page and the word-level page), because sooner or later we will have other pages trying to link to specific derivations and vocalizations of the same root. To give an example, I created the page عبد when I found out that the entry slave doesn't have a link to the Arabic word, but then I thought it would be less confusing to link directly to عَبْد, which I then created. I think both can serve different, complementary purposes. Interlinking and categorization can improve things. Thoughts? --Hakeem.gadi 09:25, 30 April 2008 (UTC)

I believe in having a separate page for the root, which lists all the derivative words, each linking to its own page. In this case the root should be written in isolated letter forms (e.g. ن ظ م) because many existing words would otherwise look exactly like their roots. Hakeem.gadi 06:23, 16 May 2008 (UTC)

While I know next to nothing about Arabic, I do have a small understanding of Semitic languages through some study of Hebrew. I think that organizing words by the tri-letter roots is an excellent idea. I would suggest considering an approach similar to what Hebrew is doing. If you take a look at Category:Hebrew roots, you'll see some of their roots. The root pages can probably be formatted in a similar fashion to hypothetical language entries, such as Appendix:Proto-Indo-European *ph₂tḗr, with a brief definitional note and perhaps a further etymology, and a list of all words using that root (probably organized in some intuitive way). Then, you just put a link to the root in the etymology, and you're all set. You can keep the nice organization, which is specific to Arabic, while still conforming to Wiktionary formatting conventions. Because I gotta tell you, trying to go against formatting conventions is an arduous uphill battle (and with good reason too, there are a great many benefits to standardization across languages). Also, it might be worthwhile to move this convo to Wiktionary:About Arabic, as that's really where it belongs. -Atelaes λάλει ἐμοί 06:35, 16 May 2008 (UTC)

I think something should be changed about the ordering of the roots. Some Arabic roots are called geminate roots, meaning that the second and the third consonants are the same. However, they are written with just two letters, with the diacritic shadda above the second letter to indicate that it is actually doubled. That puts those roots before all other roots starting with those two letters, of course, and that is how it is in each and every Arabic dictionary I know. In my opinion, our Arabic Roots category would be better if it were organized that way. -Beru7

In the first place, Hans Wehr’s Arabic-English Dictionary sorts doubled verbs as though the root has only two radicals. Second, the only place where this will make any difference will be the indexes (currently almost nonexistent), and the biggest problem with them is just getting the words into them according to the root, something that is probably going to be very labor-intensive. Once the words find their way into the proper index, a regular Unicode sort such as used by Hans Wehr should be more than sufficient. —Stephen 19:26, 24 March 2009 (UTC)
Thanks for your answer, Stephen. The index exists already and it is wrong; see Category:Arabic roots. The problem is that the geminate roots that have been created before are not written the way I have described. For example: ء ل ل should in fact be ء ل. I just want to make sure we agree! I intend to create more entries and write the geminate roots with just two letters if there is no objection. -Beru7
Category:Arabic roots is not the index. The index is at Index:Arabic. Category:Arabic roots is just a category, and it is roots, not words. If you go to ء ل ل, you will find listed some of the words that have this root, but spelt in the normal way with only one ل. For all geminate radicals, please include both of them when using {{ar-root-entry}}. See how we use this template in the etymology section of أل. —Stephen 23:33, 24 March 2009 (UTC)
It occurs to me that you may be confusing Category:Arabic roots and Index:Arabic with Category:Arabic verbs. If you want to create pages for doubled verbs, they will appear in Category:Arabic verbs. —Stephen 00:04, 25 March 2009 (UTC)
Category:Arabic roots is still an index even if it does not bear the name, or am I wrong? And I am definitely not talking about verbs. Verbs cannot be doubled as such, only based on a doubled root. Some roots have no verbs deriving from them. Anyway, for now, I will follow what has been done already, as you suggest. About the ar-root-entry template: it is convenient but will not work for bi- and quadriliteral roots. Thank you! Beru7 11:13, 25 March 2009 (UTC)
We don’t call category lists "indices"...only Index:Arabic. Yes, unfortunately {{ar-root-entry}} only works for triliteral roots. Other roots have to be formatted manually, or another template could be made to accommodate them. The Category:Arabic roots is a fairly new creation and most of the older pages (that is, most of the Arabic pages) have not been coordinated with it yet.
We also have {{ar-verb-fa3ala}} and similar verb templates which you might be able to use. I have found that they don’t always work correctly, especially if there is a weak radical (such as final ى). So these templates need a little more work. For prepositions and conjunctions with bound pronouns, we have {{ar-prep-inflection}} (e.g., عن).
It is especially useful to include translation tables as I did in كتب. —Stephen 11:38, 25 March 2009 (UTC)
Stephen, thanks a lot for your help. Being new here, I have a few more questions. What are the translation tables you are talking about? Do you mean putting all the derived forms on the same page? My humble opinion is that derived forms belong on the root page, not on the form I verb page. If we put the I-XV derived forms on the form I page, then shouldn't we put the ism-alfaaʕil (active participle) there? And the ism-almafʕuul (passive participle)? And so forth and so on. And we end up with a page which is the same as the root page! Wouldn't it be more logical if each derived word had its own page, including the Form I verb, with the root page having links to all those words that derive from it? We could then organize the root page the same way as the entries are organized in a dictionary: first the form I verb (if it exists), followed by all the words derived from it, then form II and its derived words, etc... —Berenger
Oops, sorry, I misspoke. I meant conjugation tables. (Too long without sleep!) I agree, putting all of the verb forms on the Form I page is problematic. Almost all of that was done before we had the {{ar-root-entry}} and the Category:Arabic roots. I still believe all of the forms should go on the Form I page, but perhaps it would be better if they were put under ====Related terms====. Then, of course, they would also be on the root page. And yes, the participles and verbal nouns should also appear under ====Related terms====. —Stephen 10:28, 26 March 2009 (UTC)
Ok, conjugation tables are good. Maybe we could have a template for those and hide them by default (just like the conjugations in French). I'd like to sum all that up for future contributors but I'm not sure what the right place is for that. Wiktionary:About_Arabic, I'd guess? Also, can you point to an example where the verb template is wrong? I'll try to fix it. (I had a look at م_ش_ي and it looks fine.) Beru7 13:32, 26 March 2009 (UTC)
Yes, a template would be great, but I only know how to make very simple ones. I don’t know the wiki code very well. And yes, the place for it is Wiktionary:About Arabic.
The verb templates that I sometimes have a problem with are the ones like {{ar-verb-fa3ala}}. I don’t remember now which of the templates it was or which verb I was working with. Next time I have that trouble, I must write it down somewhere. —Stephen 01:31, 28 March 2009 (UTC)
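
To make the ordering question in this thread concrete, here is a minimal sketch of a sort key that treats a geminate root as if it had only two radicals, which is the dictionary ordering described above. It assumes roots are stored as space-separated letters (as in ء ل ل) and that comparing code points is an acceptable stand-in for alphabetical order; `root_sort_key` is a hypothetical helper, and ء ل ف and ء ل م are included only to show where the geminate root lands.

```python
def root_sort_key(root):
    """Sort key that collapses a geminate root (2nd radical == 3rd) to two radicals."""
    radicals = tuple(root.split())
    if len(radicals) == 3 and radicals[1] == radicals[2]:
        radicals = radicals[:2]
    # Tuples compare element by element, and a shorter prefix sorts first,
    # so the collapsed root precedes every root sharing its first two letters.
    return radicals


roots = ["ء ل م", "ء ل ل", "ء ل ف"]

plain_order = sorted(roots)                         # plain code-point sort
dictionary_order = sorted(roots, key=root_sort_key)  # geminate root first

print(plain_order)
print(dictionary_order)
```

With a key like this the stored spelling can keep all three radicals, as the template expects, while a listing can still follow the two-letter ordering.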


Following the discussions above, I have written the following text that could be put on Wiktionary:About_Arabic. Please review and discuss. I'm especially looking for opinions about the non-vocalization of page names; pointing out mistakes in the English text is also welcome, as English is not my native language!

Guidelines for Arabic entries:

Arabic roots: Each Arabic root has its own entry, such as خ د ر. The root entry should explain the meanings of the root and list links to the words that derive from the root. The {{ar-root-entry}} template should be used and Category:Arabic roots appended. Ideally the order for those words should be as follows: First the Form I verb (if it exists), followed by all the nouns, participles and other words deriving from it, in alphabetical order, then form II if it exists and so forth and so on. This follows the way most Arabic dictionaries are organized. Please use the inflection templates.

Arabic words: Page names are not vocalized, meaning, for example, that form I and form II will be on the same page. Initial hamzas should not be written in the page title, but a page with the hamza should be created, redirecting to the main page for the word. However words within the page should all be written with all hamzas and proper diacritics showing vowelization. Each word should have a link to its root page (using {{ar-root-entry}}) under the Etymology section, if applicable (many loan words, for example, do not have roots).

Arabic nouns: Nouns should be presented with no case endings. Please include gender and number, plus the gender and number inflections where applicable.

Arabic verbs: Please use the {{ar-root-entry}} template. Links to words derived from the verb should be under ====Related terms====, and conjugation tables for the verb should be included on the page.

-- Beru7 18:35, 27 March 2009 (UTC)
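
The page-name rule in the draft guidelines above (unvocalized titles, initial hamza dropped from the title with a redirect from the hamza spelling) could be sketched roughly as follows. This is only an illustration of the draft, which is still under discussion; the function names are hypothetical, and only alif with hamza above or below (U+0623, U+0625) is treated as an initial hamza here.

```python
# Arabic vocalization marks: fathatan (U+064B) through sukun (U+0652).
VOCALIZATION_MARKS = {chr(cp) for cp in range(0x064B, 0x0653)}

# Alif carrying a hamza above (U+0623) or below (U+0625), and the bare alif.
INITIAL_HAMZA_ALIFS = {"\u0623", "\u0625"}
BARE_ALIF = "\u0627"


def strip_vocalization(word):
    """Remove short vowels, tanwin, shadda and sukun from a word."""
    return "".join(ch for ch in word if ch not in VOCALIZATION_MARKS)


def page_name(word):
    """Unvocalized title with any initial hamza-bearing alif replaced by bare alif."""
    bare = strip_vocalization(word)
    if bare and bare[0] in INITIAL_HAMZA_ALIFS:
        bare = BARE_ALIF + bare[1:]
    return bare


def redirect_source(word):
    """Return the hamza spelling that should redirect to the page, or None."""
    bare = strip_vocalization(word)
    return bare if bare != page_name(word) else None


print(page_name("عَبْد"))        # vocalized word from the discussion above
print(page_name("أب"), "<-", redirect_source("أب"))
```

Under this rule form I and form II of a verb end up with the same title, which is the consequence the draft calls out.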

It’s pretty good. I made a few minor changes. I don’t know if you have noticed yet, but Arabic vowel points are a problem here. Wiktionary only allows simple vowel points, and if you put a double point, such as shadda-fatha, it won’t appear correctly. That’s because Wiktionary automatically changes the order when you save it, to fatha-shadda. It is possible to have double points if you type them this way: كتبتُن&#x0651;&#x064E;, which makes كتبتُنَّ. Otherwise, you get كتبتُنَّ. Possibly this might be made easier by using templates.
Besides vocalization, there is the matter of initial hamza. Most words that have initial hamza are usually written without a hamza, and in order to make it easier to find a word by pasting it into the search box, I have usually put page names without initial hamza (e.g., اب). However, a lot of people think that if a word has initial hamza, strictly speaking, then it should be in the page name. I have not made an issue of this, and when someone makes a page with initial hamza, I leave it that way. But it really should be subject to a policy, one way or the other. —Stephen 01:59, 28 March 2009 (UTC)
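
The reordering of shadda and fatha described above matches what Unicode canonical ordering does: fatha (U+064E) has combining class 30 and shadda (U+0651) has class 33, so normalization puts the fatha first. Whether this is exactly the step the wiki software applies on save is an assumption here; the sketch below only shows the Unicode behaviour using Python's standard `unicodedata` module.

```python
import unicodedata

NOON = "\u0646"
SHADDA = "\u0651"
FATHA = "\u064E"

# Typed in the order shadda, then fatha.
typed = NOON + SHADDA + FATHA

# NFC applies canonical ordering: marks are sorted by combining class.
normalized = unicodedata.normalize("NFC", typed)

fatha_class = unicodedata.combining(FATHA)
shadda_class = unicodedata.combining(SHADDA)

print([f"U+{ord(ch):04X}" for ch in typed])
print([f"U+{ord(ch):04X}" for ch in normalized])
print("fatha class:", fatha_class, "shadda class:", shadda_class)
```

Both orders are canonically equivalent, so any display difference comes from the font or renderer rather than from the text itself.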
I prefer to use alifs with hamza, if the word has it, first, because that's the way they appear in dictionaries and they are more commonly written with hamza on the web. Perhaps a redirect page could be used to enable search for words without hamza in the search box. I would perhaps do the same for final alifs where it follows tanween fatha, e.g. احيانا redirect to أحياناً Anatoli 14:34, 28 March 2009 (UTC)
Stephen, just like Anatoli, I think strong initial hamzas should be written, as they are in most books and newspapers. I understand your cut'n'paste point, but then very often ى is written for ي (or the reverse as I recently saw in an article in el ahram !). Maybe we could use redirects as Anatoli suggests. However, Anatoli, tanwin is a vocalization mark and is usually not written (whereas hamza is a letter). I see no more reason to have a شكراً page than to have a كتّب page. Once again, can be fixed with redirects but I think that for consistency, it would be better if we kept شكرا --Beru7 17:11, 28 March 2009 (UTC)
I've thought about it more and I guess the problem with strong initial hamzas (the ones that should actually be written) is that they do not appear on all keyboards. For example, the stickers on my keyboard show أ and آ, but إ is not written on the stickers I have, even though it is accessible through shift+ع. I've looked at pictures of Arabic keyboards, and most of them are the same. And something else: Arabic dictionaries show all words beginning with hamzas (strong أ or weak ا) under ا. Our categories will be displayed the same way if we do not include initial hamzas in page titles. Beru7 15:17, 2 April 2009 (UTC)
OK, Beru7, with tanwin fatha it's less consistent than with hamza. Also, the popular editors - Yamli and Google transliterator both consistently offer hamza and tanwin fatha but not the Google translator. I heard some Arabs cringe at missing tanwin on words like شكراً but words without it are still too common and perhaps more common than with it. As for ى written for ي, this is the common Egyptian style, isn't it? I've got a book translated into Arabic, ي is never used in the final position, making it confusing, the English Wikipedia mentions this feature of the Egyptian style of writing. The redirects could also be used, of course, if the word with the other letter doesn't exist.Anatoli 22:29, 28 March 2009 (UTC)
I would like to include something about transliteration in the guidelines. Right now, things are a bit messy, with many systems used, and sometimes even mixed. My own preference goes to the qalam system (http://en.wikipedia.org/wiki/Qalam), which I think is the best for English speakers. It is very easy to type, and gives a good idea of the pronunciation. However, there are two flaws, in my opinion. First is the transliteration of ع: the symbol used, `, is not on all keyboards (not on mine at least!), and is not easily distinguishable from the one used for hamza, '. Second, it does not seem to provide a way of telling ث (th) from تْه (also th). Personally, I'd like to use 3 for ع; I have nothing to propose for the "th" problem, which should be rare, though.
I'd also like some input on the conjugation templates I have created this week, please see كتب. Especially, I'm not happy about the presentation.
Lastly, I was wondering what to do with case inflections. Should they be specified for nouns ? adjectives ? all words that actually have such inflections ? And how should they be presented ? Any ideas are welcome. --Beru7 17:10, 10 April 2009 (UTC)
A transliteration guideline is definitely needed, but I’m not sure about the qalam system. The qalam system: ' aa b t th j H kh d dh r z s sh S D T Z ` gh f q k l m n h w y. I don’t have strong feelings about aa/ee/ii/oo/uu for long vowels, and using uppercase H, S, D, T, Z is not too bad, but I don’t like the digraphs th/kh/dh/sh/gh. I am very much against using ` for ع, because, as you pointed out, it is not readily distinguishable from ' on a computer screen. I have been using ’ for hamza and ʕ for ʕain, but would not be opposed to 3. The transliteration system that I have been using is: ’ ā b t θ j ħ x d ð r z s š ṣ ḍ ṭ ẓ ʕ ğ f q k l m n h w y. My main concern with the qalam system is the double letters, which will lead to confusion with words such as مستسهل (mustashal).
I think the conjugation tables are looking very good, but there is too much separation. It would be better to have a single table instead of nine separate ones. I don’t know if it would be very difficult to do a single table...html (or whatever this markup language is called) is a mystery to me.
As for case inflection, I think nunation should be marked where appropriate أختٌ, بنتٌ. It would also be nice if there were a simple and clear way to list all of the case inflections, but I don’t have any ideas about how it could be done. Maybe it will be too much. —Stephen 19:03, 10 April 2009 (UTC)
What about removing the ambiguity like this: mustas-hal or like this mustas°hal, which I think is nice because it looks like a sukun. --Beru7 20:41, 10 April 2009 (UTC)
I think mustas-hal would be less confusing. The ° could be read as a weak o. If we use sh, th, dh, etc., there is another -h combination that, though not used, would still probably be misunderstood. For instance, ازهر, azhar. If sh = š, then zh will likely be read as ž (ažar). —Stephen 23:49, 12 April 2009 (UTC)


ROOT (then the three or sometimes four Arabic letters of the root)

Form (then the Pattern Number I-X and the Arabic فعل pattern)

Part of Speech

  1. Definition.

Part of Speech (a second one if appropriate)

  1. Definition.


Romanization proposal

The proposed romanization system for Arabic is based on the qalam system with a few modifications. Reasons for this choice are:

  1. Transliterations of characters that have no equivalent in Latin script are, where possible, the ones already in use in the English-language press, books and atlases (for example, ش gives sh, خ gives kh, etc.)
  2. The romanized result is easily typed on any Latin keyboard, because it does not use any special characters. This is important because the volume of transliterations that have to be typed is high, and Arabic entries already require typing in at least 2 different scripts.
  3. It is easily read on a computer screen.

The modifications to the qalam system are, mainly:

  1. Diacritics are always transliterated
  2. ع gives 3 (widely used on the internet for transliterating that letter)
  3. ة gives a(t)
  4. the character - is used to remove ambiguities such as between ش (gives sh) and سْه (which would give sh also). In the latter case, the correct transliteration is s-h (the ALA-LC system uses ′)
  5. 'alif madda transliterates to 'aa
Letter | Rom. | IPA | Notes
ا | aa | | When in initial position, the 'alif represents a weak initial hamza. It then transliterates to the short vowel it supports (a, i or u).
ب | b | b |
ت | t or t- | t | Use t- when transliterating ـتْهـ to avoid confusion with ث.
ث | th | θ |
ج | j | ʒ |
ح | H | ħ |
خ | kh | x |
د | d or d- | d | See ت for d- usage.
ذ | dh | ð |
ر | r | r |
ز | z | z |
س | s or s- | s | See ت for s- usage.
ش | sh | ʃ |
ص | S | |
ض | D | |
ط | T | |
ظ | Z | ðˤ |
ع | 3 | ʕ |
غ | gh | ɣ |
ف | f | f |
ق | q | q |
ك | k or k- | k | See ت for k- usage.
ل | l | l |
م | m | m |
ن | n | n |
ه | h | h |
و | w or uu | w |
ي | y or ii | j |
ء | ' | ʔ |
ة | a(t) | a at | Isolated words should use a(t); if not isolated, a or at should be used.
Short vowels
ـَ | a | a |
ـُ | u | u |
ـِ | i | i |
ـً | an | |
ـٌ | un | |
ـٍ | in | |
Long vowels and diphthongs
ى ا | aa | |
آ | 'aa | ʔaː |
ـَو | aw | aw |
ـُو | uu | |
ـَي | ay | ay |
ـِي | ii | |
  • Hamzas are always written ' regardless of which letter they sit on
  • Orthographic و and ا occurring at the end of certain verbs are not transliterated
  • ال always gives il- regardless of elision and sun and moon letters rules
  • To transliterate shadda, the concerned consonant is written twice.
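
A partial sketch of how the rules above could be applied mechanically to fully vocalized text. It covers only the consonant table, short vowels and tanwin, doubling for shadda, and the t-/d-/s-/k- hyphen before ه; long vowels, 'alif, hamza seats, ة and the article are deliberately left out and passed through unchanged. The function name `romanize` and the overall structure are assumptions made for illustration, not an existing tool.

```python
CONSONANTS = {
    "ب": "b", "ت": "t", "ث": "th", "ج": "j", "ح": "H", "خ": "kh",
    "د": "d", "ذ": "dh", "ر": "r", "ز": "z", "س": "s", "ش": "sh",
    "ص": "S", "ض": "D", "ط": "T", "ظ": "Z", "ع": "3", "غ": "gh",
    "ف": "f", "ق": "q", "ك": "k", "ل": "l", "م": "m", "ن": "n",
    "ه": "h", "ء": "'",
}
VOWELS = {
    "\u064E": "a", "\u064F": "u", "\u0650": "i",     # fatha, damma, kasra
    "\u064B": "an", "\u064C": "un", "\u064D": "in",  # tanwin
}
SHADDA, SUKUN = "\u0651", "\u0652"
MARKS = set(VOWELS) | {SHADDA, SUKUN}
NEEDS_HYPHEN = {"t", "d", "s", "k"}  # would otherwise merge with a following h


def romanize(text):
    # Group each base letter with the marks that follow it, so the result
    # does not depend on whether shadda is stored before or after the vowel.
    units = []
    for ch in text:
        if ch in MARKS and units:
            units[-1][1].append(ch)
        else:
            units.append((ch, []))

    out = []
    previous_bare = None  # romanization of the last vowelless consonant
    for base, marks in units:
        rom = CONSONANTS.get(base)
        if rom is None:
            # Outside this sketch's scope: keep the original characters.
            out.append(base + "".join(marks))
            previous_bare = None
            continue
        if rom == "h" and previous_bare in NEEDS_HYPHEN:
            out.append("-")
        vowel = "".join(VOWELS[m] for m in marks if m in VOWELS)
        doubled = rom * 2 if SHADDA in marks else rom
        out.append(doubled + vowel)
        previous_bare = rom if not vowel else None
    return "".join(out)


print(romanize("مُسْتَسْهَل"))
```

Grouping marks with their base letter first means the stored order of shadda and fatha does not change the result.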
I think it would be much better to use s-h than s°h. Another possibility is s~h (mustas~hal). Otherwise, most of it seems okay. If I understand you correctly, you want to transliterate استلم as estálama, and استسلم as estáslama. I’m okay with trying this out, but I’m not sure what kind of response we’ll get. It’s a radical change. —Stephen 00:14, 13 April 2009 (UTC)
It's not a radical change, it's just a bad idea ! اُكْتُب would give ektub, for example. It's not good. So I've changed that to be the same as in most other systems: weak hamzas transliterate to the vowel they carry. I've also changed the ° thing. And removed the dialectal variants, which are too many if you consider all the dialects that exist. I guess each one that has its own language code will have its own modifications to the transliteration system. Thanks for your comments --Beru7 16:13, 13 April 2009 (UTC)
I prefer the older method. I haven't created many Arabic entries but I've been adding some translations. Symbols can be copied from here: http://en.wikipedia.org/wiki/Romanization_of_Arabic. It's also consistent with Wikipedia, with the exception of "kh" for "x". If you insist on the change, I would change 3 to ` (backquote), e.g. al-`Irāq. Also, I would always romanise the definite article as "al-", unless it's a dialect and not use "e" and "o" at all, except for foreign words or dialects and show the pronunciation change for the sun letters. Hamzas could be omitted if at the beginning of a word. Anatoli 22:45, 15 April 2009 (UTC)
I too would have preferred to use the same system as in Wikipedia, but it's very inconvenient given the volume of transliterated text that has to be typed here: entries, all links within entries (think of all the broken plurals), examples, translations, and, if we were to scrupulously follow the Wiktionary transliteration policies, even all the conjugated verbs and various inflections. Copying and pasting symbols will take forever.
The ` / ع issue has been discussed above: both Stephen and I have agreed that it is not easily distinguished from ', which is standard for hamzas. And it's not easily typed, either, at least not on my French keyboard!
I agree about al-, it would be more consistent since the vowel is a fatha.
"e" and "o" are not used in the present version.
For the reason why sun and moon letters should not be transliterated, please refer to Wiktionary:Transliteration#Key_terms, last paragraph. It will also reduce the number of possible errors.
Lastly I can think of no reason why strong initial hamzas should not be transliterated if all other diacritical marks are. What reason do you have in mind ?
Thanks for your comments. --Beru7 05:34, 16 April 2009 (UTC)
You are basically suggesting a transliteration used in Arabic chat alphabet? Issue with copying is not just for Arabic but Chinese, Russian and other languages whose transliteration requires diacritics. If you think you are going to stay an active wiktionarian and be consistent, then I may agree but you are new (I have started being active just recently as well) and we already have a lot of entries romanised with diacritics. What are your thoughts about this?
Missing transliteration for the initial hamza is common for textbooks and wikipedia, as the initial glottal stop is always there (if the vowel is not elided) like in German, it doesn't have to be written.

Anatoli 06:37, 16 April 2009 (UTC)

Why not transliterate just one letter as it is in the chat alphabet if it's convenient, works, and is understood by most ? The rest is the qalam system, which has been around for a long time.
Diacritics may be required for other languages but if, for arabic, we can do without any while using an established system (qalam), then why shouldn't we ? It's not just for me, it'll be more convenient for everyone. We absolutely need a romanization standard. Right now there's none, and the result is a mess, with several systems used and sometimes even mixed. So we get to pick one now. It might just as well be one that suits our needs, which is why I picked this one: not because it suits my taste but, I insist, because it suits our needs.
About strong initial hamzas (shown by the ء mark), they indicate that the vowel they carry is never elided. So they are important. Remember, we are a dictionary and as such, our transliteration system should show as much about the original orthography as possible. --Beru7 07:07, 16 April 2009 (UTC)
OK then. Let's do it the way you suggest. The side benefit is that it would almost make it possible to convert the words to Arabic in the Yamli editor (http://www.yamli.com/arabic-keyboard/), etc. We are not adding word stresses then? Although the transliteration may look scientific, the number of Arabic entries and translations is more important to me. Even the basic Arabic vocabulary is not yet covered in Wiktionary.
I wouldn't spend too much time on all Arabic conjugations and declensions (if there are examples) but would provide the essentials: plural for nouns; present, maSdar and imperative forms for verbs, something that is always important for those who are interested in Arabic. Anatoli 23:03, 16 April 2009 (UTC)
Has this discussion stalled? The impact is that we haven't agreed on the standard of romanising Arabic. If the problem is 3 vs ` and -a(t) vs -a, I am sure we could come to some compromise. Anatoli 05:44, 12 May 2009 (UTC)

Typing tool

Here's a little toy I made this afternoon: http://www.enselme.com/beru/trans2arab.htm. Type in your transliterated text and it should give it back to you in Arabic script, fully vowelized. It uses exactly the same system as we do, but I had to make a few additions: add -an at the end of a word to get tanwin, as in shukraa-an, and aa~ to get alif maqsura, as in mashaa~. There are probably bugs left, so if you find any, let me know on my talk page! I hope this will help us get more vowelized entries! I don't know if this could be integrated into the wiki or not, but the code is free to use, reuse and modify... --Beru7 23:28, 18 April 2009 (UTC)

Looks interesting. How do you enter fatha tanwiin + alif? مرحباً Anatoli 00:13, 19 April 2009 (UTC)
like this marHabaa-an --Beru7 11:49, 19 April 2009 (UTC)
Thanks, next questions: لأن and جامعة, combination لإ how do you enter these? It looks promising if you get all the variations and some short tutorial would be great. Anatoli 13:18, 19 April 2009 (UTC)
For جامعة, it would be jaami3a(t) and it works. For لأ it should probably be l'a but it's not working. Thanks for finding this one ! And I'll try to make a prettier page later. --Beru7 13:33, 19 April 2009 (UTC)
Replying here, where the link is and so that other users can see. Thanks for the fix, Beru7. It looks better. Perhaps worth trying to enter some longer text for testing. Anatoli 22:56, 22 April 2009 (UTC)

Arabic languages

In some respects we consider Egyptian Arabic, et al., as dialects of Arabic:

Yet in other respects we consider them separate languages:

What's going on?—msh210 22:36, 22 April 2009 (UTC)

What's going on is that Arabic on en.wikt is a mess right now.
I wanted to raise the same issues, so thanks for doing it! I guess the problem is that there is enough difference between the dialects to warrant different language codes, but not enough to really have complete dictionaries for each. Dialect dictionaries do exist, but there is a lot of overlap with MSA dictionaries. So what should we do? Several possibilities:
  1. Remove all dialectal words from Arabic sections. Cut and paste all words in common between MSA and the various dialects. A lot of work, and a lot of redundancies.
  2. Remove all dialectal words from Arabic sections. Put them in the appropriate dialect sections. For words that are common to both a dialect and MSA: insert a link to the MSA section. This is what I have been trying with Egyptian Arabic lately. But autoformat doesn't seem to like it.
  3. Put everything in Arabic, marking dialectal words. This might get very confusing.
Personally I think 2 is the best option if it can be made to follow the general policies of the wiki. 1 would be my second choice, and 3 is probably not a good idea. Of course there might be other solutions.
I'd very much like to know how it is done in other languages which face similar situations. Beru7 23:21, 22 April 2009 (UTC)
Also, finding out what's dialectal and what's standard is hard. "Common usage" is not much of a criterion for Arabic, as common usage is colloquial, often considered slangy if you use it in writing. Dialectal words are often written as normal but pronounced differently (dialects may miss some sounds from classical Arabic) but some prefer to highlight the difference. So, it's a mix. I agree that perhaps marking words as dialects may be sufficient. The translations should focus on MSA. The border between MSA and dialects is not always clear. For example, some people consider loanwords as slang, do not want to include as standard Arabic but they are too common. The mess may continue as editors may be from different Arab countries where attitudes differ as to what is right and what is wrong, there could be some spelling/transliteration variations as well, especially for less formal words. Anatoli 00:21, 23 April 2009 (UTC)
I think common usage in MSA is what you will find in books and newspapers. That is the corpus that has been used to build modern Arabic dictionaries. That language has a name in Arabic, فُصْحى (fusHa). So that would go in "Arabic" which should really be "Modern Standard Arabic". I cannot speak for other dialects, but for Egyptians, the difference between فُصْحى and عامِّية (3aammiyya(t)), dialect, is quite clear. There is a lot of overlap with MSA in terms of both grammar and vocabulary, but there are also significant differences, and not only of pronunciation. Sometimes the singular form of a noun will be the same but a different plural will be used. Conjugation is not the same. In Egyptian, dual is barely used. Phonology is very different.
So native speakers are very aware of these differences and can switch from one to the other (diglossia), at least if they have been to school.
Upon reflection, I think we should really consider Arabic dialects as separate languages, and remove references to dialects in the Arabic sections. Beru7 12:53, 23 April 2009 (UTC)
Agree with the suggestion. My point about the mix is the so-called Formal Spoken Arabic, which has elements of both. The mix can be different, I beg to differ, especially in Egypt, where the dialect has much higher status and usage. This discussion is not about this, anyway. Otherwise, we start discussing, whether we spell أطيل or أتيل or we should avoid it and use فندق only. :) You may say, they are part of standard Arabic now but a purist from Saudi Arabia may not agree. Just an example, perhaps not a perfect one, forgive me, my Arabic is not very good but I read a lot about it. Can we say, if Hans Wehr dictionary or another solid dictionary uses it and a word is used in newspapers, then we can include it? Anatoli 13:09, 23 April 2009 (UTC)

Initial Hamzas[edit]

A few months ago, when we were discussing the guidelines for Arabic we came to the decision of not including strong initial hamzas in page titles. I have come to think that this decision is the wrong one. I have been reading a lot of Arabic texts in the past few months, and most show all, or nearly all, of those hamzas. Here are a few examples:

I think most other wiktionaries do not have the same policies, either, and that sometimes causes problems with interwiki links. Beru7 19:34, 22 June 2009 (UTC)

Words written with and without the initial hamza are both common, but the former method is recommended in many Arabic textbooks (it doesn't apply to elidable hamza). I prefer to write it in translations from English but, as we discussed before, the Arabic entries may exist without hamzas but there must be a redirect with it. Anatoli 20:26, 22 June 2009 (UTC)

Publishing the guidelines and cleaning up Wiktionary:About Arabic[edit]

I think it is time to publish the guidelines above on Wiktionary:About Arabic, and generally clean that page up. I will do that and also change the hamza guidelines in a few days if nobody has any objection. Beru7 17:05, 23 June 2009 (UTC)

There were serious objections about these two, if I haven't missed something:
  1. ع gives 3 (widely used on the internet for transliterating that letter)

Me too, I prefer "`" as in qalam.

  1. ة gives a(t)

I prefer just "a", except for cases where it is pronounced - "at", ignoring the case endings. Anatoli 01:58, 24 June 2009 (UTC)

To me (who can't read Arabic, does not contribute to it but looks up entries quite often) the words transliterated with this new scheme are very confusing, e.g. مطعم. I liked Stephen's scientific scheme (which is close to w:DIN 31635 or w:ISO 233) seen in السلطة الوطنية الفلسطينية much better. I understand it's hard to type those characters but, hey, we don't avoid doing right things just 'cause they are difficult. After all, this is a solvable problem: you can download Microsoft's excellent tool and map your own keyboards with any desired characters. Alternatively, when Conrad implements his automatic transliteration tool into WT:EDIT, Arabic words could be automatically semi-transliterated with vowels being added manually. This said, I understand I can't make you all choose the harder path and I promise not to raise the devil in case you pass the proposal officially :) --Vahagn Petrosyan 12:38, 23 July 2009 (UTC)
Hi Vahagn! Actually this is not the scheme Stephen was using. Before the new system, Stephen was using IPA characters to transliterate Arabic. I guess the really big problem with our system is the usage of "3". It confuses many people. On the other hand it has become the standard for informal transliteration (called arabizi, عربيزي) and is used more and more (Microsoft Maren, Google Ta3eeb, Yamli etc...). The main obstacle in changing this has been the opponents of the "3" because they want to hear about nothing but "`", which is hardly distinguishable from "'", which is used to transliterate a completely different letter. But "ɛ" could be used, or "ʕ". I thought about "c" but since there is no Unicode for it it is not going to play well with search nor Conrad's tools. By the way - I find the DIN system horribly confusing with t ṯ ṭ, h ḥ ḫ etc... and the complete specs are not free, they have to be bought from the DIN. On a final note, an interesting property of the current system is that it is machine-convertible to other systems: DIN, for example etc. This might prove useful in the future. Beru7 15:03, 23 July 2009 (UTC)
I think "ʕ" is best for ayin. Other confusing things to me are aa instead of ā and other long vowels. --Vahagn Petrosyan 09:20, 24 July 2009 (UTC)
I have noticed that the symbol "3" does not work with the template {{ar-verb}} if the "3" is the first radical. It is treated as a numeral and gets moved to the left of the Arabic verb. —Stephen 15:47, 26 July 2009 (UTC)
I have added a call to {{LR}} before the transliteration and it works now. Beru7 17:23, 26 July 2009 (UTC)
Thanks. Now I see that the same problem exists with {{infl}} used for nouns and adjectives, as in عزل. —Stephen 20:09, 26 July 2009 (UTC)
I don't have the rights to modify that template but this fixes the problem: {{infl|ar|noun|sc=Arab|tr={{LR}}3azl|g=m}} —Beru7 20:19, 26 July 2009 (UTC)
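The reordering Stephen describes can be reproduced outside the wiki. The following is a minimal Python sketch, not the template's actual code: it assumes that {{LR}} amounts to emitting a left-to-right mark (U+200E), and the helper name `protect_ltr` is hypothetical.

```python
import unicodedata

LRM = "\u200e"  # LEFT-TO-RIGHT MARK; assumed here to be what {{LR}} emits


def bidi_class(ch: str) -> str:
    """Return the Unicode bidirectional class of a single character."""
    return unicodedata.bidirectional(ch)


def protect_ltr(translit: str) -> str:
    """Hypothetical helper: if the transliteration does not start with a
    strong left-to-right character (e.g. it starts with the digit 3),
    prefix a left-to-right mark so it is not pulled into the Arabic run."""
    if translit and bidi_class(translit[0]) != "L":
        return LRM + translit
    return translit


if __name__ == "__main__":
    print(bidi_class("3"))        # weak class for European digits
    print(bidi_class("\u0639"))   # Arabic letter ain, strong right-to-left
    print(repr(protect_ltr("3azl")))
```

A digit has only a weak direction, so next to Arabic text it is laid out with the Arabic run; a leading strong left-to-right character avoids that.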

Hamzas in the lemmas[edit]

Hi! As everyone knows, short vowels are actually never indicated in everyday text you read in Arabic newspapers or see on TV programs, probably the Qur'an is the only occasion where they are marked. (Or am I wrong?) Anyways, I started recently Arabic (so, I know I probably should ask this somewhere else than here, sorryyy...), so I have a question: do they always vocalize the initial hamza alifs that have short vowels? In other words, do they write for example the word أخت "sister" more often as اخت? I'm asking this as an example to determine what's our policy or guideline with the lemmas. We have apparently articles that follow different rules:

...or then vice versa...

No, initial hamzas usually are not written. أخت is rare, اخت is usual. Years ago, we began by leaving the initial hamzas off, since that is how the words are usually written, but over time some editors have felt that, for a dictionary, it is important to put the hamzas where they belong. So some older entries may not have them, while newer entries probably do have them.
I don’t think we have reached a consensus on how to handle the various common ways to write a word...if we should use redirects, or make entries that explain that it’s an alternate form. Therefore, most of these alternate forms have so far just been ignored.
Of course, there are some words that are spelled the same way in Persian, except that Persian does not use the initial hamza. Such words cannot be redirected to the hamza spelling, so an "alternate form" entry would be required.
In general, then, if a hamza belongs on a word, even though it is commonly left off in normal texts, we are putting the hamza where it belongs. It’s one of several very tricky issues with Arabic, most of which have still not been seriously addressed. —Stephen (Talk) 04:03, 4 July 2012 (UTC)
In a modern running Arabic online text the hamza is usually present or very common, unlike optional diacritics. Also, Google translate and automatic conversion tools like Yamli usually support hamza as well. Just try typing "anta" in Yamli and see how often أنت is written on the web with a hamza. In the books I've got at home hamza is quite common but I also have books that don't use them often. No, we haven't reached consensus, so it's better to discuss how we are going to treat this issue before changing anything. Egyptian Arabic is more relaxed and uses hamza less often, same with ى in the final position, which is used when ي (y) is required or ه (h) when ة () is required. All searches should eliminate Persian and other Arabic based languages. As I mentioned, Egyptian Arabic is written in a less strict form. --Anatoli (обсудить) 04:21, 4 July 2012 (UTC)
Perhaps we could treat Arabic hamza similar to the Russian ё issue? Have redirects and alternative forms for words without hamza? At least, writing hamza is considered correct, even in a non-vocalised text. Many words without hamza may also be the spelling forms for Egyptian Arabic, Persian, Urdu, if other letters coincide. Note, words with elidable hamza are not spelled with hamza, even if in cases where it has a phonetic value, eg. ابن. Please check Arabic Wikipedia for letters أ‎‎, إ‎ (ʾi‎)‎, ﺁ‎‎, ؤ‎‎‎ and ئ‎‎. They are quite common. --Anatoli (обсудить) 04:31, 4 July 2012 (UTC)

That should have been discussed on the respective talk pages of the mentioned words. However, in the case of the month April, the word is pronounced in two ways by Egyptians (loanword vocabulary isn't standardized in Literary Arabic), either with an initial open vowel or with an initial high vowel. Because they are the same word, there is no need to have two separate pages. --Mahmudmasri (talk) 21:59, 17 November 2013 (UTC)

Please see my changes to أبريل and ابريل (see how I transliterated both as well). I made the former the main entry and the latter - an alternative form. Is this OK with you? Even if loanwords are not standardised, these are two possible spellings. --Anatoli (обсудить/вклад) 00:39, 18 November 2013 (UTC)
Yes, it is OK to have it mainly as any of the three: أبريل, ابريل or إبريل. Yes, with an under hamza is also considered correct, but for the high vowel pronunciation for the first syllable [ɪ]. The only problem is that you wrote the Latin transliteration for ابريل without an initial half-ring (ʾ), but with it for أبريل, however they are both pronounced with an initial glottal stop. For practicality and as I was expecting, I didn't want the half-ring (for the glottal stop) be written initially, since it would be harder to use such strict transliteration. Do you still think it is practical to transliterate the initial glottal stop? --Mahmudmasri (talk) 01:45, 18 November 2013 (UTC)
My idea is (I think it's practical) to reflect the written hamza with ʾ (e.g. ʾabrīl) symbol and show nothing (e.g. abrīl) when it's not written (yes, ignoring how it's pronounced - initial "i" in "إذا" and "ابن" are pronounced identically). Marking hamzas is especially important to show when alif is not elidable. Perhaps (alternatively), all words starting with an alif, except for alif waṣl should be marked with ʾ.
Feel free to create إبريل based on ابريل with transliteration "ʾibrīl". What do you mean "harder to use"? BTW, if you use Firefox, I can give you a hint how to add any symbol quickly in a single tab (but not Arabic diacritics). --Anatoli (обсудить/вклад) 01:58, 18 November 2013 (UTC)
I think what Mahmud is saying is that in ابريل, the initial hamza is not elidable even though it is not written. --WikiTiki89 02:16, 18 November 2013 (UTC)
Yes, I understand this (that's why I said "especially") but what do you suggest? We could add ʾ to all initial alifs, excluding words starting with همزة وصل or only to those where hamza is marked, even if it creates a discrepancy with transliteration of alternative forms. --Anatoli (обсудить/вклад) 02:21, 18 November 2013 (UTC)
Please see أفريقيا with alternative forms. Transliterating all alternative forms is time-consuming but this is an example how I see the use of ʾ when transliterating alifs with or without hamza. --Anatoli (обсудить/вклад) 02:45, 18 November 2013 (UTC)
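The first of the two options discussed above (mark ʾ only where a hamza is actually written on the initial alif) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `initial_mark` is hypothetical, and only alif with hamza above (U+0623) and below (U+0625) are handled, not alif madda or alif waṣl.

```python
# Assumption: only these two letters count as "written initial hamza".
HAMZA_ON_ALIF = {"\u0623", "\u0625"}  # alif with hamza above / hamza below

HALF_RING = "\u02be"  # the half-ring used for hamza in the transliteration


def initial_mark(word: str) -> str:
    """Return the half-ring if the word starts with a written hamza,
    otherwise an empty string (bare alif or any other letter)."""
    return HALF_RING if word[:1] in HAMZA_ON_ALIF else ""


if __name__ == "__main__":
    with_hamza = "\u0623\u0628\u0631\u064a\u0644"     # spelling with hamza above
    without_hamza = "\u0627\u0628\u0631\u064a\u0644"  # spelling with bare alif
    print(repr(initial_mark(with_hamza)))
    print(repr(initial_mark(without_hamza)))
```

The second option (mark every initial alif except alif waṣl) cannot be decided from the spelling alone, which is part of why the thread treats it as a separate question.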

Arabic vocalization[edit]

For the record, you cannot add compound Arabic vocalizations on Wikipedia or Wiktionary in the usual way. When you save the page, the order of the two diacritics is automatically reversed, which is the incorrect order. If you type, for example, "shadda" + "fatha", then save, the order is changed to "fatha" + "shadda", which is wrong. Some Arabic fonts are able to display this as if it were entered correctly, but most Arabic fonts cannot.

This effect is known as the Hebrew/Arabic vowel/niqqud bug, an unwanted consequence of "normalization": http://bugzilla.wikimedia.org/show_bug.cgi?id=2399
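The reordering can be observed with any Unicode normalizer. Below is a minimal Python sketch, assuming the wiki applies NFC normalization on save; it uses only the standard `unicodedata` module and shows that the marks are sorted by their canonical combining class.

```python
import unicodedata

SHADDA = "\u0651"
FATHA = "\u064e"
TEH = "\u062a"

# Text as typed: consonant, then shadda, then fatha.
typed = TEH + SHADDA + FATHA

# Text after normalization (assumed to be what is stored on save).
saved = unicodedata.normalize("NFC", typed)


def names(text: str) -> list:
    """List the Unicode character names, in stored order."""
    return [unicodedata.name(ch) for ch in text]


if __name__ == "__main__":
    print(names(typed))
    print(names(saved))
    # Marks with a lower combining class are moved first.
    print(unicodedata.combining(SHADDA), unicodedata.combining(FATHA))
```

Because fatha has a lower combining class than shadda, normalization places it first regardless of the order in which the marks were typed.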

There is a way that it can be done. You have to enter the XML character references in place of the actual vowels: Shadda+fatha = &#1617;&#1614; • shadda+kasra = &#1617;&#1616; • shadda+dhamma = &#1617;&#1615;.

There is a template that will do this for you: {{ar-dia}}. Just indicate sha/shi/shu (= shadda+fatha/shadda+kasra/shadda+dhamma). For example, type كتب and then place {{ar-dia|sha}} after the ت to get: كتَّب. —Stephen (Talk) 22:01, 20 October 2012 (UTC)
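For anyone who wants to produce such character references without the template, here is a small Python sketch. The function name `to_char_refs` is hypothetical, and decimal references are an assumption; hexadecimal ones would denote the same characters.

```python
import html

SHADDA = "\u0651"
FATHA = "\u064e"
KASRA = "\u0650"
DAMMA = "\u064f"


def to_char_refs(marks: str) -> str:
    """Spell each character as a decimal numeric character reference."""
    return "".join(f"&#{ord(ch)};" for ch in marks)


if __name__ == "__main__":
    for label, second in (("fatha", FATHA), ("kasra", KASRA), ("damma", DAMMA)):
        refs = to_char_refs(SHADDA + second)
        # Round trip: decoding the references gives back shadda + vowel.
        assert html.unescape(refs) == SHADDA + second
        print(f"shadda+{label}: {refs}")
```

Since the references are plain ASCII in the page source, there is nothing for normalization to reorder when the page is saved.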

Proposal to remove stress marks in transliterations[edit]

Arabic dialects differ on where to put the stress in a word. Neither Modern Standard Arabic nor Classical Arabic ever indicated stress even in vocalized texts and therefore have no standard stress. Therefore, speakers of MSA just borrow the stress placement rules from their respective native dialects. Also, misplacing stress is not a big problem in Arabic, as it is in some languages (such as Russian).

Therefore, I propose to remove stress marks in transliterations of Arabic.

Does anyone disagree? --WikiTiki89 01:45, 4 November 2013 (UTC)

Stress marks are occasional. I don't think they cause a problem, nor are they very important. They do help people who are not familiar with Arabic stress rules, especially when the last syllable is long but not stressed, e.g. كبرى (kúbrā). In any case, we don't have current volunteers to actively work with Arabic, so whatever the decision, it will not make a difference. --Anatoli (обсудить/вклад) 02:03, 4 November 2013 (UTC)
Once we adopt a policy, then we can start slowly making changes. The problem we have now, is that I am afraid to make mass changes if there is no consensus, and consensus is indicated on the WT:AAR page. That is why I am bringing it up. --WikiTiki89 02:46, 4 November 2013 (UTC)
Do you see a problem with having them optional and reword WT:AAR (it says "stress on short vowels can be rendered")? No-one is forced to provide stress marks. --Anatoli (обсудить/вклад) 03:04, 4 November 2013 (UTC)
My proposal here is to remove all stress marks. The main reason, as stated above, being that speakers of different dialects put stress in different places in the same word (even when speaking MSA). Therefore they can be misleading. --WikiTiki89 03:07, 4 November 2013 (UTC)
I'm not aware of different stress patterns. I know that stress rules are easy and predictable. Are you referring to situations where words are stressed differently with full case ending and in pausa (case endings dropped), e.g. مكتبة - "maktábatun" vs "máktaba"? --Anatoli (обсудить/вклад) 03:18, 4 November 2013 (UTC)
Never mind. I found some confirmation (even though it's not very trustworthy) for your claim about dialects - I hardly studied any dialect, so I don't know stress patterns in dialects. No objections from me. --Anatoli (обсудить/вклад) 03:25, 4 November 2013 (UTC)
I was going to say that some speakers would say máktaba while others would say maktába regardless of whether they include the -tun. --WikiTiki89 03:28, 4 November 2013 (UTC)
I don’t want to make any strong objections, since I have stopped editing Arabic after editors who did not know what they were doing began doing severe damage to some Arabic pages. I will just remark that when students start learning Arabic, they want to know where to put the stress. Yes, stress can vary under different circumstances, but nevertheless we can advise people on a good standard placement of stress. Even though many transliterations do not have any stress marked, we should not remove correct stresses that exist or eventually will exist.
Again, as I don’t do the Arabic here anymore and don’t look at the Arabic pages, I don’t want to say much about the transliteration system. But a few years ago, some editor (I think it might have been User:Angr, but I’m not certain) insisted on having a strict and very difficult set of transliteration systems for Burmese, and the result was that nobody but him could enter Burmese words anymore. Even native Burmese could not add those transliterations well, and Burmese has been on the slow track ever since. The system you are proposing for Arabic is, in my opinion, very difficult and unnatural, especially on a computer screen. I can read it myself only with great difficulty, and I would never purchase a book that used this system. Arabic is virtually dead here because of bad decisions and actions by some editors, almost none of whom contribute to our Arabic effort, and I think this just puts the last nail in the coffin.
It’s only my opinion, I don’t edit Arabic here anymore, please do what you want with it. —Stephen (Talk) 04:59, 4 November 2013 (UTC)
Yes. As Stephen writes with no apparent sense of irony whatsoever, it is bad for the readers and editors, really for Wiktionary in general, when someone imposes a difficult, non-standard transliteration system. Michael Z. 2013-11-05 18:01 z
You both say that this scheme is "difficult", but I fail to see what makes you think so. Perhaps one of you could explain it. As I said below, the scheme is still up for discussion and I can't improve it if I don't know what's wrong with it. The reason I didn't use a standard scheme is because I think this one is easier to read, not to mention that it is currently the one we seem to be already using in a plurality of articles. Anyway, I don't care that much myself about what our standard scheme will be, as long as we have one single standard, so that we don't have every entry using a different scheme. --WikiTiki89 18:27, 5 November 2013 (UTC)
I’ll reply in the next section below, since that is about the transliteration. Michael Z. 2013-11-05 21:37 z
Is there anything in particular you don't like about the transliteration? We need to standardize it because it's worse to have a million different transliteration systems than to have a single bad one. What we standardize it to is still up for discussion. Once the transliteration is standardized, we can even use gadgets to display a reader's favorite system.
Regarding stress marks, the point I'm making here is there is no "good standard placement of stress". And we still have an entire pronunciation to deal with things like stress placement that allow us to specify who places the stress where and not just where we think the stress should be.
I'm not trying to scare off Arabic editors, I'm trying to help them by cleaning up our pages, which can hardly be done if we have no policy. --WikiTiki89 13:46, 4 November 2013 (UTC)
  • I don't know of a standardized Arabic transliteration which indicates stress.
  • I saw that there was a common practice in Wiktionary to indicate stress by using the acute accent on vowels.
  • (Not to be confused with Arabic dialects which have different grammar and more distinct pronunciations)
    • I only added stress when most accents of Literary Arabic were pronounced the same, but avoided them or removed them when the stress wasn't the same.
    • Moroccan and Algerian accents of Literary Arabic don't have stress at all.
  • The standardized Arabic transliterations don't necessarily indicate the exact pronunciation, they are halfway between indicating Arabic spelling and the loose pronunciation. Additionally, final alef and alef maqsura are transliterated with ā (a-macron) in most transliterations.

If you want to remove stress, then do so for the reasons above; if not, then we should only add it when the word is stressed the same way by most Arabic speakers. But I feel that, for practicality, we should stop indicating stress in transliteration.

For Anatoli:
مكتبة is pronounced maktába [mækˈtæbæ] or maktábatun [mækˈtæbæton] (with nunation) by Egyptians, while by the Levantine people as máktaba [ˈmaktaba] or máktabatun [ˈmaktabatʊn]. --Mahmudmasri (talk) 21:51, 17 November 2013 (UTC)

Standardizing transliteration characters[edit]

Currently WT:AAR provides many alternative transliterations for each letter. I propose we adopt the following scheme exclusively (there is nothing new here, just no more alternatives):

ء ʾ ب b ت t ث ṯ ج j ح ḥ خ x
د d ذ ḏ ر r ز z س s ش š ص ṣ
ض ḍ ط ṭ ظ ẓ ع ʿ غ ġ ف f ق q
ك k ل l م m ن n ه h و w ي y

Does anyone disagree? --WikiTiki89 01:45, 4 November 2013 (UTC)

They sit there because there was not enough effort to convert all instances of old transliteration to the new method. Your table is my preference as well but there are too many old transliteration examples still. So, people will wonder what all those capital letters, double vowels, number "3", etc. mean. My preference would be to clean up all entries and translations first, then update, but I would leave alternatives somewhere.
With semivowels و (w) and ي (y), it's not correct to simply transliterate them as "w" and "y", as they can be vowels "ū" and "ī". --Anatoli (обсудить/вклад) 02:03, 4 November 2013 (UTC)
You're thinking backwards. We cannot enforce a policy until it is on the policy page, otherwise no one would know about it. If you want the alternatives to remain for the sake of reference, then we can leave them in the table but mark them as deprecated. Regarding the semivowels, yes, I was only talking about the consonantal use of و and ي. --WikiTiki89 02:50, 4 November 2013 (UTC)
No objections, if deprecated letters are provided. This table would be incomplete without mentioning ة (), ى (in the final positions) and vowels. --Anatoli (обсудить/вклад) 03:01, 4 November 2013 (UTC)
Isn’t the proposed transliteration table just a letter or two away from several standardized systems? Any reason why we can’t choose one of the established systems described in w:Romanization of Arabic? What possible advantage justifies creating an incompatible proprietary system? Michael Z. 2013-11-05 17:48 z

WikiTiki89, I don’t think this system is too difficult (by my remark in the section above I was referring to the irony that Stephen complains about non-standard romanization, when he is one of the defenders of a completely non-workable system for Russian romanization).

My concerns are the disadvantages of a novel Arabic romanization system for Wiktionary:

  1. If we amateurs create it, then it would comprise original research, likely lacking any firm functional or academic basis. It would be subject to the whims of individual editors.
  2. If we created it, then nothing prevents helpful editors from constantly “improving” it, thereby ensuring that most romanizations entered in the dictionary will remain out-of-date.
  3. We would waste effort developing such a system, debating the relative merits of individual features, when we should be working on the dictionary instead.
  4. If it is novel, then new readers and editors could never benefit from already being familiar with it.
  5. If it is non-standard and therefore incompatible, then readers cannot benefit from our romanizations in any other work. For example, they wouldn’t be able to use them in a professional publication or other work, which would likely require use of standard romanization methods.

Standardized systems have many advantages, as recommended by professional bodies. Some are listed at WT:TRANSLIT#Criteria for romanization systems. Michael Z. 2013-11-05 21:51 z

Also, readers and editors could benefit from familiarity with a system if it were compatible with those for other Arabic-script languages. I don’t know how feasible this is. Michael Z. 2013-11-05 22:03 z
It's funny you mention professional publications, because every linguistic scholarly work I've read on Arabic so far has used its own invented scheme and was not internally consistent (and I mean that hyperbolically). But otherwise you make some good points. Here's my rebuttal:
  1. Wiktionary allows original research. Also, this is not exactly research.
  2. As long as the "improvements" are indeed improvements, they are welcome.
  3. The system is already there. We don't need to spend any time or money on R&D.
  4. It is not so novel. We already use this system on the plurality of our pages and it is similar enough to many existing systems that it doesn't require any learning.
  5. Who's gonna use Wiktionary as a source in a professional publication? And if they do, there is nothing preventing them from substituting characters for more "standard" ones.
--WikiTiki89 23:37, 5 November 2013 (UTC)
  1. Wiktionary has a community standard threshold for verifying a term, usage, etc. Do any guidelines actually allow or encourage original research beyond that? But as you say, our romanization methods are based on personal preferences, not any research.
  2. I don’t know of any goals or criteria for success ever used in developing our original romanizations, so no one can actually claim that any changes were or were not improvements. They were just changes. If one considers stability to be a beneficial feature, then any unjustified change is harmful, because arbitrary changes make Wiktionary’s contents unpredictable, and therefore apparently not reliable.
  3. I’m not sure what you mean by the system is already there. The “system” on this page has already been there, ever-changing, since April 2009. You are changing it now, and editors would continue changing it ad infinitum. A system like ISO 233 or ALA–LC is “already there” in the sense that you mean, but wiki systems never are.
  4. If it is not the same as a standard system, then it requires not only learning, but possibly also unlearning old habits, and constant alertness to avoid mistakes in the minor differences. It also invites mistakes because an editor may not realize it is different from romanization everywhere else.
  5. If the point of Wiktionary is not to be a reference, then what is it?
By the way, w:Wikipedia:Manual of Style/Arabic uses ALA–LC romanization for Arabic[1] and Urdu.[2] It looks like that only has 5 significant differences from your proposal (ṯ → th, x → kh, ḏ → dh, š → sh, ġ → gh). Is there a good reason not to adopt ALA–LC here, to gain the benefit of compatibility with Wikipedia and world-wide English-language publications and library catalogues? Michael Z. 2013-11-06 04:37 z
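The five differences listed above are simple enough to apply mechanically. The following Python sketch is only an illustration of that point: it covers just the five consonant substitutions named in this comment, it is not a complete ALA-LC implementation, and the function name `to_ala_lc_consonants` is hypothetical.

```python
import unicodedata

# The five substitutions named in the comment above, keyed by code point
# so that the mapping does not depend on how this file was typed.
DIFFERENCES = str.maketrans({
    "\u1e6f": "th",  # t with line below
    "x": "kh",
    "\u1e0f": "dh",  # d with line below
    "\u0161": "sh",  # s with caron
    "\u0121": "gh",  # g with dot above
})


def to_ala_lc_consonants(translit: str) -> str:
    """Replace the five differing consonants; everything else is kept."""
    composed = unicodedata.normalize("NFC", translit)
    return composed.translate(DIFFERENCES)


if __name__ == "__main__":
    print(to_ala_lc_consonants("\u0161ams"))
    print(to_ala_lc_consonants("xubz"))
```

The reverse direction is not as simple, since a digraph such as "sh" can also be a plain s followed by h.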
It's w:Romanization_of_Arabic#Comparison_table -> w:Hans_Wehr_transliteration as in the dictionary everybody uses, except for two letters "ḵ" (everybody just preferred "x") for "خ" and "ḡ", which was "ġ" in the older versions for "غ". "ǧ" instead of "j" for "ج" was disliked by everybody and Hans Wehr changed it to "j" in the latest editions, which now matches our standard as well. We can update and use "ḡ" and "ḵ", which will match Hans Wehr totally. I don't see Wiktionary to catch up with Hans Wehr in the near future, so it would make much more sense to follow the standard of the most used dictionary. --Anatoli (обсудить/вклад) 04:55, 6 November 2013 (UTC)
Whatever we do, the result should go into Template:ar-root/tr as well, which automatically transliterates Arabic root consonants. BTW, w:Hans Wehr transliteration uses "ʼ" and "ʻ", which are visually harder to distinguish than ʾ ("ء") and ʿ ("ع").
Suggested transliteration of consonants (excluding dialect and loanword exceptions):
ء ʾ ب b ت t ث ṯ ج j ح ḥ خ ḵ
د d ذ ḏ ر r ز z س s ش š ص ṣ
ض ḍ ط ṭ ظ ẓ ع ʿ غ ḡ ف f ق q
ك k ل l م m ن n ه h و w ي y

--Anatoli (обсудить/вклад) 05:15, 6 November 2013 (UTC)
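The suggested table can be written down as a plain lookup, which is roughly what an automatic root transliterator needs. The Python sketch below is an assumption-laden illustration, not the code of Template:ar-root/tr: the names `CONSONANTS` and `transliterate_root` are hypothetical, and only the consonantal values from the table above are covered (no vowels, ة or ى).

```python
# Consonant values copied from the suggested table above.
CONSONANTS = {
    "ء": "ʾ", "ب": "b", "ت": "t", "ث": "ṯ", "ج": "j", "ح": "ḥ", "خ": "ḵ",
    "د": "d", "ذ": "ḏ", "ر": "r", "ز": "z", "س": "s", "ش": "š", "ص": "ṣ",
    "ض": "ḍ", "ط": "ṭ", "ظ": "ẓ", "ع": "ʿ", "غ": "ḡ", "ف": "f", "ق": "q",
    "ك": "k", "ل": "l", "م": "m", "ن": "n", "ه": "h", "و": "w", "ي": "y",
}


def transliterate_root(root: str) -> str:
    """Transliterate root consonants, ignoring spaces between them.

    Raises KeyError for any letter that is not in the table, so that
    unsupported input is noticed rather than silently passed through.
    """
    letters = [ch for ch in root if not ch.isspace()]
    return " ".join(CONSONANTS[ch] for ch in letters)


if __name__ == "__main__":
    print(transliterate_root("ك ت ب"))
    print(transliterate_root("ن ظ م"))
```

Keeping the table in one place means a later change, such as swapping one symbol for another, is a one-line edit.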

Well, why didn’t someone say it was based on a dictionary? I would encourage just picking an edition of Wehr and following it exactly. But anyway, this is a sensible approach, and I suggest citing Wehr and noting any differences from the published system. Sorry for the involved discussion.
On my machine, the apostrophes ʼ and ʻ are a bit clearer than the half rings ʾ and ʿ owing to their thicker round dot, at my preferred 16px font-size. At Wiktionary’s default 13px font-size, the half rings look rather like identical vertical ticks. Michael Z. 2013-11-06 05:28 z
Half rings ʾ and ʿ are used by DIN, ISO and SAS standards. The examples in the printed book look more like semicircles than apostrophes (I may double-check, I've got the dictionary at home). I think the Wikipedia page may not use the right symbols. I can distinguish half rings better than apostrophes on my PC and iPad and they are closer in shape to IPA symbols: ʿ - ʕ, ʾ - ʔ. The similarity of the symbols made some users prefer IPA or number 3 (used in chat) for ʕ ("ع") in the past. --Anatoli (обсудить/вклад) 05:57, 6 November 2013 (UTC)
I am ok with those changes, since I never really liked using "x" and the shape of the diacritic on the "ḡ" doesn't make much difference to me. I will add these changes to the bottom editing toolbar for easier access to these characters. --WikiTiki89 14:45, 6 November 2013 (UTC)
I've checked my H. Wehr dictionary. "ع" is definitely romanised like ʿ, with "ء", I'm not so sure, it may be an apostrophe but it has quite a round tail, which makes it look more like ʾ (with a round ball on the top) rather than ʼ. I think this info could be requested and Wikipedia pages w:Romanization_of_Arabic#Comparison_table and w:Hans_Wehr_transliteration could be changed to use ʿ and ʾ. --Anatoli (обсудить/вклад) 00:27, 7 November 2013 (UTC)
The appearance of an apostrophe depends on the typeface design. See Google image search for a range. The half-ring varies somewhat, too, having clipped, ball, or pointed terminals. Michael Z. 2013-11-07 01:26 z
The Wikipedia page on the w:Hans_Wehr_transliteration uses ʼ for ء and ʻ for ع, which are neither apostrophes nor half-rings. Based on the amazon previews of the Hans Wehr dictionary, I am fairly certain that it uses the half-rings, but they are slanted since the transliterations are italicized: ʾarbaʿa. --WikiTiki89 01:54, 7 November 2013 (UTC)
Well U+02BC, the “modifier letter apostrophe” ( ʼ ), certainly is an apostrophe. Its difference is the technical coding so that it works like a letter and not like punctuation. For example, on my system, double-clicking a word selects the entire word, leaving out punctuation marks but including “modifier letter” characters:
punctuation: ‘aa ’aa “aa ”aa
modifier letters: ʼaa ʻaa ʽaa ʾaa ʿaa
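The letter-versus-punctuation distinction described here can be checked against the Unicode character database. The following is a minimal Python sketch using only the standard `unicodedata` module; the grouping labels and the `describe` helper are made up for illustration.

```python
import unicodedata

# The two groups of characters shown in the example lines above.
chars = {
    "punctuation": "\u2018\u2019\u201c\u201d",             # ‘ ’ “ ”
    "modifier letters": "\u02bc\u02bb\u02bd\u02be\u02bf",  # ʼ ʻ ʽ ʾ ʿ
}

def describe(ch):
    """Return (code point, general category, Unicode name) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.name(ch))

for label, group in chars.items():
    print(label)
    for ch in group:
        print("  ", *describe(ch))
```

The modifier letters all report general category `Lm` (a letter category), while the quotation marks report `Pi` or `Pf` (punctuation). That is consistent with word selection treating ʼaa as one word, although the exact selection behaviour still depends on the software.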
The marks in Wehr’s transliterations (via Amazon) appear to have the stroke modulation and upper terminals of the apostrophe ( ʼ ) and “reversed comma” ( ʽ ), as opposed to the English quotation mark ( ʻ ). The very definition of a half-ring is that you wouldn’t see this bias referring to manuscript forms. There are entries where you can compare the italic transliterations to roman entry text, but the italics font has different letterforms from the roman anyway.
But that is kind of academic. As a matter of convention, Wiktionary can choose to use either apostrophe or ring typographical forms to represent these characters. Apostrophes are more accessible for text-entry. Readability seems to be mixed – the fact that both suffer on different machines (as we mentioned above) is another argument to increase Wiktionary’s font-size to the standard HTML default. Michael Z. 2013-11-07 17:16 z
Could we have a vote on standardizing the transliteration scheme instead of just going ahead and making changes? --Dijan (talk) 15:07, 8 November 2013 (UTC)
@Dijan Do you have specific concerns? The page has been neglected for a long time and it's not clear what the vote would achieve - voting for specific standard or specific letters/symbols. Transliteration of Arabic is not a simple topic and would require a lot of explaining. Can we resolve all differences here? --Anatoli (обсудить/вклад) 09:09, 9 November 2013 (UTC)
Well, voting on specific letters would be voting against standardization. I’d welcome any input from Dijan too.
But I would also like to see a vote called, because we shouldn’t be changing transliterations ad hoc. Currently there is no clear statement of the transliteration system’s goals, principles, or even the content of the proposal on the table. Questions about specific symbols remain inconclusive. There is little consideration for compatibility with systems for other Arabic-script languages, so perhaps the whole topic would benefit from the wider community’s perspective. Michael Z. 2013-11-10 15:11 z
My concern is that there is already something of a vote happening on this page, but it's being decided on by a handful of people. This page is for discussion of differences and opinions and I'm open to it. Yes, there has been neglect on part of contributors to the Arabic language and our policies regarding Arabic. As Stephen mentioned, it is precisely the rapid changes of the few that drove away the talented contributors. --Dijan (talk) 17:02, 11 November 2013 (UTC)
I'm not sure if standardisation always puts off contributors. A sole native speaker Mahmudmasri (talkcontribs) (not very productive, though) prefers standard transliteration and ZxxZxxZ (talkcontribs) (Persian native speaker but knows Arabic) also seems to prefer standards. In any case, if someone uses non-standard transliteration, it's not such a big deal. The entries can be corrected and users can be referred to this page. You still haven't said, which particular change is a concern, Dijan. --Anatoli (обсудить/вклад) 01:52, 14 November 2013 (UTC)
Which standard or standards are you referring to? Michael Z. 2013-11-14 15:35 z
Hans Wehr, why? In some cases, you can't tell for sure, which one, as symbols coincide. In any case, it's obvious that when editors use "ẓuhr" (ظهر), "ʿayn" (عين), "iṯnān" (اثنان), "ġurfa" (or more up-to-date "ḡurfa") (غرفة) rather than "ZHuhr", "3ayn", "ithnaan" and "ghurfa" when transliterating, they try to use standard transliteration. --Anatoli (обсудить/вклад) 22:44, 14 November 2013 (UTC)
  • Why Hans Wehr isn't the best scheme? Because it does not transliterate final ة when it is silent, however that is the common practice I noticed here in Wiktionary. It would only make conflicts if it were used along with the case ending -a, (which I don't prefer for simplification) because final silent ة comes after -a- (-ah). Hans Wehr also does not capitalize proper names nor does it capitalize words at the beginning of sentences. Therefore I feel that Hans Wehr might be easily misused along with capitalized letters or -a case ending.
  • Why DIN 31635 may not be the preferred scheme for some? Because it uses the symbol ǧ (g-caron) for ج rather than a letter with no diacritics. However, the letter ج is normally and standardly pronounced as [ɡ] in Egypt. DIN 31635 has an annoying feature of always transliterating initial glottal stops, which do not add up for etymology and the real life use of those rules aren't that strict by Arabic speakers.
  • Why ALA-LC may not be the preferred scheme for some? Because it uses digraphs, conflicting with instances of a consonant+h, and it uses á for ى (alif maqsura), which erroneously indicates that the final syllable is stressed. However, it makes use of commonly used digraphs by Arabic speakers when writing their names in English (gh, kh, sh, th).

Conclusion: Hans Wehr or DIN 31635 are the most appropriate standards for having no digraphs. No matter what was chosen, I'm totally against using the non-standardized messy numbers (2, 3) and using capitalized letters for emphatic consonants. --Mahmudmasri (talk) 06:32, 16 November 2013 (UTC)

Re: "DIN 31635 has an annoying feature of always transliterating initial glottal stops...". It should be noted that Hans Wehr dictionary doesn't hamzate initial alifs, even if alif is not elidable (ألف (ʾalif or ʾalf) vs اسم (ism)). It's more precise and more common to write initial (including after the definite article الـ (al-)) alifs with hamza: أ (ʾ) and إ (ʾ) vs alif without hamza ا when initial alif is elidable or in the middle or final positions. The transliteration uses ʾ when hamza is written and nothing when it's not written, regardless of the pronunciation. There's no full agreement here whether we should use entries with hamza and alternative forms without. My preference is to use hamzas, as it is more educational for foreigners, common and stricter and have alternative forms entries without (this will help with etymologies, as Persian and other languages don't use initial hamza). I don't quite agree about "real life" - we're not talking about Egypt but majority of Arabic speaking countries. This feature of dropping hamza is similar to writing ى ("ʾalif maqṣūra") in the final position instead of ي (y). Mixing them up causes grief and problems for foreign learners. They are conceptually different and have different pronunciation, users shouldn't rely on transliteration alone. Entries and user examples here should use ى when it is pronounced "ā" and ي (y) in other cases. Again, "real life" is mainly applicable to Egypt, not majority of Arab countries. --Anatoli (обсудить/вклад) 02:27, 16 November 2013 (UTC)
From my limited experience with Arabic speakers online, I have noticed that the ones who write alifs without hamzah, when they should have a hamzah, also tend to write haa instead of taa marbuta. I understand that there are many historical reasons for this to be acceptable, but we are a dictionary and should use as many diacritics as it takes to make clear the morphology of a word. I agree that if the goal of our transliterations was to use them as borrowings in English then initial glottal stops would not need to be written and we would use all the capitalization rules of English, but that is not our goal. As a dictionary, we need to show the morphology of a word so that learners would be able to look up these words and know how to transform them, add affixes to them, or whatever else. Therefore the initial hamzahs are essential, while capitalization is not. --WikiTiki89 03:22, 16 November 2013 (UTC)
For Anatoli:
  1. I didn't specify the use of (Literary) Arabic in real life for Egyptians alone. Do I have to change my five year old username to something else and remove the word masri to stop people from assuming I am biased towards Egypt!? I won't :)
  2. Regarding the redundant initial glottal stop, I was only referring to transliteration, not the original Arabic spelling. I remember reading about Hans Wehr's transliteration that it drops the redundant initial glottal stops from transliteration, as ALA-LC does. The initial glottal stop in transliteration is not only redundant, but also distracting, since it is natural for anyone with limited knowledge of basic Arabic phonology that glottal stops come naturally as a syllable break between vowels unless one of them is elided which is reflected in transliteration: example Eid al-Adha, is pronounced in Classical Arabic ʿīd-u l-aḍḥā.   (ʿīd-u al-aḍḥā) →‎ ʿīd-u l-aḍḥā, which is also transliterated as ʿīd ul-aḍḥā. There's no hiatus in Arabic phonology, that is the same case for all modern spoken Arabic dialects and probably all other Afro-Asiatic languages, with the exception of Israeli Hebrew. The other reason for the redundant initial glottal stop distraction is that it is slower, for learners or people who want to quickly have an idea about how the words are loosely pronounced in Arabic, to distinguish and remember to pronounce an odd pronunciation other than that for ع at the beginning of the word. ʿ and ʾ appear very similar.
--Mahmudmasri (talk) 04:14, 16 November 2013 (UTC)
Mahmud, I don't have any prejudice against Egyptians, quite on the contrary :) but the Egyptian relaxed style of spelling is rather well-known, as it seems, specifically dropping initial hamza, replacing "ي" with "ى" and "ة" with "ه". I didn't even think about you being Egyptian when replied. I've learned it first from the Arabic language forum on Word Reference site, where one of the moderators is an Egyptian woman. The topic of spelling was often discussed. This small controversy is also described at Wikipedia. Most learners agree that the more strict spelling is better, from which you can easily make a more relaxed spelling. Well, hamza is a letter or a graphical symbol, even if it can be stand-alone or sit under or over other letters. And it has a transliteration symbol, e.g. ʾ. If a word is spelled without it, then hamza transliteration symbol can be omitted, otherwise it should be transliterated, IMO. So, عيد الأضحى should be transliterated as "ʿīd al-ʾaḍḥā" (with ʾ) but hamza-less "عيد الاضحى" as "ʿīd al-aḍḥā" (if we consistently ignore classical case endings), even if they're pronounced identically. --Anatoli (обсудить/вклад) 10:08, 16 November 2013 (UTC)
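The rule proposed in the comment above (write ʾ only when the Arabic spelling itself has a hamza on the initial alif) can be sketched in a few lines of Python. This is an illustration only, not an existing Wiktionary module; `mark_initial_hamza` and the constants are hypothetical names, and the function expects the word without the article.

```python
HAMZA_ON_ALIF = ("\u0623", "\u0625")  # أ (hamza above), إ (hamza below)
HALF_RING = "\u02BE"                  # ʾ, the symbol used for hamza here

def mark_initial_hamza(arabic, latin):
    """Prefix ʾ to `latin` only if `arabic` is spelled with hamza on its first alif."""
    if arabic.startswith(HAMZA_ON_ALIF):
        return HALF_RING + latin
    return latin

latin = "a\u1E0D\u1E25\u0101"  # aḍḥā, without any hamza mark
print("al-" + mark_initial_hamza("\u0623\u0636\u062D\u0649", latin))  # spelled أضحى
print("al-" + mark_initial_hamza("\u0627\u0636\u062D\u0649", latin))  # spelled اضحى
```

Under this rule the two spellings get different romanizations even though they are pronounced identically, which is exactly the point under dispute.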
The Egyptian use of hamza in Literary Arabic isn't different from any other people's use. However, when writing informally and in dialects, people don't always care to add hamza above or below alef. Arabic letters didn't have a final ي with two dots underneath. That was a west Levantine creation, more than a century ago. I can even show you modern printed Quran from Egypt having no dotted final ى. So, we don't drop the dots under ى in Egypt or Sudan (also sometimes in Yemen and Algeria), we just still write it as it has always been written for much more centuries. The words that are written with ى but pronounced with an open vowel are very few, the case is to pronounce it finally as /i/ with a few exceptions. The colloquial pronunciation for عيد الاضحى can be /-ladˤħa/, not /-eladˤħa/, and as an example, Tunisian Arabic doesn't even have a glottal stop, so /-alʔadˤħa/ pronunciation is only confined to Literary Arabic for them. See also w:Lakhdar Brahimi, it's not el-Akhdar or al-Akhdar. --Mahmudmasri (talk) 17:07, 16 November 2013 (UTC)
Those may be the dialectal pronunciations, but as a written dictionary, we should focus on the literary language and include pronunciation notes in the pronunciation section. --WikiTiki89 17:27, 16 November 2013 (UTC)
Mahmud. I won't argue, the spelling conventions are a bit irrelevant right now, what is important is the transliteration. As I said, we should use the strict spelling and possibly provide alternative spelling, like أنا and انا. I know these features are not exclusive to Egyptian and are very old but currently, many translations with final ى for ي (y) are tagged "Egyptian Arabic", which is not exclusive to Egyptian, of course. --Anatoli (обсудить/вклад) 21:35, 16 November 2013 (UTC)
Romanization should reflect the written language (not necessarily only “literary,” as we also document regional, informal, and other vernacular forms). If a word has different written forms, then we have form-of entries for them, with romanizations that reflect the differences. That romanization reflects pronunciation is secondary, because pronunciation varies, which is documented in our pronunciation section. Michael Z. 2013-11-16 18:14 z
It would be totally wrong to transliterate words written with the final ى as "-ā" when ي (y) is meant (pronounced as ī or y) or ه (h) (hāʾ) as "a" (or nothing) when ة (tāʾ marbūṭa) is meant in a more relaxed orthography. Lack of dots doesn't make them conceptually different, so غرفة can be spelled غرفه but it's still "ḡurfa" or عربي can be spelled عربى but it's still "ʿarabī" (like in Russian ё, when it's written as е, it's still "ё" conceptually. The word ёлка (jólka) is commonly spelled "елка" but it's still transliterated as "jólka", not "jélka"). Dictionaries, including Hans Wehr, transliterate reflecting pronunciation, not just for cases above but for foreign and dialectal words. Anatoli (обсудить/вклад) 21:35, 16 November 2013 (UTC)
I see your point. Not a good example though. In transliterated Russian we typically see е → e/je/ye/ie and ё → ë/jë/yë/ië, not conceptually, but as a strict grapheme conversion. Michael Z. 2013-11-17 01:00 z
The example is absolutely up to the point. It may be that Roman letter "e" with any diacritics is still the same letter but in Russian, letter "ё" is still conceptually letter "ё", even if "two dots" are not written. It's a letter of the alphabet and when asked about a spelling, it's called "ё". The same way, Arabic "ي" is one letter, "ى" is another but when the latter is used instead of the former, it's still a "yāʾ", not "ʾalif maqṣūra". --Anatoli (обсудить/вклад) 03:42, 17 November 2013 (UTC)
Well it depends how you look at it. An alif maqsura is just a yaa that is pronounced like an alif. Just like Russian "ё" is just a "е" pronounced as /o/. In both cases the diacritics were added later as an afterthought. I do agree, though, that today they are seen as distinct and it makes sense for us to treat them that way. --WikiTiki89 04:53, 17 November 2013 (UTC)
Anatoli, your example is incorrect. The letter ё is generally not transliterated jo for Russian. A letter’s conceptual soul is irrelevant if the reader cannot determine what it is. This is so for English-language librarians, who romanize Russian ё as Latin ë, and apparently also for the Russian passport office, which is using e/ye. Michael Z. 2013-11-17 17:04 z
We are a dictionary not a library or passport office. Also, I don't see how Latin "ë" is different from "jo", the distinction is made in both cases. And this is all irrelevant anyway, because I don't see anyone suggesting that we transliterate alif maqsura as "ī" or "y". --WikiTiki89 20:17, 17 November 2013 (UTC)
If it’s irrelevant, then why did you bring it up? Latin ë is different from jo because the word is transliterated elka or ëlka based on how it is written in a text, not on how it is pronounced. When the “yo” is written as е, then it is transliterated e, not ë, and certainly not jo. Russian transliteration is performed on a graphical е or ё, not a conceptual ё.
If you see some other precedent in dictionaries, then I am interested in learning about it, but generally monolingual or bilingual dictionaries do not transliterate.
Without agreeing on any principles as a basis for transliteration, this is all just chit-chat about everyone’s favourite Latin character for a particular foreign character. Michael Z. 2013-11-17 22:31 z
I think I see what you are saying now. But it is still irrelevant because our entries always use ё when appropriate. Similarly, our main entries will always use ي when appropriate. The only time this is relevant is in alternative forms, which should transliterate based on what is implied. Keep in mind also that when Arabic text is fully vocalized, as is ours, then final ي and ى are differentiable based on the vowels, even when both are written as ى. So the transliterations are still following rules that do not require knowledge of the word (as long as the vowels are known). --WikiTiki89 00:44, 18 November 2013 (UTC)
The only time this is relevant is in alternative forms, which should transliterate based on what is implied – that is wrong. The purpose of transliteration is to represent the form. In Russian, for example, transliteration represents Cyrillic spelling via Latin letters. For someone who doesn’t read Cyrillic, the transliteration is the only way to understand the difference between an alternative form and the lemma form. our entries always use ё when appropriate – Except for the ones that don’t, like alternative-form entries. And except for our citations of real usage, which quote original texts. And except for any notes or other text that refer to actual usage. Michael Z. 2013-12-04 20:52 z
There is no benefit to a transliteration system that fully distinguishes every nuance of the original spelling. If someone is willing to learn such a system, they are better off learning the original alphabet. The purpose of transliteration here is to aid readers that don't care enough about the language to learn the alphabet, and occasionally to disambiguate what is not written in the original script. We have combined these two purposes together, because in almost every case they are not mutually exclusive. --WikiTiki89 21:04, 4 December 2013 (UTC)
We are getting off topic. We really need to have a wider discussion about the principles behind our transliteration systems in an appropriate place, because editors share mutually exclusive views.
There is no benefit to a transliteration system that fully distinguishes every nuance of the original spelling. – there is a benefit in representing the 33 letters of the Russian alphabet in such a way that two different spellings can be recognized as such. I would say this requires less nuance. There is a benefit in transliterating three related East Slavic languages in a compatible way, for readers that don’t know them, or that don’t know Cyrillic at all. There is a benefit in transliterating in a way that is used by hundreds of other publications.
to disambiguate what is not written in the original script – do you mean to represent pronunciation? We do have both pronunciation and transliteration. Michael Z. 2013-12-04 21:54 z

If we transliterate Arabic letter to letter, what's written, then we shouldn't have any transliteration at all and refer users to the alphabet page - Appendix:Arabic alphabet (it's missing diacritics and additional symbols).
  • Letters و and ي can be either consonants w/y or vowels ū/ī. In loanwords they can also represent u, o/i, e.
  • ى when used in place of ي is still conceptually ي and should be transliterated accordingly and that's what dictionaries do. (yes, it's the same with Russian ё/е)
  • غ can be used to transliterate /g/ into Arabic and can be either ḡ or g.
  • Dialects and loanwords don't follow the standard way of reading letters.
I won't list all situations, most of them have been addressed already. In any case, phonetic romanisation is in no conflict to standard dictionaries or textbooks. Hans Wehr romanises words phonetically, reflecting pronunciation, not spelling. The participants in Module_talk:ko-translit/testcases all agreed that we need transcription type of transliteration, not letter-to-letter, which is useless, like transliterating English words into Cyrillic or Arabic one to one.
Importantly, Arabic may not be always written with diacritics (as is usually the case in the real world) in user examples, see also's, synonyms, translations and even entries. So, if we omit unwritten vowels in transliterations, we get absolute rubbish!--Anatoli (обсудить/вклад) 22:22, 4 December 2013 (UTC)
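The point about unwritten vowels is easy to demonstrate. The following Python sketch applies a strict one-character-to-one-character mapping; the `LETTERS` table is a tiny, made-up subset for illustration and is not the table from the project page or from any module.

```python
# Hypothetical, deliberately incomplete letter table (illustration only).
LETTERS = {
    "\u0643": "k",       # ك
    "\u062A": "t",       # ت
    "\u0628": "b",       # ب
    "\u0639": "\u02BF",  # ع -> ʿ
    "\u0631": "r",       # ر
    "\u064A": "y",       # ي
    "\u0648": "w",       # و
    "\u064E": "a",       # fatḥa
    "\u064F": "u",       # ḍamma
    "\u0650": "i",       # kasra
}

def letter_by_letter(text):
    """Map each character independently; unknown characters pass through."""
    return "".join(LETTERS.get(ch, ch) for ch in text)

print(letter_by_letter("\u0643\u062A\u0628"))                    # كتب, unvocalized
print(letter_by_letter("\u0643\u064E\u062A\u064E\u0628\u064E"))  # كتب with fatḥas written
print(letter_by_letter("\u0639\u0631\u0628\u064A"))              # عربي, unvocalized
```

The unvocalized inputs come out as "ktb" and "ʿrby", while the fully vocalized one comes out as "kataba". A purely letter-to-letter scheme is therefore only usable on vocalized text, and even then it needs context rules to decide whether و and ي are consonants or long vowels.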

The result of the discussion

There seems to be an agreement to use Hans Wehr transliteration for Literary Arabic when possible. So, let's do it. --Mahmudmasri (talk) 21:20, 17 November 2013 (UTC)

Yes, Mahmud, let's do it. Are you OK with having dotted final yāʾ, tāʾ marbūṭa and initial hamza terms as the main entries and dotless/hamza-less terms as alternatives? E.g. عربي - main entry, عربى - alternative form (currently Egyptian only)?
There remain some details about loanwords and dialects. (since the use of hamza in loanwords is not standardised, we can transliterate it if it's written and omit if it's not, very easy - the hamzated/hamza-less could have alternative forms, there is absolutely no problem with this - أفريقيا, افريقيا and إفريقيا can all have entries, one being the main entry, the others alternative)
Vowels (long and short in loanwords), dialects. The common practice, Hans Wehr transliterates loanwords the way they are pronounced by Arabs, not the way they are spelled. Long vowel letters often (but not always) pronounced short. Use of o, e, ō, ē, which are absent in Classical Arabic.
Consonants, as above, need to mention exceptions for loanwords and dialects, which will not match the table. We also need to include Maghrebi special letters ڢ and ٯ, which are not covered. We need to explain why some consonants are transliterated differently in loanwords or dialects, such as "ج" or "غ" as /g/. Include all dialectal letters and rarely used ones, such as Persian پ (p) or ڤ (v), etc.
Re: tanwīn. We usually don't transliterate classical case endings, you seem to have no issue with this but we can make exceptions for special templates, which show inflections and transliterate them: -un, -in, -an, etc.
For verbs we use full endings, i.e. كتب is "kataba", not "katab". There are existing templates and a module is being developed (very slowly). (You could join and help there!) --Anatoli (обсудить/вклад) 00:01, 18 November 2013 (UTC)
The maghrebi letters ڢ and ٯ are just their traditional way of writing ف and ق. They are the same letters, just dots are used in different ways to differentiate them. They should be transliterated identically. --WikiTiki89 00:44, 18 November 2013 (UTC)
Yes, I know, we just need to include them for consistency.
Another rule to include: we ignore assimilation of "الـ‎" before "sun letters", so الشمس is "al-šams", not "aš-šams". --Anatoli (обсудить/вклад) 02:16, 18 November 2013 (UTC)
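To make the sun-letter rule concrete, here is a small Python sketch that builds both the unassimilated form adopted here and the assimilated, pronunciation-based form for comparison. The function name `with_article` and its `assimilate` flag are hypothetical; the list is the 14 sun letters in the romanization discussed on this page.

```python
import unicodedata

# The 14 sun letters in romanized form, normalized to single code points.
SUN_LETTERS = tuple(
    unicodedata.normalize("NFC", c)
    for c in "t ṯ d ḏ r z s š ṣ ḍ ṭ ẓ l n".split()
)

def with_article(stem, assimilate=False):
    """Attach the definite article to a romanized stem.

    assimilate=False gives the form adopted here (always "al-");
    assimilate=True doubles an initial sun letter instead.
    """
    stem = unicodedata.normalize("NFC", stem)
    if assimilate and stem.startswith(SUN_LETTERS):
        return f"a{stem[0]}-{stem}"
    return f"al-{stem}"

print(with_article("šams"))                    # form used in entries
print(with_article("šams", assimilate=True))   # assimilated form, for comparison
print(with_article("qamar", assimilate=True))  # moon letter: unchanged
```

For šams this gives "al-šams" versus "aš-šams"; a moon letter such as q keeps "al-" either way.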
Is it possible to transliterate Persian, Urdu and other Arabic-script languages in a compatible way? Michael Z. 2013-11-17 22:42 z
There is no point in making Persian similar to Arabic. Letters س‏, ث‏ and ص are all pronounced /s/ and letters ز‏, ذ‏ and ظ are all pronounced /z/, ع is usually ignored (like in European languages, Persian lacks the Arabic sound produced by ع but preserves Arabic spelling in loanwords). Vowel length is different, where Arabic uses ū and ī, Persian uses u and i.
Urdu is usually romanised similar to Hindi using symbols, like ś (instead of š) and special symbols for language specific letters, not based on the script but based on its relationship with the sister language Hindi. It's best to ask this question on appropriate pages, asking users who work with these languages, such as Dijan, ZxxZxxZ and others. These languages use standards different from Arabic's (or a non-standard but accepted Wiktionary way, which may include a number of standards). --Anatoli (обсудить/вклад) 00:01, 18 November 2013 (UTC)
  • Yes, the main entry as ي and ى as an alternative.
  • Yes, the main entry as ة and ه as an alternative.
  • Yes, for the alef, however, Africa is also alternatively pronounced ifriqya (the second vowel is either short or long). I even remember that the word Africa has its etymology for some tribe called ifri so maybe that's why it is alternatively with the إ.
  • The custom for transliterating vowels in loanwords, mentioned by Anatoli, is absolutely OK. In fact, I usually find mistakes in Wiktionary and correct them in that manner.
  • The traditional Maghrebi letters are rarely used nowadays, so there is no need to mention them, unless you are writing a page specifically about the letters. ٯ /q/ doesn't have a dot when written finally, but has the confusing dot above ڧ when written initially or medially. The ف /f/ was traditionally written with a dot underneath ڢ.
  • The ڤ /v/ is problematic, because it conflicts with ڨ /ɡ/ used in Tunisia and Algeria. They use ڥ instead for /v/. But, we would still have to provide both alternative forms if the word in question is known to northwest Africans. ڤ is the main entry and ڥ is the alternative form.
  • Anatoli's suggestion about the use regarding ج and غ or others for /ɡ/ is fine, in fact that was the custom I noticed in Wiktionary. Side note: Arabic transliterations which were initially intended to transliterate /ɡ/ in Arabic alphabet are very often distorted to have the pronunciation [ɣ~ʁ].
  • If the case ending was intended to be written, it's better to transliterate it with a dash to avoid confusion, example: katab-a, not kataba. This style is accepted in ALA-LC and DIN 31635. I'm not sure about Hans Wehr.
  • There is a popular transliteration I noticed for Persian (UniPers) which uses circumflexes instead of macrons, but some long vowels are transliterated with normal Latin letters without any diacritic. Unfortunately, it may be difficult to standardize transliterations for Arabic-based scripts, although we can make them as close as possible, but we would deviate from the known standards to transliterate each language.

--Mahmudmasri (talk) 02:21, 18 November 2013 (UTC)

Hans Wehr transliterates nouns in their pausal forms and verbs with full ending, with no hyphens (kataba, not katab-a).
We can handle Maghrebi letters later, they're minor issues.
Loanwords with ج and غ could be transliterated with alternative forms, e.g. "injilīzi, ingilīzi" or g, ḡ with various pronunciation, reflecting intended, real or regional accents (not sure if all terms with ج should have Egyptian "g", probably not, since this is expected, another issue is ق pronounced ʾ in dialects).
No issue with إفريقيا, see hamza topic above, we can have alternative forms and variant transliterations. --Anatoli (обсудить/вклад) 02:34, 18 November 2013 (UTC)
Speakers of dialects that normally pronounce ق as /ʔ/ still pronounce it as /q/ when speaking MSA, so ق is not an issue. Also, you forgot to mention the use of ك (or even گ) for /g/. --WikiTiki89 02:46, 18 November 2013 (UTC)
Yes, I haven't used all examples, just the main ones, the same for Persian/Urdu letters used in Arabic occasionally and dialectal letters, e.g. to render "č" in some dialects.
Please see أفريقيا for treatment of hamza, long/short vowels in loanwords (this discussion is split, there is a dedicated hamza topic above). --Anatoli (обсудить/вклад) 02:54, 18 November 2013 (UTC)
  • إنجليزي (A-N-Ǧ-L-Y-Z-Y) / إنكليزي (A-N-K-L-Y-Z-Y) should only be transliterated as ing(i)līzi since it is only pronounced with /ɡ/, not /(d)ʒ/ or /k/.
  • افريقيا (regardless of the first vowel), has its /q/ pronounced the same in dialects or approximated to [k].
  • I really didn't like the transliteration of the initial glottal stop. It's impracticably hideous. Look how many transliterations we have for أفريقيا!
  • In Egypt, it is acceptable to use ج (with 1 dot) for /(d)ʒ/. There are some loanwords with that consonant and are either written with چ (with 3 dots) or ج (with 1 dot).
  • The distorted loanwords that acquired a pronunciation should be transliterated to reflect the pronunciation, as for Ghana غانا ḡāna [ˈɣæːnæ] (Egyptian pronunciation) [ˈʁɑːnɐ] (Persian Gulf pronunciation), but it is pronounced with its original /ɡ/ by more educated people in Lebanon and northwest Africa.
  • The ending vowels
    1. Should be transliterated as long (ā, ī, ū) for words of Arabic origin, or non-Arabic words with acquired Arabic case endings.
    2. We shouldn't transliterate final vowels as long even if they are spelled with ا,‎ و or ي in plain loanwords. That rule should also apply when transliterating Arabic dialects. Therefore, أفريقيا should take a final plain-a without a macron.

--Mahmudmasri (talk) 03:49, 18 November 2013 (UTC)

  • إنجليزي - OK but some people wanted to mark /(d)ʒ/, perhaps in some areas it's pronounced so or by less literate people? It may be less obvious with less common words that "ج" should be "g" if the original foreign word has /g/.
  • افريقيا - no problem with that, we are aware that ق doesn't always change the pronunciation, notably القاهرة (al-qāhira, Cairo) (providing an example for the sake of any other participant), in any case, MSA is the target, its pronunciation takes priority over dialects.
  • غانا OK but as far as I know, غ is also used to render /g/, which should be transliterated accordingly or have comma-separated variants. Not sure if we need to discuss all cases, suffice it to say that we agree to transliterate exceptions according to their pronunciation, even if individual letters don't match our table - the special notes are already there, and we can expand them with short details.
  • Long vowels probably OK with me but I'll have to think about it. Final unstressed "ā" consistently loses length.
  • Why do you dislike the transliteration of the initial glottal stop? Is it because ʾ is missing on the keyboard? You don't have to transliterate alternative, derived forms, synonyms, etc. I only provided examples of how to transliterate the actual entries. Let me think about hamza problem. --Anatoli (обсудить/вклад) 04:08, 18 November 2013 (UTC)
  • إنجليزي is always pronounced with /ɡ/ even by less educated people.
  • The case for final vowels in unstressed syllables is the same for all other vowels, not just ā.
  • غ can be used to render /ɡ/ mainly in the Levant and if it is (still) pronounced /ɡ/, it should be transliterated with the normal g.
  • The problem isn't with keyboards! Initial glottal stops are redundant and don't add any etymological information, since transliterations' primary role is to ease etymological inspection and when the words are borrowed in other languages, the glottal stop rules are totally ignored. If someone wanted to know whether the Arabic word is spelled with an initial hamza, then he should check the Arabic spelling.
  • Literary Arabic rules are simple:
  1. initial a- u- are always written with a hamza above.
  2. Initial i- has two rules:
  • is written without any hamza as انتخاب intiḵāb "an election"
  • or with an under hamza in other words as إنتاج intāj "a production".
In loan words, you are free to choose one of the two initial i- rules. Some teachers advise not to use an initial under hamza for loanwords with initial i-, claiming that اللغات الأعجمية al-luḡāt al-aʿjamiyya don't have glottal stops. This is however a shallow, badly informed statement. Remember, إفريقيا is a loanword, but it is spelled with an under hamza for the i- pronunciation.

--Mahmudmasri (talk) 04:49, 18 November 2013 (UTC)

What about الـ (al-)? More importantly, the word الله (allāh), which has a hamzat al-waṣl? Are these the only two words, which are always written without a hamza over alif and start with an "a"? --Anatoli (обсудить/вклад) 05:03, 18 November 2013 (UTC)

If there are no objections from others, I'm OK with dropping the requirement for initial hamza transliteration. This transliteration will then match Hans Wehr, although Hans Wehr doesn't use the initial hamza in Arabic words, whereas we do. --Anatoli (обсудить/вклад) 05:11, 18 November 2013 (UTC)

Yes, exactly, the only few exceptions are the definite article and allah, since both are reduced when pronounced after words ending with vowels, whether because of case endings or just nouns ending with vowels. But you should notice that words with hamzat waṣl should be pronounced properly in Literary Arabic without a glottal stop if preceded by a word ending with a vowel. Example: /al.in.ti.ˈxaːb/ vs /al.ʔin.ˈtaːɡ/. But, as I told you, in real-life use by all Arabic speakers, people confuse these things or pronounce a glottal stop in all these cases. --Mahmudmasri (talk) 05:15, 18 November 2013 (UTC)
No worries. Ha-ha, you used the Egyptian pronunciation: /al.ʔin.ˈtaːɡ/ (not that it matters much) ;) --Anatoli (обсудить/вклад) 05:24, 18 November 2013 (UTC)
Hamzat wasl can occur with any vowel. As far as I know, /i/ is the most common, /a/ is only in the definite article (the first part of الله is the definite article), and /u/ occurs only in the imperative of Form I verbs that have /u/ as the characteristic vowel of the imperfect (such as اُكْتُبْ). --WikiTiki89 13:59, 18 November 2013 (UTC)
You are right, I missed that imperative, as I was only thinking of the common usage in Literary Arabic. However, the word allāh is a typical example of an exception. It isn't treated as a definite article added to lāh, but as one proper noun whose initial vowel can be elided, as I explained earlier. --Mahmudmasri (talk) 19:21, 18 November 2013 (UTC)
الله (allāh) is not an exception. It is ال (al-) + إله, with a semi-regular elision of the glottal stop (الإله was later re-introduced). The same thing happened with ال (al-) + إناس (ʾinās), which produced الناس (an-nās) (from which ناس (nās) is a back-formation). It also happened in multiple other cases but those are the only two I remember. --WikiTiki89 19:41, 18 November 2013 (UTC)
Now you are speaking about etymology, not how the words are used. الإله is used to mean "the deity (any generic god)", but الله means "God" (the god in Islam and Abrahamic religions). --Mahmudmasri (talk) 20:50, 18 November 2013 (UTC)
Either way, it acts like the definite article in every way in terms of elision and the alif is dropped when prefixed with لِـ (li-). --WikiTiki89 22:28, 18 November 2013 (UTC)
Thanks for the اُكْتُبْ example, Wikitiki89. That reminds me we'll need to make sure templates and modules don't add hamza on those forms.
I need to clarify re long unstressed vowels. So, does it mean that we need to change final ī to i, ā to a, ū to u? It seems OK with ā and ū but the nisba ending "-ī" (formerly "-iyy") seems very unusual if we change it to "-i", e.g. عربي becomes "ʿarabi". --Anatoli (обсудить/вклад) 01:20, 19 November 2013 (UTC)
I think Mahmud was only referring to final long vowels in loanwords. --WikiTiki89 01:49, 19 November 2013 (UTC)
  • I have the impression that the common practice in Wiktionary gives a different picture of how Literary Arabic is used. The ending ʿarabiyy isn't false, but it's very unusual to hear someone pronounce it that way. The example عربي, with the template {{ar-nisba}}, puts strong emphasis on the nunation, when in fact that over-use of nunation is Classical/poetic. Therefore, the line demonstrating the adjectives should be something like this:
  • عَرَبِي • (ʿarabī) m, dual masculine عربيان (ʿarabiyyān), feminine عَرَبِيَّة (ʿarabiyya)‎, dual feminine عربيتان (ʿarabiyyatān), oblique masculine plural عربيين (ʿarabiyyīn), masculine plural عَرَبِيُّون (ʿarabiyyūn)‎, feminine plural عَرَبِيَّات (ʿarabiyyāt)‎
  • As for how to transliterate the ending ي in words like (ʿarabī): I suggested earlier in this discussion transliterating it with ī (with macron) for the Arabic case ending, but with plain i if that final ي was part of a loanword or a non-Arabic name. But it looks like that would introduce complications, so let's just transliterate all final vowels in unstressed syllables with un-diacriticized letters. That can also simplify the automation in {{ar-nisba}} to just add endings based on ʿarabi/-yyān/-yya/-yyatān/-yyīn/-yyūn/-yyāt.
  • By the way, the final /j/, as in the word رأي /raʔj/ (one syllable) should be transliterated as raʾy not raʾī, because the second gives an impression of another pronunciation /ˈra.ʔi(ː)/ (two syllables). However, the second pronunciation appears in colloquial pronunciation in the Levant.

--Mahmudmasri (talk) 05:22, 19 November 2013 (UTC)
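
The ending-based automation suggested above for {{ar-nisba}} can be sketched roughly as follows. This is only an illustration in Python, not the actual template or any existing Wiktionary module: the name nisba_forms and the form labels are hypothetical, and only the base ʿarabi and the list of endings are taken from the comment above.

```python
# Endings taken from the comment above: -yyān/-yya/-yyatān/-yyīn/-yyūn/-yyāt.
# The labels are illustrative, not those of any real template.
NISBA_ENDINGS = {
    "masculine dual": "yyān",
    "feminine": "yya",
    "feminine dual": "yyatān",
    "masculine plural oblique": "yyīn",
    "masculine plural": "yyūn",
    "feminine plural": "yyāt",
}


def nisba_forms(base: str) -> dict[str, str]:
    """Build transliterated nisba forms from a base ending in plain -i."""
    forms = {"masculine": base}
    for label, ending in NISBA_ENDINGS.items():
        # The base already ends in "i", so the ending is simply appended.
        forms[label] = base + ending
    return forms


if __name__ == "__main__":
    for label, form in nisba_forms("ʿarabi").items():
        print(f"{label}: {form}")
```

The sketch only concatenates strings, so it says nothing about whether the masculine singular should be written -i, -ī or -iyy; that choice is fixed by whatever base is passed in.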

I disagree. Our transliteration shouldn't worry about how templates will be written. With Lua, we can now create more complicated templates such as {{ar-prep-auto}}. As for ʿarabiyy vs. ʿarabī, our transliterations should not accommodate too much for modern pronunciation. We should try to transliterate what is written, and for nisba, that is ʿarabiyy(un). Our pronunciation sections, or even this page itself (Appendix:About Arabic) can cover the fact that it is pronounced simply as /ʕarabi(ː)/ by most speakers. --WikiTiki89 05:33, 19 November 2013 (UTC)
  • Simplifying the automation would be a coincidence, not the primary reason, because I know that templates can do much more complicated conversions. I don't want my reasoning to be distorted. No accommodation here!
  • Kasra plus ي doesn't give iyy; it gives ī. I noticed some confusion, with this case being assumed to be iyy.
  • Kasra plus ي plus šadda gives iyy.
  • Here comes the question again: Should we demonstrate some fantastic language which would only exist in Wiktionary in that case, or should we be realistic in our transliterations? Apparently, popular standardized transliterations go for the ending ī or i.
  • Is it really a pure transliteration or halfway between a transliteration and a transcription? Can there be a scheme to reflect both the original Arabic spelling and the loose pronunciation? All the popular standardized Arabic transliterations fail to reflect the original spelling in many cases and sometimes fail to reflect the loose pronunciation, so you shouldn't worry too much about the need to reflect the original Arabic spelling, since it is practically impossible. The final ى (alef maqsura) is transliterated like medial or final alef: ā. ALA-LC transliterates it with an acute accent, á, mistakenly suggesting that its syllable is stressed. The Spanish Arabists School does a better job by transliterating it as à. Silent ة is either not transliterated in some schemes or transliterated in others, like final ه, by h, but when pronounced as /t/, it is transliterated as t, conflicting with ت. ISO 233 transliterates it as ẗ all the time, and ISO 233-2 transliterates it all the time as ŧ, not distinguishing its silent from its /t/ pronunciation. However, almost all schemes, including the aforementioned, have many exceptions, like when Arabic letters are used to render different consonants from what they normally do (ج ك غ ق), and additionally when normal Arabic letters may or may not be diacriticized to render consonants other than their normal values (چ گ ڠ ڨ ڤ ڥ) vs (ج ك غ ق ف).

--Mahmudmasri (talk) 22:25, 20 November 2013 (UTC)

I did not mean to distort your reasoning, that is just the way I understood it. I don't know who was confusing kasra + yaa with kasra + yaa + shadda. I would support showing ة with the optional case ending, for example كَسْرَةٌ would be kasra(tun), but I think that something like kasraẗ looks really bad. I agree that loanwords will always be exceptions no matter what scheme we come up with. I would also support transliterating alif maqsura as something else, such as "à". --WikiTiki89 00:22, 21 November 2013 (UTC)
It's impossible to re-transliterate perfectly back into English, so I would stick to Hans Wehr, i.e. ة as "a" (i.e. nothing, because it's preceded by fatḥa) or e, i in Levantine dialects. (For declension templates, we could use full endings) and ى (alif maqṣūra) as ā. --Anatoli (обсудить/вклад) 22:34, 4 December 2013 (UTC)

Prefixes and Suffixes[edit]

I propose we use the ـ character to indicate prefixes and suffixes. For example, move the entry ال (al-, the) to الـ (al-, the) and ني (me) to ـني (me).

Does anyone disagree? --WikiTiki89 01:45, 4 November 2013 (UTC)

I disagree. "ـ" (tatweel) could be used in the display, like this: الـ (al-), similar to the way ḥarakāt (vowel points) are added to the headword but are not part of the entry name. --Anatoli (обсудить/вклад) 02:03, 4 November 2013 (UTC)
Could you maybe give a reason for disagreeing? We use some sort of horizontal line symbol (-, ־) in the entry name for every other language. --WikiTiki89 02:11, 4 November 2013 (UTC)
Tatweel or kashida is only used as an elongation symbol, giving only a different visual effect for readability; it has no grammatical, lexical, punctuation or other value. There's no tradition in any Arabic dictionary of using ـ this way, unlike "-" (hyphens) in prefixes or suffixes in European languages. Even in the header, ـ in الـ (al-) is only used to show that it can be attached to nouns and is romanised as "al-". --Anatoli (обсудить/вклад) 02:19, 4 November 2013 (UTC)
I suppose I am just too used to the other languages indicating prefixes and suffixes. To me ال (al-) looks really weird, since the prefix is never actually found with the final form of the ل. --WikiTiki89 03:53, 4 November 2013 (UTC)
Arabic dictionaries, notably w:Hans Wehr, don't use tatweel in the article headers.
I've sent you an email. You've got my support in this work. Don't worry about Mzajac, he's not going to work with Arabic, anyway but will surely stir enough trouble with negative comments to discourage any participation. I'll probably refrain from further comments here. --Anatoli (обсудить/вклад) 22:21, 5 November 2013 (UTC)
Ok, I will withdraw the tatweel for affixes proposal. And I never worry about negative comments. One thing you have to remember is that the transliteration system is not nearly as much for editors of Arabic as it is for readers. The readers' opinions should always be more important than the editors', and just because Michael Rabbit doesn't edit Arabic doesn't mean he doesn't read our entries. --WikiTiki89 23:23, 5 November 2013 (UTC)
That's the idea, to make it useful for learners. The chat version of transliteration was added when an Arabic learner (Beru7) was active. Since he left, I have been correcting it to the standard. A few letters come from different standards and there has always been a conflict regarding certain letters: "j" is preferred by most over "ǧ", and "x" over "ḵ" or "ḫ" (Mahmudmasri must have changed it back to "ḫ"). Otherwise the current standard (or practice) is very close to the system of Hans Wehr, the best-known Arabic dictionary. Actually, most users have no choice but to learn that system; there's no other dictionary as comprehensive as this for English speakers. --Anatoli (обсудить/вклад) 23:37, 5 November 2013 (UTC)
Actually, the 4th edition of Hans Wehr has changed "ǧ" to "j", "ḫ" to "ḵ" and "ġ" to "ḡ". I welcome "ǧ" to "j" change and the other two can be discussed. Note that dialect transliteration and foreign words differ from expected. ج (j) is pronounced as "g" not only in Egyptian but in many loanwords, which came into MSA via Egyptian. --Anatoli (обсудить/вклад) 23:45, 5 November 2013 (UTC)
I like the first two changes ("ǧ" to "j", "ḫ" to "ḵ"). --Z 14:46, 10 November 2013 (UTC)

To Anatoli: I know that you may prefer x, since it is not a diacriticized letter and also looks like letters with the same or similar pronunciation in Cyrillic, Greek and IPA. However, I was sticking to published standard transliterations, not made-up ones, because if we each did what we saw as the best fit, why would your changes be better than what I myself see as a better fit for the real pronunciation by Arabic speakers?

To WikiTiki89: No, I definitely disagree with using a substandard elongation to separate prefixes or suffixes. Hebrew spelling is similar, and there they are never separated; however, we can separate them by dashes in transliteration. You also have to notice that using elongation makes it difficult to search for words through Wiktionary, since we must write the words with the exact same elongation chosen by the editor, or else the search engine would consider them totally different words. If you searched for الكتاب, you would never find the word in the results if it existed as الـكتاب. --Mahmudmasri (talk) 23:03, 15 November 2013 (UTC)

I did not mean to connect the prefix or suffix to the word with a tatweel, but to move articles about the prefixes or suffixes themselves to a page with a tatweel. For example, in English we do prefixes and suffixes like this: un-, -ly. I was proposing that for Arabic we would move the ال to الـ and كم to ـكم. But Anatoli already convinced me that it is not a good idea because most Arabic dictionaries don't do this. --WikiTiki89 23:15, 15 November 2013 (UTC)
OK, I also agree to leave them. When we write these separate prefixes and suffixes by hand, we have the option to use the elongation, but it is not a must. --Mahmudmasri (talk) 23:31, 15 November 2013 (UTC)
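
The search problem raised above comes down to the tatweel being an ordinary character (U+0640) as far as string comparison is concerned. A minimal sketch, assuming a search layer that one could normalize oneself (this is not how Wiktionary's search engine is known to work, and the function names are hypothetical):

```python
TATWEEL = "\u0640"  # ARABIC TATWEEL (kashida), the elongation character


def strip_tatweel(text: str) -> str:
    """Remove every tatweel so that elongated and plain spellings compare equal."""
    return text.replace(TATWEEL, "")


def same_word(a: str, b: str) -> bool:
    """Compare two spellings while ignoring elongation."""
    return strip_tatweel(a) == strip_tatweel(b)


if __name__ == "__main__":
    plain = "الكتاب"
    elongated = "الـكتاب"  # same word with a tatweel after the article
    print(plain == elongated)           # raw comparison treats them as different
    print(same_word(plain, elongated))  # comparison after stripping the tatweel
```

Without such normalization, a page title containing a tatweel and a query without one are simply different strings, which is the difficulty described in the comment.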

I think al- should show assimilation and elision[edit]

@Atitarev: @Wikitiki89: I disagree with the current statement about Romanization that "ال always gives al- regardless of elision and sun and moon letters rules". In any case I've largely been ignoring the part about elision when editing transliterations (because of not realizing until just recently that this was the convention), and I think that assimilation to sun letters should also be shown. Hans Wehr's dictionary, which we largely follow the conventions of, definitely shows assimilation (an-nūr not al-nūr). For beginners, I don't think it much helps to have transliterations like ṣabāh al-nūr or bi-smi llāhi al-raḥīmi al-raḥmāni that don't reflect pronunciation very well (vs. ṣabāh an-nūr or bi-smi llāhi r-raḥīmi r-raḥmāni). Benwing (talk) 23:29, 2 November 2014 (UTC)

I don't have a strong opinion about this. Both ways are used. I don't mind if you implement assimilation of "l" and the vowel elision. Interesting that in dialects where ج is pronounced as /ʒ/, not /dʒ/, the letter is a Sun letter, so assimilation happens in الجَزِيرَة (al-jazīra). --Anatoli T. (обсудить/вклад) 23:45, 2 November 2014 (UTC)
I think it should not be indicated for the reason Anatoli gave. It's a pronunciation detail, not an orthographic detail. —CodeCat 23:47, 2 November 2014 (UTC)
The Korean transliteration module uses a lot of assimilations, which is standard and described by an authority. As I mentioned, both ways are possible. Qur'an, which is more likely to have transliterations, is transliterated either way - with or without assimilations and elisions. Elisions may be even more important to get the right number of syllables and for rhyming. As for my example, it's fine to focus on MSA, Classical or Qur'anic Arabic, despite the fact that modern realizations of standard Arabic differ by regions. /dʒ/ is a prescribed and classical pronunciation of ج (j) (although it may not be original). --Anatoli T. (обсудить/вклад) 00:01, 3 November 2014 (UTC)
Elision can also be marked with the existing diacritic ٱ (hamzatu l-waṣli) but it's seldom used and causes some issues with display, e.g. ٱلله (llāh). Also the kasra on للهِ causes the ligature لله to be displayed as separate letters; compare with الله (allāh). --Anatoli T. (обсудить/вклад) 00:30, 3 November 2014 (UTC)
In response to CodeCat, writing assimilation is an orthographic feature, because it's indicated in vocalized Arabic. Words like an-nūr are written اَلنُّور in fully vocalized texts, with a shadda diacritic over the nūn (n) indicating its pronunciation as a geminate consonant, and no diacritic over the lām (l) indicating its silence. A word like al-būr without assimilation is written اَلْبُور when fully vocalized, with a sukūn over the lām (l) indicating no vowel, and no shadda over the bā' (b). Writing assimilation is the same as not writing the trailing silent alif in the 3rd-plural verb ending or in the accusative ending -a, and not writing the trailing silent wāw in name عَمْرو (ʿamr) -- in these cases, the vocalized Arabic again attempts to indicate the silence of these letters. We also follow pronunciation in the writing of ة (tā' marbūṭa), which gets written either as nothing or as t depending on pronunciation, and in writing صلوة as ṣalāh following pronunciation even though it has a wāw in it, meaning an orthographic transliteration would be something like ṣalwa. This is consistent with the practice of Russian, where e.g. the genitive is written -ovo rather than -ogo.
@Atitarev:: The assimilation of ج is a dialect feature that is (at least for many dialects) optional and less found in borrowed MSA words, so I suspect the tendency is not to pronounce it as assimilated when speaking MSA. Words like al-jazīra would of course be written without assimilation, following normative pronunciation and the spelling of vocalized Arabic texts. Benwing (talk) 01:59, 3 November 2014 (UTC)
Very good example with full vocalization - اَلنُّور, I didn't think of it. --Anatoli T. (обсудить/вклад) 03:38, 3 November 2014 (UTC)
Support for the reasons given by Benwing (which I believe I have myself expressed in past). --WikiTiki89 03:39, 3 November 2014 (UTC)
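
The assimilation rule under discussion can be sketched on the transliteration side as below. This is a hedged illustration in Python, not the code of any Wiktionary module; the function name with_article and the letter set are assumptions. The set follows the standard list of sun letters in a Hans Wehr-style romanization and treats j as a moon letter, in line with the normative pronunciation mentioned above. Elision of the article's vowel after a preceding vowel is not handled.

```python
import unicodedata

# Sun letters in a Hans Wehr-style romanization (an assumption of this sketch).
# "j" is deliberately absent: normatively ج does not assimilate.
SUN_LETTERS = frozenset(
    unicodedata.normalize("NFC", c)
    for c in ("t", "ṯ", "d", "ḏ", "r", "z", "s", "š", "ṣ", "ḍ", "ṭ", "ẓ", "l", "n")
)


def with_article(stem: str, assimilate: bool = True) -> str:
    """Prefix the definite article to a transliterated stem.

    With assimilate=True the lām is replaced by a copy of a following
    sun letter (an-nūr); otherwise the article is always written al-.
    """
    stem = unicodedata.normalize("NFC", stem)  # keep letters as single code points
    first = stem[:1]
    if assimilate and first in SUN_LETTERS:
        return f"a{first}-{stem}"
    return f"al-{stem}"


if __name__ == "__main__":
    for word in ("nūr", "būr", "jazīra", "raḥmān"):
        print(with_article(word), "|", with_article(word, assimilate=False))
```

Passing assimilate=False reproduces the convention being argued against (al-nūr), so the two conventions can be compared side by side.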

Some changes[edit]

@Benwing: Hi, I've added some stuff about ʾiʿrāb endings un/in/an, additional, rare letters - incomplete, pls. check. What's the alternative form of ق used in Maghrebi Arabic? --Anatoli T. (обсудить/вклад) 03:55, 5 November 2014 (UTC)

RFM discussion: February 2014–January 2015[edit]


The following discussion has been moved from Wiktionary:Requests for moves, mergers and splits (permalink).

This discussion is no longer live and is left here as an archive. Please do not modify this conversation, though feel free to discuss its conclusions.

Merge North Levantine Arabic ("apc"), South Levantine Arabic ("ajp"), and Syrian Arabic ("sem-syr") into Levantine Arabic

See also: User_talk:Stephen_G._Brown#Merging_Levantine_Arabic_dialects_into_one_language

There is absolutely no reason to have these as separate languages. Yes, it's true that there are some pronunciation differences between various dialects, but the vocabulary is largely the same (except that the Lebanese dialect uses a lot of French borrowings and the Israeli dialects use a lot of Hebrew borrowings). The grammar is identical. But the most compelling reason is that the divisions we have are completely arbitrary and there is much more variation within these regions than there is between them (not to mention that Syrian Arabic is no more than a sub-dialect of Northern Levantine, seemingly leaving Northern Levantine to refer to just Lebanese). --WikiTiki89 22:41, 19 February 2014 (UTC)

See also Category talk:Syrian Arabic language. --WikiTiki89 23:15, 19 February 2014 (UTC)
I don't know enough about it to say whether apc and ajp should be merged, but sem-syr should definitely be abolished and merged into apc. —Aɴɢʀ (talk) 16:08, 20 February 2014 (UTC)
I was actually very surprised to find that there are two separate ISO codes for North and South Levantine. They are about as different as Bostonian vs New York English. --WikiTiki89 22:54, 20 February 2014 (UTC)
I wonder what the code sem-syr was intended to represent. There is more variation between one region of Syria and the next than between one region of Syria and a neighbouring region of Iraq or Lebanon. One textbook (Sociolinguistics / Soziolinguistik, 2006, ISBN-13 978-3-11-0184181) sums it up by saying "In terms of dialectal variation, three major Arabic dialect groups are represented within Syria. In the north eastern region (bordering Iraq) dialects of the Mesopotamian group are spoken, and the rest of the Syrian desert in the east is of Najdi Peninsular Arabic type. In the rest of the country varieties of North and South Levantine Arabic are spoken, which can be distinguished from one another by a number of features. In the south western corner (bordering Jordan), a type of southern Levantine (commonly known as Haurani) is spoken..." WP suggests that "Levantine Arabic" is likewise composed of rather more than two mutually intelligible dialects, making our binary split curious. - -sche (discuss) 01:18, 21 February 2014 (UTC)
North and South are probably the most distinctive difference in the Levantine dialects mostly due to the pronunciation of the long ā. In the North, it's realized as [æː~ɛː~eː] in unemphatic contexts and [ɑː~ɒː~ɔː] in emphatic contexts, while in the South, it is realized as [aː] in unemphatic contexts and [ɑː~aː] in emphatic contexts. There are also some minor vocabulary differences such as the North preferring شو (šū) for "what?" and the South preferring إيش (ʾēš). But these hardly constitute enough of a difference to treat them as separate languages. And as I mentioned before, there are more differences within these regions than there are between them, such as regional realizations of /q/. Mutual intelligibility, however, is a tricky criterion, because if we went by that we'd have to merge all dialects of Arabic (Levantine, Gulf, Egyptian) except for Maghrebi and many small isolated ones. --WikiTiki89 01:45, 21 February 2014 (UTC)
Heck, if we went by that, we'd have to merge Bokmål and Nynorsk back into Norwegian, and Dutch Low German and German Low German back into Low German, and who knows what other mergers that would probably be a good idea. —Aɴɢʀ (talk) 09:33, 21 February 2014 (UTC)
Yeah it turns out that for a dictionary spelling is more important than intelligibility, which is one reason that I think Mandarin and Cantonese should be re-merged. But of course it makes it difficult for languages with no standard, such as most Arabic dialects. --WikiTiki89 18:33, 21 February 2014 (UTC)
I agree with the merge. The difference between Syrian and Lebanese dialects is more political than linguistic. In fact, by our criteria, a lot of dialectal Arabic words and forms should go under "Arabic", without any dialect distinction, perhaps with a qualifier such as "colloquial", "regional", "East Arabic", "Gulf", etc. Some words that we mark as Egyptian Arabic are often used in many East Arabic dialects or are just too colloquial to be considered classical or standard, especially many borrowings; there are also some relaxed spelling forms, more typical of Egypt and Sudan but absolutely not exclusive to these regions. I also disagree with the inclusion of Egyptian or other Arabic entries/translations that are only pronounced differently from the standard. Considering how few written dialectal contributions we have, we could merge a few dialects, excluding Maghrebi. Note that dictionaries such as Hans Wehr include dialectal words, just marking them by a region/regions. I am away in Dubai, London and Brunei, only contributing a little at the moment. --Anatoli (обсудить/вклад) 19:10, 21 February 2014 (UTC)
I was also thinking that we could merge all of Arabic together, but that would be a much larger decision that would need a vote and would require figuring out how to handle major differences in grammar, such as the fact that most dialects conjugate verbs differently from MSA. Merging smaller closely related dialect groups together is a much easier task and will make it easier to add entries for them. The reason I have refrained from adding Levantine entries is because I can't be bothered to add everything three times. I would suggest that we also merge all Maghrebi dialects into Maghrebi and all Gulf dialects into Gulf, but I personally don't know as much about these dialects to feel comfortable proposing this myself. --WikiTiki89 20:05, 21 February 2014 (UTC)
Yes, more merges may require a vote, same as with merging Sinitic varieties. Grammatical differences in Arabic dialects may be marked with qualifiers as well but they can be grouped, so that we maintain fewer dialects, which are almost identical grammatically and lexically. Perhaps such a distinction should become important ONLY if we are really going to have e.g. separate conjugation tables for Egyptian, Levantine, etc. verbs (or other inflections), e.g. so that users can find that "bitidrisu"/"byudursu" ("they write") are different in Egyptian and Levantine Arabic. --Anatoli (обсудить/вклад) 21:47, 21 February 2014 (UTC)
Another consideration. Arabic dialects are mainly spoken, not written with some exceptions. Are forms like ايمتى رحيرجع؟ (Levantine) or حيرجع امتى؟ (Egyptian) "When will he return?" attestable? Standard Arabic inflections can be confirmed by citations. Is it true for dialects? --Anatoli (обсудить/вклад) 22:09, 21 February 2014 (UTC)
Sources describing dialectal grammar use romanisation for obvious reasons. Dialects use vocalisations extremely seldom. Such inflection tables would not be very useful if they used Arabic diacritics, as e.g. there are no special symbols for "e" and "o" and Egyptians read "q" as a glottal stop and jiim as /g/, so these examples would be written as something like "eemtaa raH-yirja3" / "Ha-yirga3 imta", obviously not attestable forms. Hopefully it makes sense. --Anatoli (обсудить/вклад) 22:27, 21 February 2014 (UTC)
For Egyptian, they are probably attestable. For Levantine it might be harder, but hopefully still doable for the most common words. Also, in Levantine the future marker رح is usually written as a separate word (thus: ايمتى رح يرجع؟, ēmtā raḥ yirjaʿ). The difficulty of vowel marking will have to be resolved with transliterations, that is our only choice as far as I can see (IPA would be too specific when we don't want it to be). --WikiTiki89 23:07, 21 February 2014 (UTC)
Re the comment about Nynorsk, etc: I don't want to stray too far off topic, but I do think we should merge Nynorsk and Bokmål. And iff we merged Nynorsk and Bokmål, I would not oppose merging Dutch Low Saxon and German Low German, though I wouldn't support it, either. Dutch Low Saxon and German Low German already represent mergers of a large number of lects that SIL / Ethnologue / ISO had deemed distinct enough to grant codes to: Dutch Low Saxon is Achterhoeks (act), Drents (drt), Gronings (gos), Sallands (sdz), Stellingwerfs (stl), Twents (twd), Veluws (vel); German Low German is Westphalian (wep); frs could mean either East Frisian the Low German lect or East Frisian the Frisian lect (SIL refuses to clarify which one it was intended to mean, and I have strong but as of yet unconfirmed suspicions that it's used both ways around here); and Plautdietsch (pdt) ... well, exists. - -sche (discuss) 23:33, 21 February 2014 (UTC)
Before we can do this (assuming there will be at least a couple more people who weigh in and agree), we will need to decide on a language code. Should we use "apc", "ajp", or a new one altogether (such as "sem-lev")? Since I don't understand what "apc" and "ajp" actually stand for, I can't say whether it would be appropriate to use one of them for the entirety of Levantine (the only sort of guess I have is that "ajp" could be "Arabic Jordanian Palestinian", but I can't even make a guess for "apc"). --WikiTiki89 20:05, 21 February 2014 (UTC)
More than likely, apc stands for "Arabic" + "all the descriptive combinations are taken by other languages so we'll use pc". Chuck Entz (talk) 20:47, 21 February 2014 (UTC)
"Arabic, Phoenician Coast"? "Aarabic, Lepanese and Cyrian"? —Aɴɢʀ (talk) 22:15, 21 February 2014 (UTC)
A Perfect Circle? Wait, wrong APC...
It's probably what Chuck says. In any case, even if apc and ajp do stand for something, we can still use either one to represent the merged lect, if we want to... for "Antillean Creole", we combined the ISO's gcf ("Guadeloupean Creole French") and acf ("Saint Lucian Creole French", but presumably abbreviating "Antillean Creole French") into gcf (even though, if you think about what the codes stand for as abbreviations, it might have made more sense to merge them into acf). - -sche (discuss) 22:30, 21 February 2014 (UTC)
I think the Arabic dialects that are mutually intelligible with standard Arabic should be handled the way the Romani lects are currently handled, viz. "Only the macrolanguage is allowed an L2 header [==Romani==], but the subdivisions are allowed nested lines in translations tables." In fact, if you read the discussion about Romani, you'll note that I was under the impression that Arabic was already handled that way! If we adopt such an approach, then merging North and South Levantine seems unnecessary, especially if they have such differences as Wikitiki describes ("the Lebanese dialect uses a lot of French borrowings and the Israeli dialects use a lot of Hebrew borrowings"). That's because I think it's easier (especially with Conrad's trans-adder) to add, and also to look at as a reader, separate lines for apc and ajp, vs one line with several entries each with its own qualifier. Whereas, if we want to grant all code-having Arabic dialects their own L2s, then it would make more sense to merge apc and ajp, since I think it's easier to add {{context}} tags to sense-lines than it is to create two separate L2 sections with almost identical content. So, I think it would be best if we developed an idea of what our general policy on Arabic dialects should be before acting here... (No matter what our general policy is, I think it makes sense to abolish sem-syr, converting each of its entries into whichever of apc vs ajp it happens to be.) - -sche (discuss) 22:54, 21 February 2014 (UTC)
What you may be overlooking is that French borrowings in Lebanese do not carry over to Syrian, and Hebrew borrowings in Israel do not generally carry over to the Palestinian territories, let alone to Jordan. Also, North Levantine is spoken in the North of Israel and uses the same Hebrew borrowings that are used in the South of Israel in South Levantine. Merging all dialects with MSA will be a big process, mostly due to our large number of Egyptian Arabic entries. Merging Levantine together is quick and easy and not much is lost in doing so even if we later decide to merge it with MSA. --WikiTiki89 23:07, 21 February 2014 (UTC)
I think it's possible and advisable to merge ALL Arabic varieties into Arabic, using e. g. {{context|Egypt|Sudan|lang=ar}} labels and regional categories, such as Category:Lebanese Arabic. Even infrequent inflection tables could have regional context labels. As I mentioned, Arabic dictionaries, which include dialectal forms just label them with Morocco, Syria, etc. (my preference and we have a precedence in Hans Wehr) or Levantine, Gulf. Apart from named dialects there are differences related to specific countries. I'd prefer labels "Lebanon", "Syria" to "North Levantine" or "South Levantine", even though there are differences within countries across dialects - we can combine labels such as {{context|Hejazi|Saudi|lang=ar}}. The majority of words are identical for all dialects, despite differences in pronunciation, there's no need to duplicate information. Words that are different in dialects are quite common but their percentage is small and many words are shared by various dialects, they are often considered more colloquial, not dialectal, especially numerous borrowings, frowned at by Arabic purists but still used quite often. In any case, keeping separate dialect dictionaries is impractical, any dialect also includes the majority of Arabic words. Besides, we have kept Albanian as one language without splitting into Tosk and Ghek, English is not split into American, British, etc. merged Serbo-Croatian varieties and I think we can also merge Norwegian and Chinese. (Merging Arabic varieties is a much simpler task than merging Chinese varieties and makes even more sense). --Anatoli (обсудить/вклад) 07:08, 27 February 2014 (UTC)
The differences between Arabic dialects are similar to the differences between German dialects such as Swiss German. Swiss German is considered a dialect but it is different enough that it is not mutually intelligible with Standard German. The difference between Arabic and German, however, is that in Arabic, the spelling of individual words is largely identical across dialects, which makes it practical to have them merged. But that is not the only factor to consider. If we merge all of Arabic together, when providing usage examples, we would have to label each usage to specify which dialect it uses. The other solution would be to only use MSA for usage examples, but this would further de-emphasize dialects and make it difficult for people trying to learn them. There are already very few resources online (and even offline) for Arabic dialects (with the possible exception of Egyptian). I think that it would be beneficial to treat the dialects separately from MSA, even if we don't plan on duplicating all of the content. I only plan to add the most distinctive Levantine words as entries; I don't plan on duplicating all of the Arabic content we already have. Finally, as I've already said before, the discussion of merging all of Arabic together is entirely separate from the discussion here. The discussion here is something we can do right now, without a lot of debate and without a vote. Merging all of Arabic together is a long term project. If you want to start a discussion about it, do it at the Beer Parlour, but it does not prevent us from merging Levantine in the meantime. --WikiTiki89 07:26, 27 February 2014 (UTC)
I know it's a diversion from your original topic but if the idea of merging dialects were met positively, then your request to merge two forms of Levantine would be solved as well. I don't think the problem with the lack of emphasis and resources for dialects will be solved any time soon, including Egyptian Arabic (which is slightly more available than others) and having them under one header is not a problem for adding regional user examples, if editors are aware that we only have "Arabic". I'll consider a new discussion in BP when I'm back home. I agree it would require a vote and a more thorough analysis. --Anatoli (обсудить/вклад) 07:45, 27 February 2014 (UTC)
I understand that if the idea of merging all of the dialects were met positively that would solve this problem as well, but that would take longer. And I don't expect that the lack of emphasis or resources would be solved any time soon, but I'd rather that Wiktionary be part of the solution and not part of the problem. As for usage examples, how would a reader know which dialect is being used? Should usage examples have context labels as well? --WikiTiki89 07:56, 27 February 2014 (UTC)
Not labels but a dialect name in brackets. I understand why most dialects are not emphasised by Arabs and why we shouldn't do it either or should promote written dialectal forms, only some of which are attestable. There's no definite right and wrong in spelling, pronunciation and transliteration and by definition, no standard. That's why Arabs mix standard with dialect spellings when writing in dialects. There is a so-called "spoken MSA" or "educated colloquial Arabic" (there are textbooks available, I have two books) where some common dialectal forms are mixed with MSA to produce a new variety. It obviously differs a bit regionally. --Anatoli (обсудить/вклад) 09:04, 27 February 2014 (UTC)
But spelling is not everything, there is also morphology, syntax, agreement, vocabulary choice and other aspects of grammar that are evident even when the spellings are the same as MSA. For example, بَدِّي أَشْرَب قَهْوَة and بَحِبّ بُيُوت صُغَار, in which every word is spelled the same as in MSA, yet the sentences still differ from the MSA أُرِيدُ أَنْ أَشْرَبَ قَهْوَةً and أُحِبُّ بُيُوتًا صَغِيرَةً. The question you still haven't answered is: What is wrong with merging Levantine together right now, even if we will later merge all Arabic dialects? --WikiTiki89 19:45, 27 February 2014 (UTC)
The known differences you listed are comparable with Serbo-Croatian and Albanian variations, or some regional English slang. It would make sense to keep dialectal phrasebooks as separate subsections of the Arabic phrasebook but it's a word dictionary. Do you possess the Hans Wehr dictionary? It has a lot of dialectal words/regionalisms but examples are only provided for MSA. As I said before, usexes can be used on the MSA entries and specific dialectal words can be labeled/categorised accordingly. There's nothing wrong with merging Levantine now, I have already expressed my support for this. --Anatoli (обсудить/вклад) 23:54, 27 February 2014 (UTC)
Ok, I see your point. No, I don't have a Hans Wehr, but I am thinking about getting one (since I am not a professional translator or anything remotely close to one, buying expensive dictionaries is not something I do for every language I happen to be interested in; in fact the only physical dictionary at all that I personally own is the Even-Shoshan Dictionary of Hebrew; however, due to the scarcity of online Arabic dictionaries, and the inconvenience of a PDF dictionary, I am considering buying a Hans Wehr). And thanks for your answer. I'm looking forward to the official merging-all-of-Arabic debate. --WikiTiki89 02:17, 28 February 2014 (UTC)
(replying to Wikitiki's comment of 23:07, 21 February 2014): That's a good point. I suppose any linguistically sound treatment of Arabic dialects is going to divide them — whether into separate code-having L2s, or simply separate {{qualifier}}-tagged varieties — along lines too different from SIL's for it to be worth retaining ajp and apc no matter what we do. Alright, merge all three codes. (I note for posterity that ajp and apc are only used by a dozen entries each, anyway, and sem-syr is not used at all in the main namespace.)
Now, to address the question of whether the unified "Levantine" should use one of the existing codes or get an exceptional code... precedent, both here (see gcf) and at the ISO (see e.g. their merge of tlw into weo), seems to be to use one of the existing codes. - -sche (discuss) 03:03, 28 February 2014 (UTC)

Since Wikitiki89 requested my comment, I would say let's remove nation-based categorization of dialects and let's separate them by the ISO 639-3 categorization of dialects. It categorizes two Levantine dialects, the northern and the southern. They are close, but separate enough. They are not one and they are not comparable to Boston-New York accents. If you listened to a speaker from Gaza, you could tell the difference in many words, and somewhat in grammar, from what a speaker from Lebanon would say.

A separate note is that Arabic dialects are hardly mutually intelligible, even when their spoken range is geographically close. They only become intelligible under certain circumstances: the 2 people are literate in Literary Arabic, which affects both dialects; the same 2 people have accustomed themselves to both dialects by listening to many songs and conversations in the other dialect; or the 2 people try to use a simplified, straightforward version of their dialects. --Mahmudmasri (talk) 12:35, 3 March 2014 (UTC)

Subdialects of Levantine
@Mahmudmasri: The reason Gazan Arabic is so different from Lebanese is that, as you can see on the map at right, it is at the very bottom of the continuum. What the map calls "Palestinian" is much closer to Lebanese than Gazan (what the map calls "Outer southern") is. If anything, it is the "Outer southern" that should be broken off as a separate language, but even that would be a long shot. Anyway, I agree that the only reason many of the dialects are mutually intelligible is because of knowledge of MSA or mutual knowledge of the other's dialect. However, all variants of Levantine Arabic are mutually intelligible with each other. --WikiTiki89 16:40, 3 March 2014 (UTC)
I'm not sure to what extent the map is precise; however, the dialect spoken by urban west Jordanians is not as close to Lebanese as that of northern Israel and the West Bank is, even though in both cases the variants are spoken in regions very close to each other. I'm also not sure whether the Gaza speech would be as mutually intelligible with the Aleppo speech as you assume. --Mahmudmasri (talk) 02:10, 4 March 2014 (UTC)
@Mahmudmasri: You have valid points about differences in dialects. It doesn't mean all Arabic dialects can't have the same L2 header. The lack of templates makes it harder to add dialectal contexts, e.g. شو (šū) may get an entry:


# {{context|interrogative|Palestine|lang=ar}} [[what]]?
...and be categorised under Category:Levantine Arabic. Arabic dictionaries which include dialectal words have a way of doing it, and I don't see why we cannot do it as well. Please look at the Chinese discussion: Wiktionary:Beer_parlour/2014/March#A_new_format_for_Chinese_entries_.28multisyllables.29, it may give you some ideas. --Anatoli (обсудить/вклад) 03:47, 14 March 2014 (UTC)
No, please, don't just sum up all dialects under ==Arabic==. You'd better have a section ==[North/South] Levantine Arabic==. The /ʃuː/ is North Levantine, not south, but the Palestinian dialects are not one. The Gaza speech is also considered Palestinian, but they say /ʔeːʃ/. --Mahmudmasri (talk) 18:08, 16 March 2014 (UTC)
To clarify, and I know this from asking people from various parts of the Levant, North Levantine uses only /ʃuː/, while South Levantine uses both /ʃuː/ and /(ʔ)eːʃ/ interchangeably. --WikiTiki89 18:44, 16 March 2014 (UTC)
Obviously, because there are no clear boundaries where the dialect area ends. However, I never heard someone from Gaza normally saying /ʃuː/, unless he consciously tries to use the word. --Mahmudmasri (talk) 13:43, 17 March 2014 (UTC)
Maybe movies are not the best source, but I watched a Gazan movie a long time ago (before I knew much about Arabic) and I remember hearing /ʃuː/. And just recently I watched a Palestinian movie where the same characters used both /ʃuː/ and /(ʔ)eːʃ/ interchangeably. Additionally, I have asked around, and Palestinians and Jordanians seem to say that they use whichever one sounds better in the sentence. --WikiTiki89 16:11, 17 March 2014 (UTC)
I've deleted so-called "Syrian Arabic", judging there to be support above for that. (I repeat my comment that there are 3+ varieties of Arabic spoken in Syria.) Merging the other dialects seems to not have the support of the one native speaker of Arabic who has commented. - -sche (discuss) 02:14, 26 January 2015 (UTC)