Module talk:ar-headword

From Wiktionary, the free dictionary

Much discussion is found elsewhere

See Module talk:ar-nominals, Wiktionary talk:About Arabic. Benwing (talk) 08:14, 30 December 2014 (UTC)

Note, some of the discussion originally found below was moved to Module talk:ar-nominals on Dec 30, 2014. Benwing (talk) 08:14, 30 December 2014 (UTC)

Plural forms, dual forms, etc.

We don't normally split words into different categories depending on which form they are. This is because languages may have dozens of forms and having a category for each one is pointless. So I think these should just be "noun forms" and "adjective forms". —CodeCat 20:02, 22 November 2014 (UTC)

Then what is the point of the category "plurals"? Particularly, it claims to be for nouns only, which makes no sense -- it should be for adjectives as well. Benwing (talk) 21:47, 23 November 2014 (UTC)
I don't think we should have the "plurals" category at all, for any language. But inertia among other editors is hard to go against. —CodeCat 21:55, 23 November 2014 (UTC)
I think we should have plural and dual categories. The common agreement is to have plurals, anyway, AFAIK. --Anatoli T. (обсудить/вклад) 04:59, 24 November 2014 (UTC)
We have POS categories for 'noun dual forms' and 'noun plural forms' and 'adjective dual forms' and 'adjective plural forms'. I think it's reasonable to use these. That's why I created templates and entries in Module:ar-headword. How do I, as a user, otherwise find the category containing noun plural forms, separately from just "noun forms"? @CodeCat Please leave these alone. I am about to create entries for Arabic broken plurals and I would like to tag them properly, not as just "noun forms" or "adjective forms", so they can be placed in appropriate categories. I am planning on replacing {{ar-plural}} (which is now being used pretty much only for noun plurals) with {{ar-noun-pl}}, so that this template can insert the words into Category:Arabic noun plural forms and Category:Arabic noun forms. I do not have a lot of time these days for Wiktionary and I would rather spend the time improving it than fighting over technicalities. Benwing (talk) 07:07, 24 November 2014 (UTC)
@CodeCat BTW, please restore the doc page for {{ar-plural}} that you deleted. Your changes appear contrary to consensus -- at the very least, Anatoli indicates there is general consensus for plural categories. With your changes, there would be no category containing just Arabic plurals, which seems a highly undesirable result. I'm not going to create entries for broken plurals if they aren't tagged as Arabic noun plurals; if they're tagged just as noun forms, it's very difficult to later fix them to be tagged properly. Benwing (talk) 07:14, 24 November 2014 (UTC)
What is the point of categorising noun forms by the type of form, exactly? —CodeCat 13:27, 24 November 2014 (UTC)
Also, the consensus is not for plural categories in general. It's for plural categories in cases where nouns are not otherwise inflected. You'll notice that most of the languages in Category:Plurals by language are this way, whereas the remainder appears in Category:Noun forms by language. Furthermore, many languages have just copied the English model, even though it's inappropriate. For example, Swedish has a rather full plurals category, but Swedish nouns are inflected for more than just the plural. You may also notice that several of the languages listed there have both a "plurals" and a "noun forms" category. Finally, as the notice at the top says, the term "plural" is ambiguous and can lead to miscategorisation, as it would suggest that anything that is plural should go in there, including adjective and verb plural forms. All in all, I oppose creating more "plurals" categories, and I also oppose subdividing non-lemma forms when there is no clearly defined rationale in doing so ("other languages do it" is not a rationale). —CodeCat 13:58, 24 November 2014 (UTC)
What's the point? What is the point of Category:French plurals then, or of the equivalent for any other language? It's the same with Arabic. Although Arabic nouns have inflections, they are normally unwritten; the case endings in diacritics are not part of the entry. --Anatoli T. (обсудить/вклад) 23:23, 24 November 2014 (UTC)
Category:French plurals is really synonymous with Category:French noun forms, and I think it ought to be renamed. But for Arabic, there would be both Category:Arabic plurals and Category:Arabic duals or, alternatively, Category:Arabic noun plural forms and Category:Arabic noun dual forms. What I wonder is what benefit there is in splitting noun forms between those two categories, rather than putting them all in Category:Arabic noun forms. I see no benefit in splitting noun forms as I doubt anyone would specifically want to look up a word among a (potentially huge) list of dual noun forms. I feel this way not just for Arabic but for any language and any part of speech.
I want to avoid the disaster we had with User:Sae1962 who created a category (or would have, if they had gotten the chance) for every single possible noun or verb form that exists in Turkish. Now of course two categories is not as bad as dozens or hundreds of them, but one could always argue that if two categories are useful, then dozens or hundreds ought to be no less useful by extension. And since I don't think having lots of categories is useful, I am simply being consistent by questioning the use of two of them along the same line. —CodeCat 23:32, 24 November 2014 (UTC)
I have no preference between Category:Arabic noun plural forms and Category:Arabic plurals as long as there is a category each for Arabic plurals and duals, but you're right, the former is better. Arabic plurals are mostly irregular, especially for non-humans, and I find these categories very useful, even if it's not always easy to determine the gender and plurality of nouns. Whether they are useful is not for one person to decide, IMO. I remember people complaining when they couldn't auto-generate plurals. One way or the other, dual and plural categories should exist. There won't be many other inflected forms for Arabic nouns, perhaps just accusative forms for triptotes only. --Anatoli T. (обсудить/вклад) 23:44, 24 November 2014 (UTC)
The Arabic inflection paradigm can be seen at كِتَاب (kitāb) (nominative: كِتَابٌ (kitābun)), where the dual (nominative) form is كِتَابَانِ (kitābāni) and the plural (nominative) is كُتُب (kutub). The majority of inflected forms share the same basic forms as the singular, dual and plural (only one potential entry for each); they differ only by ʾiʿrāb, i.e. case endings written with diacritics. Definite forms have a definite article. Only some irregular nouns may require separate definite forms, which differ from the indefinite ones. The accusative singular indefinite is كِتَابًا (kitāban), and the oblique dual in the construct state is كِتَابَيْ (kitābay) (the indefinite oblique dual is كِتَابَيْنِ (kitābayni)). --Anatoli T. (обсудить/вклад) 00:05, 25 November 2014 (UTC)
Let me rephrase then. What do you intend to use these categories for, personally, that you could not use a combined Category:Arabic noun forms for? Also, I'm not trying to be obstructive, I just don't want to be forced to cooperate when I have no compelling reason to. That doesn't mean I'm not open to such a reason, I'm actually trying rather hard to find one but so far without success. I still feel that having separate categories for inflected forms would result in categories that nobody would ever use, and worry that they would set a precedent for creating ever more, and more finely-split, categories for whatever combination of grammatical categories people can think of. I'm not opposed to categorising, I'm opposed to categorising for the sake of it. —CodeCat 00:11, 25 November 2014 (UTC)


Arabic plural form category is very useful for learning broken plural patterns. Mixing all inflected forms into one category would interfere with that. --Anatoli T. (обсудить/вклад) 00:45, 25 November 2014 (UTC)
I wonder if it wouldn't make more sense to categorise the plurals by type, then. Or categorise the nouns by the type of plural. —CodeCat 01:19, 25 November 2014 (UTC)
This would require some developer effort. I don't have the skills for that. I think it's possible (it might require additional parameters, or a module could read the pattern off the form), but even if they are categorised by pattern, there needs to be a supercategory. Patterns based on root consonants are used in Semitic languages for various things, not just for forming plurals, but forming broken plurals (as opposed to "sound plurals") is a special area of interest (and necessity) for all learners of Arabic (Hebrew, etc.). --Anatoli T. (обсудить/вклад) 01:29, 25 November 2014 (UTC)
What about a category in Category:Nouns by inflection type by language as parent? —CodeCat 01:46, 25 November 2014 (UTC)
They are not inflections, Arabic noun inflections are - triptotes, diptotes, indeclinables, irregular(?). Semitic languages could use Category:Nouns by broken plural type by language (the major changes are mainly inside a word, the ending being a part of a pattern) but AFAIK, there's not much work going on for other languages except for Arabic in this respect. --Anatoli T. (обсудить/вклад) 01:54, 25 November 2014 (UTC)
I'm sorry, I'm not sure if I understand. I'm somewhat familiar with what broken plurals are; they're plurals formed by changing the vowels in the root, correct? Wikipedia calls that "introflection". But it also states on w:Inflection that: An inflection expresses one or more grammatical categories with a prefix, suffix or infix, or another internal modification such as a vowel change. Many Indo-European languages create different forms by doing things like that too. The strong verbs in Germanic are a very clear example, as are umlauted plurals. Our current practice is to treat these under "inflection" anyway, so we might as well do the same for Arabic. —CodeCat 02:07, 25 November 2014 (UTC)

That's fine then but we still need plural/dual categories. I like these categories (whatever they are called) and not just for Arabic. --Anatoli T. (обсудить/вклад) 02:12, 25 November 2014 (UTC)

I still wonder why you like them, though. What use would you get out of them, assuming that we modify the necessary modules so that entries are categorised based on the type of plural and the vowel pattern if it's broken? —CodeCat 02:16, 25 November 2014 (UTC)
Can you describe your plan in some detail? What will Category:Arabic nouns by inflection type contain according to your idea? --Anatoli T. (обсудить/вклад) 02:29, 25 November 2014 (UTC)
We need to separate out plurals so that people who want to study how Arabic plurals work can do so. I think Anatoli also gave this reason up above. Arabic broken plurals appear highly irregular at first glance (e.g. singular ṣadīq "friend" normally has pl. ʾaṣdiqāʾ but it can also be ṣudaqāʾ or ṣudqān; wazīr "minister", with the same vowel pattern, has only pl. wuzarāʾ; ṭarīq "road", with the same pattern, has pl. either ṭuruq or ṭuruqāt; etc.), but they follow some definite patterns, which you can start figuring out if you can look at a bunch of examples -- generally there are relations between the vowel pattern of the singular and that of the plural. Randomly mixing plurals and feminines and duals and such into generic "noun forms" and "adjective forms" categories is far from helpful for doing this. Further categorizing plurals by "sound" (i.e. formed with an ending) and "broken" (formed by ablaut) would probably be a good thing. I'm not opposed to an even finer subdivision by vowel pattern but I don't think it's necessarily required (and there are probably a few dozen patterns so that would be a lot of categories), and we'd definitely want a broader category holding plurals in general (or noun plurals separate from adjective plurals, either way is fine).
Another reason for {{ar-plural}} (or its replacement {{ar-noun-pl}}) is that it automatically adds the argument g=p to the headword. The purpose of its intended replacement {{ar-noun-pl}} is to categorize both as a plural and as a noun form, or as a "noun plural form".
CodeCat, in response to your concern about an explosion of Turkish categories, I understand your concern but in this case there simply aren't hundreds of such categories for Arabic, and the POS categories such as "noun plural form" already exist so I don't see why you're objecting to them. My ideal would actually be to have categories something like the following: Arabic plural forms, Arabic dual forms, Arabic feminine forms, Arabic oblique forms, Arabic noun forms, Arabic adjective forms (maybe also Arabic accusative forms and Arabic genitive forms, since some nouns and adjectives have separate genitive and accusative forms and others have a combined genitive-accusative aka "oblique"), and cross-categorize as necessary, e.g. a feminine adjective ends up in both the feminine and adjective categories. Even if we split some of these categories by noun and adjective forms (e.g. Arabic adjective plural forms) there would be at most about 10 such categories (e.g. most nouns don't inflect for gender), and the only ones that are really important are the plural and feminine forms since these are the ones that are the most irregular. Benwing (talk) 05:27, 25 November 2014 (UTC)
@CodeCat You haven't responded to this (at least not directly). Would you mind if I proceed with renaming {{ar-plural}} to {{ar-noun-pl}}/{{ar-adj-pl}} (I have a manual list of which forms need to end up as adjective plurals) and creating new entries for plurals, making use of {{ar-noun-pl}} and {{ar-adj-pl}} in the headwords? If the consensus later on is to make these just "noun forms" and "adjective forms", this can be done easily from this state, even by just modifying the templates, but going the other way (converting just "noun forms" and "adjective forms" to plurals, duals, feminines, etc.) is not very easy. As I mentioned above, these various plural templates all have the benefit of auto-inserting g=p into the call to {{head}}, which otherwise would have to be done manually. Benwing (talk) 07:32, 26 November 2014 (UTC)
I would not have a problem with that. —CodeCat 15:15, 26 November 2014 (UTC)

Categorising nouns by plural formation

We currently categorise nouns in many languages according to the inflectional pattern they follow. A good example is in the Germanic languages, where each pattern of vowels in the strong verbs has a number from 1 to 7. I don't know if Arabic nouns have their patterns numbered similarly, but in any case I imagine that it's possible to categorise nouns based on what pattern is found in the plural. For example, wazīr ~ wuzarāʾ would be the pattern -a-ī- ~ -u-a-āʾ (possibly written some other way if it's clearer). My intention was to categorise nouns this way. This module could look at the singular and plural forms and compare them, and extract the vowel pattern. This would then be used to put the noun in the appropriate category. Another possibility could be to use the inflection table for this purpose, but this only works if every noun is supposed to have an inflection table eventually. In any case, I think categorising nouns this way is a lot more useful than just putting all the plurals in one category and letting users figure it out. After all, there is no way to see the vowel pattern just by looking at some plural forms listed in a category, so I really don't know how a plurals category would be helpful for that at all. —CodeCat 18:22, 25 November 2014 (UTC)

I'd like to have broader discussions on usefulness of the mentioned categories. As for patterns and pattern categories, I think it could be based on transliterations with substitutions of root consonants with C or C1, C2, C3, (C4), e.g. CuCaCāʾ, CuCuC, ʾaCCiCāʾ, etc. --Anatoli T. (обсудить/вклад) 21:30, 25 November 2014 (UTC)
The problem is I don't know enough about Arabic spelling to do this right. I do at least have a basic approach so maybe Benwing can help. My idea was to have a function that, given any word, returns one form of the word with all consonants replaced by some placeholder, and another with all the vowels replaced by some placeholder. This would make it easier to extract the consonantal root and the vowel pattern. —CodeCat 21:49, 25 November 2014 (UTC)
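A minimal Python sketch of the placeholder idea, assuming it runs on the Latin transliteration rather than on Arabic script, because the transliteration shows the short vowels that unvocalized spelling leaves out. The vowel set and the names consonant_skeleton and vowel_skeleton are hypothetical; nothing here is taken from Module:ar-headword.

```python
import unicodedata

# Assumed vowel letters of the transliteration used on this page: short a/i/u
# and long ā/ī/ū. Every other character (including ʾ and ʿ) counts as a consonant.
VOWELS = set(unicodedata.normalize("NFC", "aiuāīū"))


def _nfc(text: str) -> str:
    # Compose letters such as ṣ or ā so each one is a single code point.
    return unicodedata.normalize("NFC", text)


def consonant_skeleton(translit: str, placeholder: str = "C") -> str:
    """Replace every consonant with a placeholder, keeping the vowels."""
    return "".join(ch if ch in VOWELS else placeholder for ch in _nfc(translit))


def vowel_skeleton(translit: str, placeholder: str = "V") -> str:
    """Replace every vowel with a placeholder, keeping the consonants."""
    return "".join(placeholder if ch in VOWELS else ch for ch in _nfc(translit))


# wazīr ~ wuzarāʾ: compare the vowel patterns of the singular and the plural.
print(consonant_skeleton("wazīr"), consonant_skeleton("wuzarāʾ"))  # CaCīC CuCaCāC
```

The glottal stop of wuzarāʾ comes out as C even though it belongs to the pattern. This is one of the problems with a plain C placeholder that Benwing raises further down the thread.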
One word where my limited knowledge of Arabic is a problem is إِرْهَابِي (ʔirhābī). I'm assuming that this is a sound plural, but there is an alternation of final -ī with medial -iyy-. My guess is that this is a regular change, but again I don't know enough to be sure. —CodeCat 21:54, 25 November 2014 (UTC)
It's an adjectival noun; it behaves like an adjective. ī and -iyy are used interchangeably: if ʾiʿrāb is omitted, e.g. ʾirhābī (previously transliterated as ʾirhābiyy), then ī is used; with ʾiʿrāb or any other vowel ending, it's -iyy-. ʾirhābiyyun is the form with the nominative indefinite ending (ʾirhābiyyan is accusative, ʾirhābiyyin is genitive, and there are also the definite endings -u/-a/-i); ʾirhābī is the pausal form. We haven't come to an agreement on whether the headword should contain ʾiʿrāb (e.g. the -un ending). There is a bit of inconsistency in this entry. Sound plurals don't need plural pattern categories, IMO. --Anatoli T. (обсудить/вклад) 23:18, 25 November 2014 (UTC)
But detecting when something is a sound plural still means knowing about such irregularities. We may not need to subcategorise them, but we do still want Category:Arabic nouns with sound plurals, right? —CodeCat 23:23, 25 November 2014 (UTC)
Yes, I meant they don't need a pattern like nouns with broken plurals, just Category:Arabic nouns with sound plurals would be fine. --Anatoli T. (обсудить/вклад) 23:49, 25 November 2014 (UTC)
Rather than creating categories right away, the module uses tracking categories for now. The current method is rather simple: if the plural begins with the singular, then it's considered a sound plural. See the "what links here" link for Template:tracking/ar-head/sound. I'm surprised there aren't more of them. —CodeCat 00:10, 26 November 2014 (UTC)
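For illustration, here is a Python sketch of the same prefix test applied to transliterations. This is an assumption: the module's tracking code is not reproduced here and may work on Arabic script instead. The sample pairs show two weaknesses raised in this thread. A feminine in -a whose -āt plural replaces the -a is missed, and ḥammār ~ ḥammāra is accepted even though it is usually classed as broken. The name looks_sound_naive is hypothetical.

```python
import unicodedata


def _nfc(text: str) -> str:
    return unicodedata.normalize("NFC", text)


def looks_sound_naive(singular: str, plural: str) -> bool:
    """Prefix heuristic: a plural that starts with the whole singular is 'sound'."""
    sg, pl = _nfc(singular), _nfc(plural)
    return len(pl) > len(sg) and pl.startswith(sg)


pairs = [("mudarris", "mudarrisūn"), ("mudarrisa", "mudarrisāt"),
         ("kitāb", "kutub"), ("ḥammār", "ḥammāra")]
for sg, pl in pairs:
    print(sg, pl, looks_sound_naive(sg, pl))
```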
Not many, perhaps ـٌ (-un) (ḍamma tanwīn) should be chopped off to get more? --Anatoli T. (обсудить/вклад) 00:19, 26 November 2014 (UTC)
(Linking to diacritic entries is problematic, pls. search for "ḍamma tanwīn" or "ḍammatān"). --Anatoli T. (обсудить/вклад) 00:22, 26 November 2014 (UTC)
I've been looking at w:Arabic nouns and adjectives so that I can have at least a basic grasp of what's going on. According to that article, nouns are normally lemmatised with no endings. So wouldn't that mean that no entries should end with -un? —CodeCat 00:25, 26 November 2014 (UTC)
There are pro- and contra- arguments for inclusion of ʾiʿrāb (-un (ḍamma tanwīn) for triptotes or -u (ḍamma) for diptotes). We can discuss this separately (here's one discussion: User_talk:Benwing#ʾiʿrāb), but for the sake of searching for sound plurals, I think final diacritics could be chopped off and ignored in the endings (they may be different). --Anatoli T. (обсудить/вклад) 00:32, 26 November 2014 (UTC)
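A minimal Python sketch of chopping the final diacritics before comparing. It assumes fully vocalized Arabic-script headwords, and the function names are hypothetical. A shadda on the last letter, as in -iyyun, is kept because it belongs to the stem. The marks after the last letter may appear in either order, since Unicode normalization puts kasra before shadda, so all trailing marks are collected first.

```python
# Arabic vowel diacritics (harakat), including the three tanwīn forms.
FATHATAN, DAMMATAN, KASRATAN = "\u064B", "\u064C", "\u064D"
FATHA, DAMMA, KASRA = "\u064E", "\u064F", "\u0650"
SHADDA, SUKUN = "\u0651", "\u0652"

VOWEL_MARKS = {FATHATAN, DAMMATAN, KASRATAN, FATHA, DAMMA, KASRA, SUKUN}
ALL_MARKS = VOWEL_MARKS | {SHADDA}


def strip_final_vowels(word: str) -> str:
    """Drop the vowel marks on the last letter (e.g. -un, -u, -a) but keep a shadda."""
    end = len(word)
    while end > 0 and word[end - 1] in ALL_MARKS:
        end -= 1
    trailing = word[end:]
    return word[:end] + "".join(mark for mark in trailing if mark == SHADDA)


def looks_sound(singular: str, plural: str) -> bool:
    """Prefix heuristic applied after chopping final diacritics from both words."""
    sg, pl = strip_final_vowels(singular), strip_final_vowels(plural)
    return len(pl) > len(sg) and pl.startswith(sg)
```

Without the stripping step, مُدَرِّسٌ (mudarrisun) is not a prefix of مُدَرِّسُونَ (mudarrisūna), because the last letter carries ḍammatān in one and ḍamma in the other.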
You were probably using the basic form anyway, since إِرْهَابِي (ʔirhābī) was on your list. Also, replacing ة with ات is considered "sound" when forming plurals: مُدَرِّسَة (mudarrisa) -> مُدَرِّسَات (mudarrisāt). --Anatoli T. (обсудить/вклад) 00:44, 26 November 2014 (UTC)
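The ة → ات replacement can be sketched in Python for unvocalized spellings. The sketch only builds the expected -āt form. Whether a given noun actually takes this plural is not predicted, and nouns without ة simply add ات. The function names are hypothetical.

```python
TA_MARBUTA = "\u0629"  # ة
ALIF = "\u0627"        # ا
TA = "\u062A"          # ت


def sound_feminine_plural(singular: str) -> str:
    """Expected -āt plural of an unvocalized singular: final ة becomes ات,
    and a word without ة simply adds ات."""
    stem = singular[:-1] if singular.endswith(TA_MARBUTA) else singular
    return stem + ALIF + TA


def is_sound_feminine_plural(singular: str, plural: str) -> bool:
    return plural == sound_feminine_plural(singular)


print(sound_feminine_plural("مدرسة"))  # مدرسات
```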
It took me a while to realise that ʾiʿrāb means the -un ending. I really don't know much about Arabic grammatical terms, so please try to explain them before you use them so that I can follow. In any case, from what I can gather, the form without endings is used as the last word of a sentence, but does every noun have such a form? If not, then we should probably not use it as the lemma form.
There aren't really any good noun inflection tables for Arabic, that I can see. I could try to make a module, but it will need a lot of checking to make sure it's correct. I do believe that inflection tables would be preferable to cramming it all into the headword line. We could display all the forms then (nice for learners) and it would also be a lot easier to figure out the stem of the word because we could just make that the parameter. Then changes like the ة > ات would be a lot easier to deal with. —CodeCat 00:51, 26 November 2014 (UTC)
Sorry for confusing you; I thought using "or" would make it clearer. Noun entry titles shouldn't include diacritics, but some headwords include full diacritics (with -un or -u). I mentioned كِتَاب (kitāb), which has a full declension in the entry, but that's a "triptote" (it has three different endings, one for each case, and its definite endings differ from the indefinite ones); سُلْطَانُ (sulṭānu) is a "diptote": it has only two endings, nominative and oblique, and indef=def. Arabic declension templates also need attention and could use a module; they are relatively simple, unlike verbs (which are a nightmare). We can help you make declension tables; they are not too hard. --Anatoli T. (обсудить/вклад) 01:01, 26 November 2014 (UTC)
Does the form with no endings actually exist as a separate word (ignoring whether it would be spelled the same if vowels are removed)? Should it be included in inflection tables? —CodeCat 01:07, 26 November 2014 (UTC)
Do you mean كِتَاب (kitāb) and سُلْطَان (sulṭān)? They are not included in the declension. These forms are considered "pausal" forms, but the "sukūn" diacritic (lack of any vowel) is not written on them. Endings are dropped at the end of a phrase or before a pause in speech. It's also a relaxed pronunciation of Arabic; Arabs seldom or never use the endings in conversation. Some linguists advocate a simpler approach that eliminates the endings in teaching Arabic. It matches all modern Arabic dialects, which ignore case endings (ʾiʿrāb) altogether, except in some cases when forming adverbials. ʾiʿrāb is part of Classical or Qur'ānic Arabic, but MSA does not use it consistently. Knowing ʾiʿrāb is still useful, as you can just drop the endings if you don't want to use them; but in standard Arabic, endings help show the relationships of words in a sentence and the word order is relaxed, unlike in the dialects, where word order becomes important. --Anatoli T. (обсудить/вклад) 01:27, 26 November 2014 (UTC)
What if we included two inflection tables, one standard/classical and one modern/simplified? —CodeCat 01:39, 26 November 2014 (UTC)
There's no simplified inflection for nouns; it's just the word without ʾiʿrāb. Connecting vowels are sometimes inserted for smoothness, e.g. "ḥizbo llāh"/"ḥizba llāh"/"ḥizbu llāh" in colloquial Arabic and "ḥizbu llāh(i)" (nominative) in MSA. Dialects preserve accusative forms in adverbials, which retain the final alif (ا) in writing, and in set expressions, e.g. مَرْحَبًا (marḥaban) (since it's colloquial, -an is often reduced to -a, especially in this word). --Anatoli T. (обсудить/вклад) 01:48, 26 November 2014 (UTC)
@CodeCat Do you have a list of the tracking categories you've created? I'd like to take a look at them. Your approach of just looking at plurals that are the same as singulars plus endings isn't quite right, because it would count examples like ḥammār pl. ḥammāra as sound, whereas ḥammāra is normally considered a broken plural. Sound plurals are often defined as those plurals having the plural endings -ūn/-īn (for masculine people) and -āt (for feminine people and for objects). The tricky thing is that sometimes broken plurals have a sound plural ending added to them, e.g. the example I gave above with ṭarīq with one of its plurals ṭuruqāt, composed of broken plural ṭuruq (a recognized broken plural pattern) + ending -āt. I'd still consider that broken because of its irregularities, so maybe a proper definition of a sound plural is a plural formed exactly by adding the sound endings to the singular, removing any final singular -a (usually feminine) in the process, without any further vowel modifications.
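The proposed definition can be sketched in Python. The sketch assumes transliterated input with ʾiʿrāb already stripped, so plurals are given in citation form such as mudarrisūn. The name is_sound_plural is hypothetical.

```python
import unicodedata


def _nfc(text: str) -> str:
    return unicodedata.normalize("NFC", text)


# Sound plural endings as named above: -ūn/-īn (masculine, people) and -āt.
SOUND_ENDINGS = tuple(_nfc(e) for e in ("ūn", "īn", "āt"))


def is_sound_plural(singular: str, plural: str) -> bool:
    """True if `plural` is exactly the singular, minus any final -a, plus a sound ending."""
    sg, pl = _nfc(singular), _nfc(plural)
    stem = sg[:-1] if sg.endswith("a") else sg  # drop the final (usually feminine) -a
    return any(pl == stem + ending for ending in SOUND_ENDINGS)
```

Under this test, ṭarīq ~ ṭuruqāt is classed as broken because the stem vowels change, and ḥammār ~ ḥammāra is classed as broken because -a is not a sound ending.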
As for good Arabic declension tables, the one I created in w:Arabic nouns and adjectives is supposed to be comprehensive, although it's true that it shows things schematically rather than simply listing all the paradigms. The tricky thing here is that the plural paradigms are somewhat independent of the singular ones, e.g. broken plurals are in origin collective singulars and are inflected pretty much like singulars, and a given noun might have both a sound and a broken plural, as well as more than one type of noun having the same sound plural formation (e.g. both masculine non-human triptotes and feminines (human or not) in -a can have a sound plural in -āt, but the singulars are two different declensions).
As for ʾirhābī vs. ʾirhābiyy, the -iyy is not a plural but rather an adjective-forming suffix, which can in turn be nominalized. This is one of the only two true suffixes in Arabic, in that it can be added to pretty much any noun, including broken plurals, to form an adjective meaning "related to X" where X is the noun. This is a so-called nisba adjective. The difference between -ī and -iyy is that -ī is the informal pronunciation; we've chosen in general to write it both in Arabic and transcription as -iyy. ʾirhābī was an exception and a mistake, and I fixed it.
As for declension tables, I think this is a good idea but I still think the headword should display at least the irregular/unpredictable forms. E.g. plurals esp. of masculine words are quite unpredictable and should be included in the headword line, as should irregular feminines (e.g. kaslān "lazy" feminine kaslā instead of regular *kaslāna). This is similar to what we do for verbs, where we show the non-past stem along with the lemma (the past stem).
As for how to show broken plural patterns, it's not enough to just extract all consonants because some of them are part of the pattern (e.g. in wuzarāʾ with pattern CuCaCāʾ where the glottal stop is a consonant but in this case is part of the pattern). It's also not enough to just use C to represent replaceable consonants because sometimes a consonant will appear double in one form but not another, e.g. kātib "writer" has pl. kuttāb where the t appears double, and this pattern is a common one for nouns of occupation of the form CāCiC. The traditional Arabic solution is to use the Arabic letters ف ع ل which correspond to F-ʿ-L but there are various problems, one of which is the inability to capitalize the pharyngeal sound ʿ, and another is the lack of any standard for four-consonant roots. A system I've followed in various places in Wikipedia, which comes from the Georgetown series introductory Iraqi Arabic grammar, is to use F-M-L (first, middle, last) for the consonants of a three-consonant root and F-S-T-L (first, second, third, last) for the consonants of a four-consonant root. Then e.g. we can say unambiguously that an occupational noun of the form FāMiL typically forms a plural FuMMāL, and that a four-consonant word with a short vowel between the third and fourth consonant (the T and L), e.g. maktab "desk, office" pattern FaSTaL, almost always forms a pl. FaSāTiL e.g. makātib, whereas if there's a long vowel between the T and L in the singular, the pattern FaSāTīL is used with a long vowel between the T and L in the plural, e.g. miftāḥ "key" pl. mafātīḥ, maktūb "something written, a letter(?)" pl. makātīb, tamrīn "exercise" pl. tamārīn. Note that the words maktab, maktūb, miftāḥ, tamrīn are all formed themselves from a three-consonant root using a pattern that includes an m- or t- as part of the pattern (e.g. maktab is a "noun of place" formed using the pattern maFMaL from the root k-t-b "write", i.e. literally "place of writing", but for the purposes of forming a broken plural the m- and t- get treated as root consonants rather than pattern consonants). Hope this isn't too confusing.
So basically I'd suggest starting with the transliteration with ʾiʿrāb stripped (there's an argument to the transliteration function to strip this automatically); then for the singular, extract all consonants and replace them with F-M-L or F-S-T-L, and then in the plural, match up the same consonants, preserving their order, looking for geminate consonants, and treating all remaining vowels and consonants as part of the pattern.
If you want the Arabic equivalent, map the transliteration back to Arabic, which is generally much easier to do than the other way around and there may even be code in Module:ar-translit already to do this. (There's also code in ar_translit.py in WingerBot; find on github.com/benwing/WingerBot.)

Benwing (talk) 07:25, 26 November 2014 (UTC)
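The F-M-L/F-S-T-L procedure outlined above can be sketched in Python roughly as follows. The sketch assumes transliterated input with ʾiʿrāb already stripped and the same vowel letters used elsewhere on this page. The names radicals, pattern and broken_plural_patterns are hypothetical. Matching is greedy, left to right, so a pattern consonant identical to the next unmatched radical would be mislabelled. Only three- and four-consonant roots are handled.

```python
import unicodedata

VOWELS = set(unicodedata.normalize("NFC", "aiuāīū"))  # assumed transliteration vowels
LABELS = {3: "FML", 4: "FSTL"}  # first/middle/last, first/second/third/last


def _nfc(text: str) -> str:
    return unicodedata.normalize("NFC", text)


def radicals(singular: str) -> list:
    """Consonants of the singular in order, with geminates collapsed (kātib -> k, t, b)."""
    word = _nfc(singular)
    cons = []
    for i, ch in enumerate(word):
        if ch in VOWELS:
            continue
        if i > 0 and word[i - 1] == ch:
            continue  # second half of a doubled consonant: same radical
        cons.append(ch)
    return cons


def pattern(word: str, roots: list):
    """Replace the radicals in `word` with labels; everything else is pattern material."""
    labels = LABELS.get(len(roots))
    if labels is None:
        return None
    out, j = [], 0  # j = index of the next radical still to be matched
    for ch in _nfc(word):
        if j > 0 and out and out[-1] == labels[j - 1] and ch == roots[j - 1]:
            out.append(labels[j - 1])  # geminate of the radical just matched
        elif j < len(roots) and ch == roots[j]:
            out.append(labels[j])
            j += 1
        else:
            out.append(ch)  # vowel or pattern consonant
    return "".join(out) if j == len(roots) else None


def broken_plural_patterns(singular: str, plural: str):
    roots = radicals(singular)
    return pattern(singular, roots), pattern(plural, roots)


print(broken_plural_patterns("kātib", "kuttāb"))  # ('FāMiL', 'FuMMāL')
```

With this approach the glottal stop of wuzarāʾ stays in the pattern (FuMaLāʾ) because it is not among the singular's consonants. For maktab, the m- is counted as a radical, which matches the treatment described above for broken plurals.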

For now I think I will make the inflection tables without doing any pattern extraction. Making the patterns work would be much easier if there are already many entries that use the module, which of course means that you (plural) will need some time to add the inflection tables to entries. I do have one question though: is the inflection the same for nouns and adjectives, in that the forms are created the same way and both have the same set of forms too? That is, do we need separate {{ar-decl-noun-triptote}} and {{ar-decl-adj-triptote}} or will a single {{ar-decl-triptote}} be enough? —CodeCat 15:24, 26 November 2014 (UTC)
Please correct me, but I'm not aware of adjective diptotes. Adjectives need masculine and feminine singular, dual (shared by both genders), masculine and feminine plural (which can be sound or broken, even for the same adjective), definite forms for all three cases, both genders and all numbers (sg, dual, pl.), and masculine and feminine elative (comparative and superlative) as one cell. --Anatoli T. (обсудить/вклад) 21:37, 26 November 2014 (UTC)
@Benwing I think كَبِير (kabīr) could probably use another form in the headword: the feminine broken plural/elative (?) كُبْرَى (kubrā). It should probably be noted somewhere that non-human plurals are grammatically feminine singular, so كَبِيرَة (kabīra) (fem. of كَبِير (kabīr)) is also used for "big" with objects and animals in the plural, as is the elative form كُبْرَى (kubrā). --Anatoli T. (обсудить/вклад) 21:45, 26 November 2014 (UTC)
I'll start an example triptote inflection (kabīr doesn't have a sound plural, only a broken one):
Indefinite singular
  1. Nominative: كَبِيرٌ m (kabīrun), كَبِيرَةٌ f (kabīratun)
  2. Genitive: كَبِيرٍ m (kabīrin), كَبِيرَةٍ f (kabīratin)
  3. Accusative: كَبِيرًا m (kabīran), كَبِيرَةً f (kabīratan)
Definite singular (in ʾiḍāfa the definite article اَلـ (al-) is not used, but the forms are still definite)
  1. Nominative: اَلكَبِيرُ m (al-kabīru), اَلكَبِيرَةُ f (al-kabīratu)
  2. Genitive: اَلكَبِيرِ m (al-kabīri), اَلكَبِيرَةِ f (al-kabīrati)
  3. Accusative: اَلكَبِيرَ m (al-kabīra), اَلكَبِيرَةَ f (al-kabīrata)
Indefinite plural (masculine, humans, some animals - broken)
  1. Nominative: كِبَارٌ m pl (kibārun)
  2. Genitive: كِبَارٍ m pl (kibārin)
  3. Accusative: كِبَارًا m pl (kibāran)
Definite plural (masculine, humans, some animals - broken)
  1. Nominative: اَلكِبَارُ m pl (al-kibāru)
  2. Genitive: اَلكِبَارِ m pl (al-kibāri)
  3. Accusative: اَلكِبَارَ m pl (al-kibāra)
Dual (definite forms just need the article اَل) (duals are predictable: genitive and accusative are always the same, and definite forms don't differ from indefinite ones apart from the article)
  1. Nominative: كَبِيرانِ du (kabīrāni)
  2. Oblique (genitive and accusative): كَبِيرَيْنِ du (kabīrayni)
Should elatives be treated separately? @Benwing, Wikitiki89, CodeCat --Anatoli T. (обсудить/вклад) 00:25, 27 November 2014 (UTC)
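Since the duals are predictable, they can be generated mechanically. Below is a Python sketch working on transliterated stems. It assumes that a final short -a is always the feminine ending, which becomes -at- before the dual ending (kabīra -> kabīratāni). It does not handle stems ending in a long vowel or sun-letter assimilation of al-. The name dual_forms is hypothetical.

```python
def dual_forms(stem: str, definite: bool = False) -> dict:
    """Nominative and oblique (genitive/accusative) dual of a transliterated stem.

    A final feminine -a (tāʾ marbūṭa) becomes -at- before the dual ending, and
    the definite forms differ only by the article al-.
    """
    if stem.endswith("a"):
        stem = stem[:-1] + "at"
    prefix = "al-" if definite else ""
    return {
        "nominative": prefix + stem + "āni",
        "oblique": prefix + stem + "ayni",
    }


print(dual_forms("kabīr"))
print(dual_forms("kabīra", definite=True))
```

This sketch does not model the construct state, where the final -ni is dropped (kitābā, kitābay).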
Generally, adjectives and nouns have the same formation except that adjectives are declined for masculine and feminine and nouns aren't. However, you need to be aware of the fact that it is best to treat masculine and feminine of adjectives, and singular and plural of nouns and adjectives, as separate declensions that may not always be paired. For example, most adjectives are like kabīr in having a masculine with triptote declension and a feminine with feminine triptote declension (a slightly different declension from the plain triptote declension of the masculine and marked with suffix -a; some masculine words have this declension too so "feminine triptote" is slightly misleading). But: (1) Lots of adjectives have broken masculine plurals, which are of various declensions (e.g. kabīr, with masculine plural kibār with triptote declension); (2) color/defect adjectives like ʾaḥmar "red" have diptote declension in both masculine (ʾaḥmar) and feminine (ḥamrāʾ) and a broken plural (ḥumr) with triptote declension; (3) intensive adjectives in -ān e.g. kaslān "lazy" are diptotes, and their feminine e.g. kaslā is in the invariable declension; etc. So you should design a basic template that allows the declension of masculine, feminine, masculine pl. and feminine pl. to be independently specified, and then design higher-level templates that group them according to the most common groupings. Benwing (talk) 08:36, 27 November 2014 (UTC)
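The suggested two-level design can be sketched as a Python data structure. There is one basic spec in which each of the four slots (masc. sg., fem. sg., masc. pl., fem. pl.) has its own stem and declension type. On top of it sits a higher-level preset for the most common grouping. The class, the function and the declension-type labels are hypothetical. The preset's default feminine plural in -āt is an assumption, not something stated in the thread.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# One slot = (stem, declension type); the type names are hypothetical labels.
Slot = Tuple[str, str]


@dataclass(frozen=True)
class AdjectiveDeclension:
    """Basic spec: each of the four slots is declined independently."""
    masc_sg: Slot
    fem_sg: Optional[Slot] = None
    masc_pl: Optional[Slot] = None
    fem_pl: Optional[Slot] = None


def regular_adjective(stem: str, masc_pl: Optional[Slot] = None) -> AdjectiveDeclension:
    """Higher-level preset for the common kabīr-type grouping (assumed defaults)."""
    return AdjectiveDeclension(
        masc_sg=(stem, "triptote"),
        fem_sg=(stem + "a", "fem-triptote"),
        masc_pl=masc_pl,
        fem_pl=(stem + "āt", "sound-fem-pl"),
    )


# The three groupings described above.
kabir = regular_adjective("kabīr", masc_pl=("kibār", "triptote"))
ahmar = AdjectiveDeclension(
    masc_sg=("ʾaḥmar", "diptote"),
    fem_sg=("ḥamrāʾ", "diptote"),
    masc_pl=("ḥumr", "triptote"),
)
kaslan = AdjectiveDeclension(masc_sg=("kaslān", "diptote"), fem_sg=("kaslā", "invariable"))
```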
Elatives should maybe be treated as their own lexical entries. Sometimes there are elatives without a corresponding base adjective, and plenty of adjectives don't form elatives. You should probably list the elative in the declension table separately from the part that gives all the normal inflections. Benwing (talk) 08:40, 27 November 2014 (UTC)[reply]
Thanks. @CodeCat Are you still interested in making the module and templates? It may all be confusing without complete, concrete examples. I think it's better to start with nouns. Surprisingly, there is much less information on all the possible inflections of adjectives, but I have bits and pieces in three reference books that can be used, and Benwing and Wikitiki will help, I hope. --Anatoli T. (обсудить/вклад) 23:12, 27 November 2014 (UTC)[reply]
I do want to but things outside Wiktionary made my motivation drop a bit. It's probably one of those things where it's not so bad once I just get started on it. —CodeCat 23:15, 27 November 2014 (UTC)[reply]
@Atitarev, Benwing I created Module:ar-nouns and Template:ar-decl-noun-triptote, and replaced the older {{ar-decl-noun}} with it in the few entries that use it. I hope I did everything right, please make sure by checking the transclusions. —CodeCat 16:40, 28 November 2014 (UTC)[reply]
Thanks. Indefinite accusative (singular and plural) should be followed by an alif ا with the exception of words ending in ة or ء. --Anatoli T. (обсудить/вклад) 21:38, 28 November 2014 (UTC)[reply]
I'm not sure I understand. Is the accusative really -an + alif? —CodeCat 00:36, 29 November 2014 (UTC)[reply]
Yes - كِتابًا (kitāban) is the indefinite accusative of كِتاب (kitāb, book). Alif has no phonetic value here but is preserved in unvocalised writing and serves as an indication of the accusative when the context is clear. Accusative examples without alif: حِذَاءً (ḥiḏāʔan, (pair of) shoes) (its nominative is حِذَاءٌ (ḥiḏāʔun)) and the plural أَحْذِيَةً (ʔaḥḏiyatan) (nominative: أَحْذِيَةٌ (ʔaḥḏiyatun)), because they end in ء (stand-alone hamza) and ة (tāʾ marbūṭa) respectively and don't get an alif. Note that ة ("hidden ت (tāʾ)") is pronounced /t/ with ʾiʿrāb. --Anatoli T. (обсудить/вклад) 01:26, 29 November 2014 (UTC)[reply]
Are there any words where the actual stem ends in a tāʾ marbūṭa? In the module I assumed they had stems ending in fatḥah + regular tāʾ (-at-). In any case, is this what you had in mind regarding the accusative singular and plural? —CodeCat 01:38, 29 November 2014 (UTC)[reply]
Thanks for creating this module. Your diff is close but not quite right. The final silent alif occurs only after the ending -an (and doesn't occur after ة or ء, as Anatoli mentioned). The underlying reason for this is that the spelling reflects the pausal pronunciation (i.e. the pronunciation when the word occurs at the end of an utterance), and in pausa the -an is pronounced as -ā in formal Arabic. As for your question, what do you mean when you ask if there are words where the stem ends in tāʾ marbūṭa? There are certainly words that end in tāʾ marbūṭa, and you can add case endings to those words. There aren't really any roots containing tāʾ marbūṭa in them -- it's always part of an ending (usually a feminine ending) added onto the root, but I'd say it's part of the stem. Benwing (talk) 05:34, 29 November 2014 (UTC)[reply]
OK, I think I understand what you are asking about. When tāʾ marbūṭa is followed by any ending that involves a written letter, it changes to regular tāʾ. This doesn't apply to the endings -a, -i, -u, -an, -in, -un, which involve only a diacritic, but it does apply to the dual endings -āni/-ayni. It looks like you're currently always changing tāʾ marbūṭa into tāʾ.
A couple more things: (1) the plural of مُدَرِّسَة (mudarrisa) is مُدَرِّسَات (mudarrisāt). The way this is inflected is nom indef -un, acc/gen indef -in, nom def/cons -u, acc/gen def/cons -i; (2) for the word كُتَّاب (kuttāb), the plural كَتَاتِيب (katātīb) is actually a diptote, meaning that its declension is slightly different: indefinite nom -u, indefinite acc/gen -a, otherwise the same as for triptotes. Benwing (talk) 05:57, 29 November 2014 (UTC)[reply]
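The two spelling rules above (the silent alif after -an, and tāʾ marbūṭa turning into tāʾ before a written ending) can be sketched in Python roughly as follows. This is an illustration only, not the module's code. The function names are hypothetical, stems are assumed to be fully vocalized, and the alif test follows the narrower formulation given later in the thread (omit it after ة and after -āʾ, اء). Code points are written as escapes so that the order of letters and diacritics is unambiguous.

```python
# Hypothetical sketch of two orthographic rules; not the module's code.
ALIF = "\u0627"         # ا
TAA = "\u062A"          # ت
TAA_MARBUTA = "\u0629"  # ة
HAMZA = "\u0621"        # ء (stand-alone hamza)
NOON = "\u0646"         # ن
YAA = "\u064A"          # ي
FATHATAN = "\u064B"     # -an
DAMMATAN = "\u064C"     # -un
KASRATAN = "\u064D"     # -in
FATHA = "\u064E"
KASRA = "\u0650"
SUKUN = "\u0652"


def indefinite_singular(stem: str) -> dict:
    """Indefinite nom/gen/acc of a vocalized triptote stem."""
    # -un/-in/-an are written with a diacritic only, so ة is kept as is.
    forms = {
        "nom": stem + DAMMATAN,
        "gen": stem + KASRATAN,
        "acc": stem + FATHATAN,
    }
    # The silent alif after -an is left out after ة and after -āʾ (اء).
    if not (stem.endswith(TAA_MARBUTA) or stem.endswith(ALIF + HAMZA)):
        forms["acc"] += ALIF
    return forms


def dual(stem: str) -> dict:
    """Dual -āni / -ayni; these endings contain letters, so ة becomes ت."""
    if stem.endswith(TAA_MARBUTA):
        stem = stem[:-1] + TAA
    return {
        "nom": stem + FATHA + ALIF + NOON + KASRA,         # -āni
        "obl": stem + FATHA + YAA + SUKUN + NOON + KASRA,  # -ayni
    }
```

With كِتَاب the accusative gets the alif, while with سَنَة and حِذَاء it does not, and the dual of سَنَة is spelled with ت.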
Yes, at مُدَرِّسَة (mudarrisa) plural indefinite nominative should be مُدَرِّسَاتٌ (mudarrisātun) and the endings as Benwing described. --Anatoli T. (обсудить/вклад) 09:15, 29 November 2014 (UTC)[reply]
Ok, I made some changes. Apparently I used the wrong code point for final tāʾ so it wasn't detecting it right. I also changed tāʾ marbūṭa to regular tāʾ before the alif that's added in the accusative singular, going by the rule that tāʾ marbūṭa only appears when only diacritic letters follow. —CodeCat 13:00, 29 November 2014 (UTC)[reply]
@CodeCat Sorry but you have misunderstood. The plural form for مُدَرِّسَة is مُدَرِّسَاتٌ (mudarrisātun) (nominative), not مُدَرِّسَاةٌ (mudarrisātun), genitive and accusative are مُدَرِّسَاتٍ (mudarrisātin). Note that tāʾ marbūṭa changes to tāʾ only in the plural form. Definite forms use tāʾ and -u/-i endings.
Singular. The accusative indefinite form needs to change to مُدَرِّسَةً (mudarrisatan). As mentioned before, words with final tāʾ marbūṭa and stand-alone hamza don't get an alif in the accusative but it's still pronounced "-an" and with tāʾ marbūṭa -tan (t + an). Plural feminine forms have a normal tāʾ. --Anatoli T. (обсудить/вклад) 13:45, 29 November 2014 (UTC)[reply]
So many ruuules @.@ - I hope I got it now. I removed the tāʾ marbūṭa and alif from the accusative, and changed it to regular tāʾ in the sound plural. What about the dual, though? I left the regular tāʾ in there. And nouns with long -āh? I think the problem is that the Wikipedia article gives everything only in transliterated form, so I'm having to reverse transliterate but it's apparently ambiguous. —CodeCat 13:55, 29 November 2014 (UTC)[reply]
Here's the paradigm for sound feminine declension
Indefinite singular:
  1. Nominative: مُدَرِّسَةٌ (mudarrisatun)
  2. Genitive: مُدَرِّسَةٍ (mudarrisatin)
  3. Accusative: مُدَرِّسَةً (mudarrisatan)
Definite singular forms have -u/-i/-a endings, omitting here
Indefinite plural:
  1. Nominative: مُدَرِّسَاتٌ (mudarrisātun)
  2. Genitive: مُدَرِّسَاتٍ (mudarrisātin)
  3. Accusative: مُدَرِّسَاتٍ (mudarrisātin)
Definite plural forms have -u/-i/-i endings, omitting here
Thanks for the efforts. Please let me know if it makes sense. Dual is correct. :) (for reference of feminine sound plural: A Reference Grammar of Modern Standard Arabic by Karin C. Ryding, p. 132) --Anatoli T. (обсудить/вклад) 13:59, 29 November 2014 (UTC)[reply]
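The sound feminine plural paradigm above reduces to a small ending table. Here is a transliteration-only Python sketch. The function name is hypothetical. It assumes a singular in short -a from tāʾ marbūṭa, so long -āh nouns are out of scope. It also writes the article as plain al- without sun-letter assimilation.

```python
# Hypothetical sketch in transliteration; genitive and accusative share a form.
FEM_PL_ENDINGS = {
    "indef": {"nom": "un", "gen": "in", "acc": "in"},
    "def": {"nom": "u", "gen": "i", "acc": "i"},
}


def sound_feminine_plural(singular: str) -> dict:
    """Plural paradigm for a transliterated singular in short -a, e.g. "mudarrisa"."""
    if not singular.endswith("a"):
        raise ValueError("expected a singular ending in tāʾ marbūṭa (-a)")
    stem = singular[:-1] + "āt"  # mudarrisa -> mudarrisāt
    paradigm = {}
    for state, endings in FEM_PL_ENDINGS.items():
        # Simplification: no sun-letter assimilation of the article.
        prefix = "al-" if state == "def" else ""
        for case, ending in endings.items():
            paradigm[(state, case)] = prefix + stem + ending
    return paradigm
```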
I think it's difficult because I can't really read Arabic so I'm relying on the transliterations to tell me what I need. But they don't tell me each individual letter, so I have to guess, and guess wrong sometimes. (Now you understand, maybe, why I prefer transliterating orthographically rather than phonetically.) It would help a lot if you pointed out some details like when to change certain letters or which form of tāʾ to use, things like that. —CodeCat 14:02, 29 November 2014 (UTC)[reply]
Looks good! As Anatoli mentioned, the dual is done correctly. Sorry about the Wikipedia article; I did most of that and wrote it all in transcription. As for long -āh nouns, they should work more or less the same as regular -ah nouns. The singular will have اةٌ (ātun), اةً (ātan), etc. The dual will have اتَانِ (ātāni) etc. As for the plural, I'm not sure. The word مِرْآة (mirʔāh) with long -āh has broken plural forms. I'm thinking the sound plural will have -ayāt replacing the long -āh, since the underlying long -āh comes about through eliding a /y/ (or /w/) between short vowels. Benwing (talk) 14:06, 29 November 2014 (UTC)[reply]
BTW you are doing a great job. Thanks muchly for the work. I think you've gotten the hard parts of handling tāʾ marbūṭa. The only other issue with triptotes is that the silent alif in the accusative is omitted in one other particular case: words that end in -āʾ (written اء). Thus the example that Anatoli gave above with the accusative of حِذَاء (ḥiḏāʔ, (pair of) sandals), which is حِذَاءً (ḥiḏāʔan). This should be easy to check for. Benwing (talk) 14:14, 29 November 2014 (UTC)[reply]
Both tables at مدرسة now look correct. I have also added a declension table for "school" sense. CodeCat, you can try using SC Unipad from http://www.sharmahd.com/. It helps look up letter names, so you don't have to guess and you can type Arabic there as well. (100% graphical transliteration may not be helpful with Arabic as well. Silent ʾalif or tāʾ marbūṭa (in pausa) are examples you're already familiar with.) --Anatoli T. (обсудить/вклад) 14:17, 29 November 2014 (UTC)[reply]
OK, as for long -āh nouns, it looks like they can have plurals either in -ayāt or -awāt depending on what the underlying semivowel was that got elided. For example, حَيَاة (ḥayāh, life) has declension like this:
Indefinite singular:
  1. Nominative: حَيَاةٌ (ḥayātun)
  2. Genitive: حَيَاةٍ (ḥayātin)
  3. Accusative: حَيَاةً (ḥayātan)
Definite singular forms have -u/-i/-a endings, omitting here
Indefinite dual:
  1. Nominative: حَيَاتَانِ (ḥayātāni)
  2. Genitive/Accusative: حَيَاتَيْنِ (ḥayātayni)
Indefinite plural:
  1. Nominative: حَيَوَاتٌ (ḥayawātun)
  2. Genitive/Accusative: حَيَوَاتٍ (ḥayawātin)

Benwing (talk) 14:27, 29 November 2014 (UTC)[reply]
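Because the semivowel is not predictable, a sketch for this class has to take it as input. The following transliteration-only Python sketch uses a hypothetical function with a hypothetical `semivowel` parameter:

```python
# Hypothetical sketch: indefinite forms of a long -āh noun, in transliteration.
def long_ah_paradigm(stem: str, semivowel: str) -> dict:
    """stem is everything before -āh ("ḥay" for ḥayāh); semivowel is the
    elided consonant that resurfaces in the sound plural ("w" or "y")."""
    if semivowel not in ("w", "y"):
        raise ValueError("semivowel must be 'w' or 'y'")
    return {
        "sg_nom": stem + "ātun",
        "sg_gen": stem + "ātin",
        "sg_acc": stem + "ātan",
        "du_nom": stem + "ātāni",
        "du_gen_acc": stem + "ātayni",
        # The plural restores the semivowel: -awāt or -ayāt.
        "pl_nom": stem + "a" + semivowel + "ātun",
        "pl_gen_acc": stem + "a" + semivowel + "ātin",
    }
```

For ḥayāh, `long_ah_paradigm("ḥay", "w")` reproduces the forms listed above.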

I see. So this is not predictable; such nouns have one of two possible sound plurals. This means that if you specify the plural stem other than -, it will think it's a broken plural. This could be made a special case, but I wonder if there are any broken plurals with a stem ending in -āt? If so, then such a special case would fail of course. And now that we're covering it, what about broken plurals with a stem in -ah, if they exist? Would they drop the alif in the accusative just like singulars with that ending do? —CodeCat 14:34, 29 November 2014 (UTC)[reply]
Yes, there are broken (or at least irregular) plurals ending in -āt. I gave an example way up above: طَرِيق (ṭarīq, road) has pl. either طُرُق (ṭuruq) or طُرُقَات (ṭuruqāt). The former inflects as a normal "broken plural" and the latter inflects as a "sound plural in -āt". You might think about checking the ending to figure out how to decline the word, but this will run into problems in that there are almost certainly plurals that happen to end in -āt but which are declined as broken plurals. (An example would be a plural of the form كُتَّاب (kuttāb, writers) but based on a 3-consonant root where the third consonant was a ت (t).) So you will need some way to explicitly specify the declension, either as a required element or as an override in cases where the auto-inflecter guesses wrong. As for broken plurals in -ah, they do exist and they do drop the alif in the accusative. One example is أُسْتَاذ (ʔustāḏ, teacher) pl. أَسَاتِذَة (ʔasātiḏa). In general, the "broken plural" declension is inflected exactly like singulars, in all the particulars. Benwing (talk) 14:55, 29 November 2014 (UTC)[reply]
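The "guess, but allow an override" approach could look roughly like this. It is a Python sketch on transliterated plurals. `plural_declension` and its `override` argument are hypothetical, and the ending checks are deliberately naive.

```python
# Hypothetical sketch: guess a plural's declension, with an explicit override.
from typing import Optional


def plural_declension(plural: str, override: Optional[str] = None) -> str:
    """Guess how a transliterated plural declines, unless told explicitly."""
    if override is not None:
        # An explicit declension always wins over the guess.
        return override
    if plural.endswith("āt"):
        return "sound feminine plural"
    if plural.endswith("ūn"):
        return "sound masculine plural"
    # Everything else, including plurals in -a such as ʔasātiḏa,
    # declines like a singular.
    return "broken plural"
```

The naive check misfires on broken plurals that happen to end in -ūn or -āt. One example is عُيُون (ʿuyūn, eyes), the broken plural of عَيْن (ʿayn). That is exactly the case the override is for.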
Do those particulars include the distinction between diptotes, triptotes and so on?
Yes, there are diptote and triptote broken plurals. Diptote broken plurals are quite common in fact, e.g. the broken plural مَدَارِس (madāris, schools) is a diptote, as are all other broken plurals with the patterns CaCāCiC and CaCāCīC (among others). Benwing (talk) 15:14, 29 November 2014 (UTC)[reply]
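The triptote/diptote difference can be shown as an ending table in transliteration. This Python sketch uses hypothetical names. Here "def" stands for the definite state (the construct state takes the same endings, just without al-). Neither the silent alif nor sun-letter assimilation is modelled.

```python
# Hypothetical sketch: triptote vs. diptote case endings, in transliteration.
CASES = ("nom", "gen", "acc")

# Diptotes differ only in the indefinite state: no nunation, and -a
# serves for both genitive and accusative.
PTOTE_ENDINGS = {
    "triptote": {"indef": ("un", "in", "an"), "def": ("u", "i", "a")},
    "diptote": {"indef": ("u", "a", "a"), "def": ("u", "i", "a")},
}


def decline(stem: str, declension: str, state: str) -> dict:
    """Case forms of a transliterated stem in the given state."""
    prefix = "al-" if state == "def" else ""
    endings = PTOTE_ENDINGS[declension][state]
    return {case: prefix + stem + ending for case, ending in zip(CASES, endings)}
```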
The plural طُرُقَات (ṭuruqāt) confuses me a bit. Is it sound, broken, or both? —CodeCat 15:02, 29 November 2014 (UTC)[reply]
Hmmm, it's kind of both. It's a broken plural with a sound plural ending attached to it. For the purposes of inflection, it's an irregular sound plural because it inflects like other sound plurals in -āt. There are also irregular sound plurals in -ūn, although not very common; an example is سِنُون (sinūn, years), pl. of سَنَة (sana, year) (which also has the pl. سَنَوَات (sanawāt), inflected as a sound plural in -āt). Benwing (talk) 15:14, 29 November 2014 (UTC)[reply]
Oh dear. I don't think something like {{ar-decl-noun-triptote}} will cut it, then. After all, the plural could be diptote... or something else even. Please remind me that there are toddlers who speak this language? I don't think I would believe you. I think I understand why there were so many Arab mathematicians in the past; it takes a genius just to speak Arabic. @.@
Ok, done ranting. So, let's say plurals and singulars are completely separate nouns. Singulars obviously have a dual which plurals don't, but in every other respect, they both have the same possible declensions. Furthermore, there is not necessarily a correlation between the inflection of the singular and the plural; one could be triptote and the other diptote, one could be triptote and the other sound plural, one could be triptote and the other sound plural in -ah... So no matter what we do with the templates, we basically have to treat the singular and plural as completely separate things and there needs to be one parameter to say what kind of inflection the singular has, and the other what kind of inflection the plural has. I'm thinking of something like {{ar-decl-noun|1|singular stem|1|plural stem}} or {{ar-decl-noun|2|singular stem|7|plural stem}}, where the 1 corresponds to the number for the triptote inflection in the Wikipedia article, while 7 stands for sound masculine plural. Naturally, 7, 8 and 9 can not be used as singular types, but all other types can be used as both singular and plural. How is that? —CodeCat 15:26, 29 November 2014 (UTC)[reply]

────────────────────────────────────────────────────────────────────────────────────────────────────

Yes, that sounds quite reasonable. One complication is that there can be more than one plural (sometimes 3 or 4), and each plural can have its own declension. So you will need to think how to encode that in parameters. Benwing (talk) 15:30, 29 November 2014 (UTC)[reply]
Maybe just more pairs? {{ar-decl-noun|1|x|2|y|3|z}}? —CodeCat 15:34, 29 November 2014 (UTC)[reply]
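The positional scheme (a declension number followed by a stem, repeated in pairs) might be parsed roughly as follows. This is a Python stand-in for whatever the Lua module does. `parse_decl_args` is hypothetical, and the plural-only types 7, 8 and 9 follow the numbering proposed above.

```python
# Hypothetical sketch: split [type, stem, type, stem, ...] into singular + plurals.
PLURAL_ONLY_TYPES = {"7", "8", "9"}  # sound plural types, per the proposed numbering


def parse_decl_args(args: list) -> dict:
    """The first pair is the singular; every following pair is a plural."""
    if len(args) < 2 or len(args) % 2 != 0:
        raise ValueError("expected declension/stem pairs")
    pairs = [(args[i], args[i + 1]) for i in range(0, len(args), 2)]
    singular, plurals = pairs[0], pairs[1:]
    if singular[0] in PLURAL_ONLY_TYPES:
        raise ValueError(f"declension {singular[0]} is plural-only")
    return {"singular": singular, "plurals": plurals}
```

For example, `parse_decl_args(["1", "x", "2", "y", "3", "z"])` corresponds to {{ar-decl-noun|1|x|2|y|3|z}}.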
Sounds good to me. BTW there's nothing sacred about the numbering system in the Wikipedia article; it's just something I invented when creating the article to try and make sense of the different declensions out there. Normal grammars don't number the declensions, and in fact don't really even have consistent terminology for some of the declensions. So you can use any kind of system you want -- numbers, names, etc.
One additional complication I thought of is that in what I call declension class 5, there are two possible duals, either with -y- or -w-, depending on the particular word in question, as for the plurals of class -āh. I think (although I'm not sure) that the occurrence of -y- or -w- is actually predictable given the spelling, with the words that take -y- ending in ى and the words that take -w- ending in ا (both pronounced the same). In any case, don't worry about this issue for now, and if needed later on we can add an override parameter to specify the correct dual. Benwing (talk) 15:45, 29 November 2014 (UTC)[reply]
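If the tentative rule holds, the dual semivowel could be read off the final letter, with an override for exceptions. This is a hypothetical Python sketch. The rule is Benwing's unconfirmed guess, the `override` parameter is an assumption, and عَصًا (ʿaṣan, stick, dual ʿaṣawāni) is an added example of the ا type.

```python
# Hypothetical sketch of the tentative ى -> -y- / ا -> -w- rule for class 5 duals.
from typing import Optional, Tuple

ALIF = "\u0627"          # ا
ALIF_MAQSURA = "\u0649"  # ى


def strip_diacritics(spelling: str) -> str:
    """Drop U+064B..U+0652 (tanwīn, short vowels, shadda, sukūn)."""
    return "".join(c for c in spelling if not "\u064B" <= c <= "\u0652")


def class5_dual(spelling: str, stem: str,
                override: Optional[str] = None) -> Tuple[str, str]:
    """Transliterated dual (nom, gen/acc) of a final-ā noun; stem excludes -an."""
    if override is not None:
        semivowel = override
    else:
        last = strip_diacritics(spelling)[-1]
        if last == ALIF_MAQSURA:
            semivowel = "y"
        elif last == ALIF:
            semivowel = "w"
        else:
            raise ValueError("spelling does not end in ى or ا")
    return stem + "a" + semivowel + "āni", stem + "a" + semivowel + "ayni"
```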
About to take a nap, back later. Benwing (talk) 16:00, 29 November 2014 (UTC)[reply]

Headword forms

I'm starting a new section here just to discuss the headword forms that we will be using. In some of the above sections, we were referring to them as "pausal" forms, but this is wrong. This becomes clear when looking at multiword expressions. Let me make a distinction between four different concepts and how certain expressions would be handled by them and why someone would use them as a headword:

  1. Indefinite nominative forms: كُرَةُ قَدَمٍ (kuratu qadamin), فِلْفِلٌ أَسْوَدُ (filfilun ʔaswadu), سَنَةٌ (sanatun), وَفَاةٌ (wafātun), مُسْتَشْفًى (mustašfan), بَانٍ (bānin)
    • by analogy to other languages with cases. This is what we have already decided not to use.
  2. Pausal forms: كُرَةُ قَدَم (kuratu qadam), فِلْفِلٌ أَسْوَد (filfilun ʔaswad), سَنَة (sana), وَفَاة (wafā), مُسْتَشْفَى (mustašfā), بَان (bān)
    • because this is how words would be pronounced in vocalized MSA when they occur by themselves.
  3. Caseless forms: كُرَة قَدَمٍ (kurat qadamin), فِلْفِل أَسْوَد (filfil ʔaswad), سَنَة (sanat), وَفَاة (wafāt), مُسْتَشْفًى (mustašfan), بَانٍ (bānin)
    • because this abstracts away all changes, while still indicating the distinction between an adjective and a muḍāf ʾilayhi.
  4. Idealized spoken forms: كُرَة قَدَم (kurat qadam), فِلْفِل أَسْوَد (filfil ʔaswad), سَنَة (sana), وَفَاة (wafāt), مُسْتَشْفَى (mustašfā), بَانِي (bānī)
    • because these are closer to the spoken language and therefore seem friendlier or more familiar.

Based on our previous discussions, it seems that what we are proposing is a combination of all four of the above. Since the idealized spoken forms are closest to what we want, I propose that we use them as lemmas exclusively. This would mean moving the -in lemmas to the -ī (ـِي) spelling (which makes sense since it is only the indefinite forms that use the former spelling). This would leave one final question that I would still be dissatisfied about: how can we differentiate between noun + adjective and ʾiḍāfa without creating redundant (and difficult-to-implement) declension tables? --WikiTiki89 21:02, 12 December 2014 (UTC)[reply]

I think we should use the idealized spoken forms except for -in/-an nouns, where we keep them in -in/-an form, and except for nouns in -اة, which should be indicated as -āh. This is current practice; it's the only way of indicating the appropriate declension of the various nouns, e.g. distinguishing مُسْتَشْفًى (mustašfan) from ذِكْرَى (ḏikrā); and it's also almost exactly in keeping with the way that Hans Wehr indicates things. That's 3 arguments in favor of what's already current practice and I feel fairly strongly about this. In general I feel strongly that we should follow existing dictionaries and encyclopedias unless there's a good reason not to. You will find, for example, that the Arabic Wikipedia lists وَادٍ (wādin) under واد, NOT وادي. Other users expect words to follow certain standards that they've learned from dictionaries and we should follow those same standards. The standard that is current practice and that I think we should stay with is a compromise between spoken and written, leaning as much towards spoken as possible while still keeping the spelling consistently nominative singular indefinite, and while making sure to keep separate declensions distinct. Your proposal messes up the spelling by using construct spelling for وَادٍ (wādin)-type nouns. I really don't see the point of going against the grain of other dictionaries just for some imagined consistency (which necessarily creates inconsistency in other places). (Note, the only places we differ from Hans Wehr are (1) Wehr indicates nisba words as e.g. qawī whereas we have qawiyy, and (2) Wehr indicates diptotes with a superscript 2. I have no problem with either of these, and neither does Anatoli, but I haven't gone with the superscript 2 primarily because you've objected so much -- but keep in mind you're one against two in most of these arguments.) Benwing (talk) 08:38, 13 December 2014 (UTC)[reply]
I remember User:Mahmudmasri also objected to using -iyy vs -ī. I'm not sure why. I remember you mentioned that ـِي (-ī) is the colloquial pronunciation of nisba endings but ـِيّ (-iyy) is more standard. I also used -ī for some time but I agree we should switch to -iyy. --Anatoli T. (обсудить/вклад) 00:17, 18 December 2014 (UTC)[reply]
Using the word "colloquial" implies that I meant street speech, but that wasn't the case. The /i/ pronunciation is indeed standard and much more common. /ij/ is a bit archaic and poetic. --Mahmudmasri (talk) 16:54, 18 December 2014 (UTC)[reply]

Irregular definite forms

I was thinking it would be neater to display the definite form of اِمْرَأَة (imraʔa), namely اَلْمَرْأَة (al-marʔa), in the headword rather than in the Usage notes. I can't recall any other nouns that have an irregular definite form like this, which is probably why there isn't a |def= parameter yet. Would someone be willing to add the functionality to the module? — Eru·tuon 01:52, 3 January 2017 (UTC)[reply]

Non-lemma forms

@Atitarev, Wikitiki89, Benwing2: I noticed several automatically categorized entries are listed in User:DTLHS/cleanup/lemma categorization, meaning that @NadandoBot found that they didn't belong to either Arabic lemmas or Arabic non-lemma forms. Several are singulatives: for instance, شَجَرَة (šajara) and بَيْضَة (bayḍa). Then there are others, like شَجَر (šajar) and بَيْض (bayḍ), that would be listed if it weren't for another headword on the same page. This I found out by previewing the etymology section that the noun belongs to.

So, ideally either the singulative or collective should be categorized as lemma, and the other as a non-lemma form. I vote for the collective, because it's the unmarked form morphologically. What do others think? Was there a discussion on this in the past? — Eru·tuon 06:20, 7 August 2017 (UTC)[reply]

@Erutuon Yes, collectives are better candidates for lemmas but singulatives can be classified as lemmas as well. --Anatoli T. (обсудить/вклад) 12:07, 7 August 2017 (UTC)[reply]
See Category:head tracking/unrecognized pos for a full list. —CodeCat 12:09, 7 August 2017 (UTC)[reply]
Singulatives and collectives should each be treated as full lemmas. --WikiTiki89 16:05, 7 August 2017 (UTC)[reply]
I don't know about that. In the Brythonic languages, singulatives are non-lemmas, they are forms of the plural. —CodeCat 16:22, 7 August 2017 (UTC)[reply]
@CodeCat: Sorry if that sounded like a general statement, but I only meant it to apply to Arabic. I know nothing about Brythonic languages, so for them it might make perfect sense to treat singulatives as non-lemmas. --WikiTiki89 16:44, 7 August 2017 (UTC)[reply]
I was thinking in the context of how Module:headword should treat it. I suppose if it recognises "singulative noun" as a lemma but "noun singulative form" as non-lemma, then it's fine. Arabic can then use the former while the Brythonic languages use the latter. In any case, Erutuon thinks that singulatives are non-lemmas in Arabic, so the point may be moot. —CodeCat 17:52, 7 August 2017 (UTC)[reply]
Why is that? The singulative appears to be derived from the collective, so I would have thought that the collective should be considered the lemma. — Eru·tuon 17:41, 7 August 2017 (UTC)[reply]
Because it's not a regular feature of all nouns. There is a singulative noun, and collective noun. You could see them as being forms of each other, but it would be a bit like saying that unfamiliar is the non-lemma negative form of familiar; it's true that many English adjectives have a "negative form" that is derived from the positive form by the addition of un-, but we still treat them as separate adjectives. --WikiTiki89 17:54, 7 August 2017 (UTC)[reply]
Singulatives are special though, because they fill a semantic gap. If a noun is inherently plural, then there is no way to express a single instance of such a thing. A singulative lets you do that. Essentially, a singulative is a case where the derivational relationship between singular and plural is reversed; the plural is more basic, and the singular is derived. —CodeCat 18:04, 7 August 2017 (UTC)[reply]
It's not as simple as you're making it. The singulative also has its own plural. شَجَر (šajar) means trees collectively, while شَجَرَة (šajara) means an individual tree and its plurals شَجَرَات (šajarāt) and أَشْجَار (ʔašjār) mean multiple individual trees. And it doesn't really fill a semantic gap any more than the English negative adjectives; you could theoretically say وَاحِدَةٌ مِنَ الشَّجَرِ (wāḥidatun mina š-šajari, literally one of the trees) instead of شَجَرَةٌ (šajaratun, a tree), just like you can say in English "a man who is not familiar" instead of "an unfamiliar man". --WikiTiki89 18:19, 7 August 2017 (UTC)[reply]
شَجَر (šajar) is still the most unmarked form of them all. Yeah, it does seem like there's a derivational relationship going from شَجَر (šajar) to شَجَرَة (šajara) to شَجَرَات (šajarāt), but evidently they are considered inflected forms of each other, as they are all listed in the same headword with labels having to do with grammatical number (singulative, collective, dual, paucal, plural). But if the relationship is really as derivational as that of familiar and unfamiliar, then they should be separated into their own entries, and each should not be shown in the other's headword and inflection table. — Eru·tuon 18:49, 7 August 2017 (UTC)[reply]
I disagree that being separate lemmas and being shown in each other's headword lines are mutually exclusive. We do this for Russian perfective and imperfective verb pairs, we do this for Hebrew active and passive verb pairs, etc. If anything, I would say that at the collective lemma, we need only list the singulative lemma, and not its duals and plurals. --WikiTiki89 19:03, 7 August 2017 (UTC)[reply]
Why is what you call the plural of the singulative called a paucal? That seems to be based on a single-lemma analysis. — Eru·tuon 19:13, 7 August 2017 (UTC)[reply]
Paucals exist even for nouns without a collective, although we often don't label them as such and just list them as one of several plurals. They are simply plurals that imply a relatively small number. --WikiTiki89 19:19, 7 August 2017 (UTC)[reply]
Yeah, well, the point is you're saying it should be called the plural of the singulative, not a paucal. — Eru·tuon 19:44, 7 August 2017 (UTC)[reply]
Well, that depends on whether we ideally want to distinguish paucals or not. In Modern Arabic, paucals aren't distinguished from plurals anymore, but in Classical Arabic they were (and this applies both to nouns with a collective and those without). For example, شَهْر (šahr, month) has no collective, but it has the paucal أَشْهُر (ʔašhur) and the plural شُهُور (šuhūr). --WikiTiki89 20:00, 7 August 2017 (UTC)[reply]
Okay, so not all paucals are morphologically equivalent to a plural formed from the singulative. What I said specifically applies to the form شَجَرَات (šajarāt) (or to any paucals formed the same way): either it is (in Classical Arabic) the plural of the singulative شَجَرَة (šajara), or it is the paucal of the collective شَجَر (šajar). Current analysis is the latter. It may be morphologically شَجَر (šajar) → شَجَرَة (šajara) → شَجَرَات (šajarāt), but that is not necessarily the correct grammatical analysis. — Eru·tuon 20:34, 7 August 2017 (UTC)[reply]
What makes you say that the current analysis is the latter? It's listed in the headwords of both. And I'm not saying anything about the direction of derivation, that is irrelevant here. Now what I am saying is that logically the paucal is a paucal of the singulative. Perhaps the reason this is confusing is that we really should be using the term "plural of paucity" rather than "paucal". A plural can only exist for a countable noun, and a collective noun is uncountable so it can't have a "plural of paucity" or any other plural (unless it is used in a countable sense, as is frequently done, in which case it can have a plural, e.g. when referring to a "type of tree"). --WikiTiki89 20:56, 7 August 2017 (UTC)[reply]

────────────────────────────────────────────────────────────────────────────────────────────────────

Ahh, you're right. The current state is that both are considered coequal lemmas (if not categorized as such), or their headwords would not both list all the forms. I think that's a messy state of affairs that should be resolved.

As to the statement that a collective form cannot have a plural because it is an uncountable noun, that is a confusion of terms. A collective form is not the same as an uncountable noun. Collective is the name for a particular form, not for the noun as a whole. And anyway, that argument would mean not only that شَجَرَتَان (šajaratān) and شَجَرَات (šajarāt) can't be forms of شَجَر (šajar), but that أَشْجَار (ʔašjār) can't either. So, we would have to list أَشْجَار (ʔašjār) under شَجَرَة (šajara), and بُيُوض (buyūḍ) under بَيْضَة (bayḍa), etc. — Eru·tuon 21:32, 7 August 2017 (UTC)[reply]

That's right, شَجَرَتَان (šajaratān) and شَجَرَات (šajarāt) are really only forms of شَجَرَة (šajara). As for أَشْجَار (ʔašjār) (which I mistakenly said above was a plural of the singulative, when really it's the plural of the collective), if you read my parenthetical remark in my previous post, it is the plural of the countable use of the noun to mean "type of tree". I didn't say that collective is the same thing as uncountable, I said that a collective noun must be uncountable. So to clarify, شَجَر (šajar) really has two senses, the first being a collective noun meaning "trees, collectively", the second being a countable noun meaning "type of tree" with the plural أَشْجَار (ʔašjār) (although that doesn't necessarily mean we need to explicitly list these two senses separately). --WikiTiki89 22:10, 7 August 2017 (UTC)[reply]
Yes, I failed to read your note regarding the "plural of type". So my argument fails. The meanings of the two plurals do seem to suggest that the singulative and collective are separate entities. What about other plurals of collective nouns? Do they also have the meaning "types of x"? For instance, does بُيُوض (buyūḍ) mean "types of eggs" (chicken, quail, ostrich, etc.), while بَيْضَات (bayḍāt) means just "eggs"? — Eru·tuon 22:34, 7 August 2017 (UTC)[reply]
To clarify, بَيْضَات (bayḍāt) doesn't mean just "eggs", but "some number of individual eggs". Just "eggs" would be the collective بَيْض (bayḍ). Now as to your first question, yes, probably all collectives can have plurals meaning "types of x". But I'm not prepared to say that all plurals listed in the headwords of collective nouns are plurals of the collective form rather than of the singulative. Each one would have to be looked up individually to verify. --WikiTiki89 23:29, 7 August 2017 (UTC)[reply]
Well, it seems frustratingly complex if you have to decide on a case-by-case basis if a sound plural belongs to the singulative noun rather than the collective one, but perhaps it makes sense. — Eru·tuon 20:26, 10 August 2017 (UTC)[reply]
You're looking at it the wrong way (and misusing the term sound plural). The plural belongs to one or the other, but because the template doesn't distinguish between them, if someone had entered a "plural of multitude" of the singulative, then they would have entered it in the same place as the plural of the collective would be. That said, plurals in general are a very complex aspect of Arabic. Words often have many plurals, some of which are interchangeable, and some of which have certain connotations or nuances, while most dictionaries generally list them all equally without comment. --WikiTiki89 20:47, 10 August 2017 (UTC)[reply]
Ouch. I meant the opposite, broken plural. — Eru·tuon 20:49, 10 August 2017 (UTC)[reply]
Speaking generally (i.e. regarding all nouns, not just collectives and singulatives), you can often identify the singular when given a plural, but not with 100% certainty, because there are many exceptions. For example, fuʿūl is generally the plural of faʿl, ʾafʿāl is generally a plural of faʿal, while faʿla(t) and faʿala(t) generally have sound plurals. But this is not always the case. --WikiTiki89 20:57, 10 August 2017 (UTC)[reply]
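These tendencies can be written as pattern substitutions over a three-consonant root. This Python sketch uses hypothetical names, and the result is only ever a guess, which is the point made above. It keeps the pattern's ʾ for hamza, even though the thread elsewhere writes ʔ. The root is assumed to be given as three single characters.

```python
# Hypothetical sketch: apply the usual (not guaranteed) broken-plural patterns.
from typing import Optional

# Singular pattern -> usual broken plural pattern, per the tendencies above.
USUAL_BROKEN_PLURAL = {"faʿl": "fuʿūl", "faʿal": "ʾafʿāl"}


def apply_pattern(pattern: str, root: str) -> str:
    """Slot three root consonants into the f/ʿ/l positions of a pattern."""
    c1, c2, c3 = root
    return pattern.translate(str.maketrans({"f": c1, "ʿ": c2, "l": c3}))


def guess_broken_plural(sg_pattern: str, root: str) -> Optional[str]:
    """Return a guessed plural, or None when no tendency is recorded."""
    plural_pattern = USUAL_BROKEN_PLURAL.get(sg_pattern)
    if plural_pattern is None:
        return None
    return apply_pattern(plural_pattern, root)
```

For example, the root š-h-r in faʿl gives šahr, and the guess fuʿūl gives šuhūr. Likewise š-j-r in faʿal gives šajar, and the guess is ʾašjār. Any real entry should still list the attested plural(s).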
Yes, I know. But the point I'm getting at is that you would say that أَشْجَار (ʔašjār) is the plural of شَجَر (šajar), apparently based on the semantics, but what of other sound plurals that are related to collective–singulative noun pairs: must they be individually determined to be the plural of the singulative or the collective noun based on semantic criteria, or would you keep the existing vague situation in which the singulative and collective are both considered lemmas, but at the same time the associated plurals are unspecified as to which lemma they are a form of? — Eru·tuon 21:07, 10 August 2017 (UTC)[reply]
I think we need to rectify the current situation by listing أَشْجَار (ʔašjār) only at شَجَر (šajar) and شَجَرَات (šajarāt) only at شَجَرَة (šajara). But as for your point of using semantics to determine which one it is a plural of, we always have to do this for plurals in any language. Hypothetically, if the word doors referred to multiple instances of a chair, and chairs referred to multiple instances of a door, we would need to use semantics to determine that doors is the plural of chair and chairs is the plural of door. There is nothing different or unique here about that. --WikiTiki89 21:14, 10 August 2017 (UTC)
That's a silly example, because chair and door are not related in the way the Arabic nouns are, aside from both referring to human-made items, but you may be right. I like being able to see all the forms at a glance, but it would be an improvement to have the headwords match the categories. — Eru·tuon 21:26, 10 August 2017 (UTC)
I've fixed the issue at least for collectives and singulatives. Now their POS category is just "nouns", which is valid, and then the singulative and collective categories are added as secondary categories. —CodeCat 18:23, 10 August 2017 (UTC)
I notice most of the Arabic entries are now gone from Category:head tracking/unrecognized pos, but there are still a few. —CodeCat 18:25, 10 August 2017 (UTC)
@CodeCat: I fixed two of them. The other two, بانجو and برتقان, have the same collective noun problem, but in Egyptian Arabic. --WikiTiki89 18:31, 10 August 2017 (UTC)
And its module is a near-exact copy of the Arabic one, great. Could they perhaps be merged? —CodeCat 18:32, 10 August 2017 (UTC)
No idea. I really think we should treat Arabic dialects like we treat Chinese dialects. --WikiTiki89 18:34, 10 August 2017 (UTC)

Feminine-only adjectives

A note: حَسْنَاء (ḥasnāʔ) is apparently a feminine-only adjective, so {{ar-adj}} doesn't really work for it, as that requires the headword to be the masculine form. The entry currently uses {{ar-adj-fem}}, but that is intended for non-lemma forms. I might look at remedying this, perhaps by adding a parameter to {{ar-adj}} to indicate that the adjective is feminine-only, if someone else doesn't get to it first. — Eru·tuon 05:38, 19 January 2018 (UTC)

@Benwing2 Currently {{ar-noun}} categorizes feminine nouns ending in ـَاء and ـَا and ـَى as Arabic feminine terms lacking feminine ending, but these are feminine endings; see, for example, هَيْجَاء (hayjāʔ), هَيْجَا (hayjā), دَعْوَى (daʕwā) and أَرْطَى f (ʔarṭā) as distinct from أَرْطًى m (ʔarṭan). The default behaviour in Arabic is to treat words with these endings as feminine, and all of these endings go back to Proto-Semitic and even Proto-Afro-Asiatic. See the paper The feminine endings *-ay and *-āy in Semitic and Berber. Also, I do not know why non-lemmas like جَوْعَى (jawʕā) and أُولَى (ʔūlā) end up in the category through non-lemma templates. Fay Freak (talk) 20:31, 20 October 2018 (UTC)

@Fay Freak Fixed both issues. Benwing2 (talk) 20:52, 20 October 2018 (UTC)

Template:ar-adj-pl displaying gender and number when it shouldn't

On ملاء, the template is displaying plural on the headword line. But the headword line should not repeat grammatical information that is already present in the definition, so can this be removed? I tried to do it myself but I haven't figured out how. —Rua (mew) 12:06, 28 April 2019 (UTC)