Broken Plurals in Arabic – very useful algorithm and step-by-step formula for forming most broken plurals in the Arabic language.
About Broken Plurals in Arabic
In Arabic, the broken plural /jamʿ taksīr/ is a plural formed by changing the internal structure of the singular (hence the term ‘broken’). Whereas the sound plural has only one form for each of the two genders, the broken plural comes in many patterns.
Is There a Formula for Forming Broken Plurals in Arabic?
This tutorial deals with forming broken plurals. In classical methodologies, students of Arabic grammar are simply asked to memorize the broken plural of each new noun they encounter. To aid this memorization, they are given some helpful heuristics, such as which plural patterns typically go with which singular patterns.
Here at Learn Arabic Online, however, we will use methodologies from generative linguistics developed by McCarthy and Prince (1990). Generative linguistics emerged in the mid-20th century, so these methodologies are relatively new. They provide concise algorithms which, although far from perfect, take a singular noun to its most common broken plural.
Disadvantages of the algorithm:
- Only a broken plural will be returned, even if the sound plural is also usable. E.g. Jafna will yield Jifān and not Jafanāt (both are usable, the former being a greater plural, or plural of multitude, and the latter a lesser plural, or plural of paucity).
- Not every broken plural is accounted for. Only the patterns that are most productive in the language, as measured from the dictionary, are considered.
- The algorithm will give only the most appropriate plural, even though there may be many for a given singular.
- The algorithm does not eliminate memorization. Its purpose is simply to minimize the amount of memorization a student has to do in learning broken plurals.
Whom is this Tutorial For?
This presentation is designed for advanced students of the language. By advanced we mean that a student must have enough experience studying the grammar that he/she is capable of putting the classical methodology to one side, picking up this new methodology, benefitting from it, and then continuing with the old methods. If a student can do this comfortably, this tutorial will be accessible.
Additionally, the following prerequisites are preferable:
- comfort reading Arabic transliterated into English
- comfort when talking about extremely abstract concepts
- a basic understanding of the places of articulation for the Arabic letters
- a basic understanding of grammar and morphology (esp. morphophonemic rules)
Background Concepts
The Templatic Tier
Consider letters as the most basic constituent of speech. Letters are either consonants (indicated by capital C) or vowels (indicated by capital V). There are six vowels in Arabic: three short (u, a, i) and three corresponding long ones (uu, aa, ii).
When we parse a word alphabetically, we assign all consonants the symbol C and all vowels the symbol V. If we want to be more specific, we can use the actual vowel sound (u, a, i, uu, aa, or ii) instead of using V. If we want to emphasize that two consonants are the same, we subscript the C with a number.
Examples: the word “quutila” is alphabetically parsed as CVVCVCV or, more specifically, as CuuCiCa. And the word “zalzala” can be parsed as C₁VC₂C₁VC₂V or as C₁aC₂C₁aC₂a.
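The alphabetic parse is mechanical enough to express in a few lines of Python. This sketch assumes a transliteration in which only a, i and u are vowels (long vowels written doubled) and every other character is a consonant. With `index_consonants=True` it numbers every distinct consonant in order of first appearance, a slightly more systematic variant of subscripting only the repeated ones. The function name and its flags are illustrative.

```python
VOWELS = "aiu"  # a, i, u; long vowels are written doubled (aa, ii, uu)
SUBSCRIPTS = str.maketrans("0123456789", "₀₁₂₃₄₅₆₇₈₉")


def cv_skeleton(word: str, keep_vowels: bool = False, index_consonants: bool = False) -> str:
    """Parse a transliterated word into its C/V template."""
    numbers: dict[str, int] = {}  # consonant -> index, in order of first appearance
    out = []
    for ch in word:
        if ch in VOWELS:
            # Each vowel letter fills one V position, so a long vowel gives VV.
            out.append(ch if keep_vowels else "V")
        elif index_consonants:
            n = numbers.setdefault(ch, len(numbers) + 1)
            out.append("C" + str(n).translate(SUBSCRIPTS))
        else:
            out.append("C")
    return "".join(out)


print(cv_skeleton("quutila"))                         # CVVCVCV
print(cv_skeleton("quutila", keep_vowels=True))       # CuuCiCa
print(cv_skeleton("zalzala", index_consonants=True))  # C₁VC₂C₁VC₂V
```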
The Syllabic Tier
Letters come together to form syllables. Syllables are grouped into one of three categories depending on their weight: light (L), heavy (H), or super-heavy (S). The inventory of syllables in Arabic is as follows:
Syllable | Weight | Symbol | Example | Frequency |
CV | Light | L | أَ | |
CVV | Heavy | H | ها | |
CVC | Heavy | H | إِنْ | |
CVCC | Super-heavy | S | ضَرْبْ | Rare |
CVVC | Super-heavy | S | قالْ | Rare |
CVVCC | Super-heavy | S | جانّ | Rare |
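To make the table concrete, here is a small Python sketch that splits a transliterated word into syllables and labels each with its weight. It assumes this tutorial's transliteration (short vowels a, i, u; long vowels doubled; every other character, including ' for hamza, treated as a consonant) and that every syllable begins with exactly one consonant. Diphthongs and shapes outside the table are rejected rather than handled. The function names are illustrative.

```python
VOWELS = set("aiu")

# Syllable shapes from the table, with a long vowel written as VV.
WEIGHTS = {"CV": "L", "CVV": "H", "CVC": "H", "CVCC": "S", "CVVC": "S", "CVVCC": "S"}


def units(word: str) -> list[str]:
    """Split a transliteration into consonants and vowels; 'aa', 'ii', 'uu' stay together."""
    out: list[str] = []
    i = 0
    while i < len(word):
        if word[i] in VOWELS and word[i + 1:i + 2] == word[i]:
            out.append(word[i:i + 2])  # long vowel
            i += 2
        else:
            out.append(word[i])        # consonant or short vowel
            i += 1
    return out


def syllable_weights(word: str) -> list[tuple[str, str]]:
    """Return (syllable, weight) pairs; each vowel takes one preceding consonant as onset."""
    us = units(word)
    nuclei = [i for i, u in enumerate(us) if u[0] in VOWELS]
    if not nuclei or nuclei[0] != 1:
        raise ValueError(f"expected a single consonant followed by a vowel in {word!r}")
    result = []
    for n, v in enumerate(nuclei):
        if us[v - 1][0] in VOWELS:
            raise ValueError(f"vowel without an onset consonant in {word!r}")
        # The syllable runs up to (not including) the onset of the next syllable.
        end = nuclei[n + 1] - 1 if n + 1 < len(nuclei) else len(us)
        syllable = us[v - 1:end]
        shape = "".join("VV" if len(u) == 2 else "V" if u in VOWELS else "C" for u in syllable)
        result.append(("".join(syllable), WEIGHTS[shape]))  # KeyError for shapes not in the table
    return result


print(syllable_weights("quutila"))  # [('quu', 'H'), ('ti', 'L'), ('la', 'L')]
print(syllable_weights("qaal"))     # [('qaal', 'S')]
```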
The Prosodic Tier
We start parsing words from the left and look for one or two syllables to form feet. Consider two feet: the iamb and the moraic trochee. An iamb is a light syllable followed by a heavy one, and a trochee is either two light syllables or one heavy syllable.
Iamb: LH (CVCVV or CVCVC)
Trochee: LL or H (CVCV, CVV, or CVC)
Arabic is a trochee-based language, which means we parse words and look for trochees; whatever syllables do not form trochees are left as residue.
Examples: “kataba” → CVCVCV. Starting from the left, CV is not a foot. CVCV together, however, forms a trochee (LL). The final CV cannot form a foot, so “ba” is left as residue.
“nafs” → CVCC. CVC forms a trochee (H) and the final “s” is residue. Similarly, in “rajul” we get CVCVC, so “raju” is a trochee (LL) and “l” is residue.
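The left-to-right parse in these examples can be sketched as a short Python function. Like the examples, it works on the CV string directly: a long vowel counts as two V positions, so CVV is a heavy trochee, CVCV is a light-light trochee, and a CVC not followed by a vowel is a heavy one. Reading "whatever does not form a trochee is residue" as "stop at the first point where no trochee fits" is our assumption, and the function name is illustrative.

```python
VOWELS = set("aiu")


def parse_trochees(word: str) -> tuple[list[str], str]:
    """Greedily parse trochees from the left edge; return (feet, residue)."""
    kinds = ["V" if ch in VOWELS else "C" for ch in word]
    n = len(word)

    def kind(j: int) -> str:
        return kinds[j] if j < n else ""  # "" marks the end of the word

    feet: list[str] = []
    i = 0
    while i < n and kind(i) == "C" and kind(i + 1) == "V":
        if kind(i + 2) == "V":                           # CVV: one heavy syllable (H)
            size = 3
        elif kind(i + 2) == "C" and kind(i + 3) == "V":  # CVCV: two light syllables (LL)
            size = 4
        elif kind(i + 2) == "C":                         # CVC: one heavy syllable (H)
            size = 3
        else:                                            # a lone CV cannot form a foot
            break
        feet.append(word[i:i + size])
        i += size
    return feet, word[i:]


for w in ("kataba", "nafs", "rajul"):
    print(w, parse_trochees(w))
```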
Motivation
The traditional way to look at a word is to consider it as a one-dimensional string. This, however, limits our understanding of how a word changes from one form to another; we end up saying that the base letters move from one pattern to another.
Generative linguistics, however, looks at words from a multi-dimensional perspective, considering their templatic, syllabic, prosodic and other tiers. When a word moves from one form to another, the change might not be occurring on the face of the word; it might be occurring on one of these other dimensions. With generative linguistics, we have tools that allow us to look at these other dimensions and analyze how words actually change.
Broken Plural Patterns
The figures that follow are based on McCarthy and Prince’s (1990) survey and are approximate.
There are over 70 patterns for the broken plural, but apart from 31 of them, the rest are negligible; one can read entire libraries without ever coming across one of them.
These 31 (which include the metathesized versions; see below) are divided into four groups as follows.
Group A | Group B | Group C | Group D |
CuC1C1aC | CuCC | CiCaC | CiCaaC |
CuC1C1aaC | CiCC + at | CiCaC + at | CiCaaC + at2 |
CiCC + aan | CaCaC2 | CuCuuC | |
CuCC + aan | CaCaC + at | CuCuuC + at2 | |
CaCC1 | CuCaC | CaCaaC1 | |
CuCaC + at | CaCiiC2 | ||
CuCaC + aa’ | CawaaCiC | ||
CaCuC3 | CaCaa’iC | ||
CaCiC + at4 | CaCaaCiC | ||
CaCiC + aa’4 | CaCaaCiiC | ||
CuCuC |
1 May metathesize to ‘aCCaaC; may also change due to glide phonology
2 Rare
3 May metathesize to ‘aCCuC
4 May metathesize to ‘aCCiC
Group A
These plurals are used exclusively for active participles on the pattern CaaCiC.
Group B
These patterns are quite rare; they account for the plurals of only 4% of all singulars.
The pattern CuCC is mostly reserved as the plural of adjectives for colours and bodily defects (whether masculine or feminine). It does occur elsewhere, as in “fulk”.
Group C
The patterns in this group are not rare, but there is no known algorithm that governs them all.
Group D
This group contains the most productive plural patterns in the language, by far. There is a single algorithm that takes a noun and returns its plural if that plural is among this group.
To get an idea of just how productive this group is, consider all trilateral nouns. They fall into one of the following templates: CVCC, CVCVC, CVCVVC, or CVVCVC. Each template may or may not take the feminine suffix, which makes eight pattern groups, and every singular with three base letters falls into one of them. The following figures show that the vast majority of them use group D pluralisation:
- 83% of CVCC singulars use group D pluralisation
- 81% of CVCVC singulars use it
- 88% of CVCVVC+at singulars use it
- 97% of CVVCVC+at singulars use it
- group D is also significant for the CVCC+at, CVCVC+at, and CVVCVC singulars
- group D is insignificant only for CVCVVC singulars (with 8% productivity)
- 25-30% of all singulars with more than three letters (base or otherwise; not including long vowels) use group D pluralisation
A Probabilistic Pluralisation Algorithm
Given a singular noun, the following algorithm will return its most appropriate broken plural from group D. If the word happens not to take a group D plural, the algorithm will still return a plural, but that plural will be incorrect (a false positive).
1. if the singular has a round Taa at the end, remove it
2. look for the first trochee starting from the left edge and map it to a CVCVV iamb
a. mapping is done by extending the vowels to fill the new vowel positions
b. if there aren’t enough consonants, use “w” as a filler
3. join the new iamb to the residue, keeping in mind morphophonemic rules
4. change the vowel pattern to match the appropriate plural pattern
a. which vowels are used needs to be memorized, but often “a, i” is used
b. consult the examples to learn how to distribute these vowels
5. perform metathesis if necessary
a. which nouns undergo metathesis is something that needs to be memorized
6. if the new plural is on CiCaaC or CuCuuC, it may require a feminine Taa at its end
a. whether a noun needs this or not must be memorized or learned with experience
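The six steps can be sketched as a short Python function. This is a minimal, hypothetical implementation, not McCarthy and Prince’s own formalism, and it rests on several assumptions: words are written in a transliteration where a, i and u are the only vowels (long vowels doubled, so “aa” fills two V positions), ' is hamza, ` is ʿayn, and capitals mark emphatic or pharyngeal consonants (T for ṭ, H for ḥ); the round Taa is written as a “+at” suffix, echoing the “+ at” notation of the pattern table; and the memorised choices of steps 4 to 6 (vowel melody, metathesis, feminine Taa) are passed in as arguments. Of the morphophonemic rules in step 3, only hamza insertion between the iamb’s long vowel and a vowel-initial residue is modelled; glide phonology and diphthongs are not handled. All names are illustrative.

```python
VOWELS = set("aiu")
Slot = tuple[str, str]  # ("C", letter) or ("V", vowel); a long vowel is two V slots


def to_slots(word: str) -> list[Slot]:
    return [("V" if ch in VOWELS else "C", ch) for ch in word]


def split_first_trochee(slots: list[Slot]) -> tuple[list[Slot], list[Slot]]:
    """Step 2, first half: the first trochee from the left edge, and the residue."""
    if len(slots) < 3 or slots[0][0] != "C" or slots[1][0] != "V":
        raise ValueError("expected a word of at least three slots beginning with CV")
    if slots[2][0] == "V":                     # CVV: one heavy syllable (H)
        return slots[:3], slots[3:]
    if len(slots) > 3 and slots[3][0] == "V":  # CVCV: two light syllables (LL)
        return slots[:4], slots[4:]
    return slots[:3], slots[3:]                # CVC: one heavy syllable (H)


def pluralize(singular: str, melody: tuple[str, ...] = ("a", "a", "i"),
              metathesize: bool = False, add_taa: bool = False) -> str:
    # Step 1: remove the round Taa (written '+at' in this sketch).
    stem = singular.removesuffix("+at")
    trochee, residue = split_first_trochee(to_slots(stem))

    # Step 2: map the trochee onto a CVCVV iamb by extending its vowels;
    # a CVV trochee has only one consonant, so 'w' fills the second C.
    cons = [ch for kind, ch in trochee if kind == "C"]
    vowels = [ch for kind, ch in trochee if kind == "V"]
    if len(cons) < 2:
        cons.append("w")
    iamb = [("C", cons[0]), ("V", vowels[0]),
            ("C", cons[1]), ("V", vowels[-1]), ("V", vowels[-1])]

    # Step 3: join the iamb to the residue. Only one morphophonemic rule is
    # modelled: a hamza separates the iamb's long vowel from a residue vowel.
    if residue and residue[0][0] == "V":
        residue = [("C", "'")] + residue
    slots = iamb + residue

    # Step 4: impose the vowel melody. Slot 1 is the iamb's short vowel and
    # slots 3-4 its long vowel; later vowels belong to the residue and keep their length.
    result = []
    for i, (kind, ch) in enumerate(slots):
        if kind == "V":
            ch = melody[0] if i == 1 else melody[1] if i in (3, 4) else melody[2]
        result.append((kind, ch))
    plural = "".join(ch for _, ch in result)

    # Step 5: optional metathesis CaCaaC -> 'aCCaaC (memorised per noun).
    if metathesize:
        if "".join(kind for kind, _ in result) != "CVCVVC":
            raise ValueError("metathesis is only modelled for CVCVVC plurals")
        c1, c2, c3 = (ch for kind, ch in result if kind == "C")
        plural = "'a" + c1 + c2 + result[3][1] * 2 + c3

    # Step 6: optional feminine Taa for CiCaaC / CuCuuC plurals (memorised).
    return plural + "+at" if add_taa else plural


print(pluralize("jaziir+at"))                                  # jazaa'ir
print(pluralize("faakih+at"))                                  # fawaakih
print(pluralize("sulTaan"))                                    # salaaTiin
print(pluralize("nafs", melody=("u", "u")))                    # nufuus
print(pluralize("Hukm", melody=("a", "a"), metathesize=True))  # 'aHkaam
print(pluralize("mif`aal"))                                    # mafaa`iil
```

The default melody ("a", "a", "i") encodes the common “a, i” distribution of step 4: a in the iamb, and i in whatever vowel the residue carries. Trilateral singulars, whose plurals have no residue vowel, need a memorised two-vowel melody instead, such as ("u", "u") for CuCuuC or ("i", "a") for CiCaaC.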
Examples
The algorithm makes little sense without accompanying examples:
نَفْس
قِدْح
حُكْم
أَسَد
رَجُل
عِنَب
سَحابة
جزيرة
كريمة
فاكِهة
آنِسة
جاموس
سُلْطان
تِنّين
أَفْراق: this is already the plural of “firaq”, which is the plural of “firqa”. We can turn this into a level-3 plural by applying the algorithm to it once more.
Abstract Examples
In the above examples, we were feeding individual words to this algorithm. But it stands to reason that singulars on the same pattern will usually yield the same types of plurals. When it comes to static nouns and gerunds, this is of course not always the case because of the vowel patterns (and other aspects of the algorithm that require memorization), but derived nouns are very well behaved.
So instead of feeding it words, let’s feed the algorithm entire templates.
مِفْعَل
مِفْعَلة
مِفْعال
Incredible! These are the plurals medieval Arab grammarians taught us to associate with these singulars. Yet a single algorithm yields all of them despite their varying singular forms. Below are more examples.
أَفْعَل
مَفْعول
مَفْعَل
مَفْعِل
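Because derived nouns behave so regularly, a singular template and its plural template can be treated as functions of the root: fill both with the same three letters and you get a singular/plural pair. A minimal sketch of that filling step, assuming the traditional placeholder radicals f, ʿ and l (written f, ` and l in an ASCII transliteration) and a triliteral root. The template pairs used are the ones the algorithm yields for these singulars (mafʿal → mafāʿil, mifʿāl → mafāʿīl); the function name is illustrative, and a template containing a non-radical f or l would be filled incorrectly.

```python
# Placeholder radicals of the traditional f-ʿ-l templates; ` stands for ʿayn.
RADICALS = ("f", "`", "l")


def apply_root(template: str, root: str) -> str:
    """Fill a transliterated f-`-l template with a three-letter root."""
    if len(root) != 3:
        raise ValueError("this sketch only handles triliteral roots")
    # str.translate swaps each placeholder radical for the matching root letter.
    return template.translate(str.maketrans(dict(zip(RADICALS, root))))


# (singular template, plural template, example root)
for singular, plural, root in [("maf`al", "mafaa`il", "ktb"), ("mif`aal", "mafaa`iil", "ftH")]:
    print(apply_root(singular, root), "->", apply_root(plural, root))
```

With k-t-b this gives maktab → makaatib (مكتب → مكاتب), and with f-t-ḥ it gives miftaaH → mafaatiiH (مفتاح → مفاتيح).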