Broken Plurals in Arabic – very useful algorithm and step-by-step formula for forming most broken plurals in the Arabic language.

Broken Plurals

About Broken Plurals in Arabic

In Arabic, the broken plural (jamʿ taksīr) is a plural formed by changing the internal structure of the singular (hence the term ‘broken’). Although there is only one sound plural form for each of the two genders, there are many broken plural patterns in the language.

Is There a Formula for Forming Broken Plurals in Arabic?

This tutorial deals with forming broken plurals. In classical methodologies, students of Arabic Grammar are merely asked to memorize the broken plurals for each new noun they encounter. To aid in this memorization, some helpful heuristics are given such as which plural patterns are typically used for which singular ones.

Here at Learn Arabic Online, however, we will use a methodology from generative linguistics developed by McCarthy and Prince (1990). Generative linguistics has been developing since the mid-20th century and is thus very new next to the classical tradition. Its methodologies present concise algorithms which, although far from perfect, provide a means of going from a singular noun to its most common broken plural.

Disadvantages of the algorithm:

  • Only a broken plural will be returned, even if the sound plural is also usable. E.g. Jafna will yield Jifān and not Jafanāt (both usable, the former being a greater plural and the latter a lesser plural).
  • Not every broken plural is accounted for. The ones that are the most productive in the language, based on the dictionary, are the ones we consider.
  • The algorithm will give only the most appropriate plural, even though there may be many for a given singular.
  • The algorithm is not without memorization. Its purpose is simply to minimize the amount of memorization a student has to do in learning broken plurals.

Who Is This Tutorial For?

This presentation is designed for advanced students of the language. By advanced we mean that a student must have enough experience studying the grammar that he/she is capable of putting the classical methodology to one side, picking up this new methodology, benefitting from it, and then continuing with the old methods. If a student can do this comfortably, this tutorial will be accessible.

Additionally, the following prerequisites are preferable:

  • comfort reading Arabic transliterated into English
  • comfort when talking about extremely abstract concepts
  • a basic understanding of the places of articulation for the Arabic letters
  • a basic understanding of grammar and morphology (esp. morphophonemic rules)

Background Concepts

The Templatic Tier

Consider letters as the most basic constituent of speech. Letters are either consonants (indicated by capital C) or vowels (indicated by capital V). There are six vowels in Arabic: three short (u, a, i) and three corresponding long ones (uu, aa, ii).

When we parse a word alphabetically, we assign all consonants the symbol C and all vowels the symbol V. If we want to be more specific, we can use the actual vowel sound (u, a, i, uu, aa, or ii) instead of using V. If we want to emphasize that two consonants are the same, we subscript the C with a number.

Examples: the word “quutila” is alphabetically parsed as CVVCVCV or, less generally, as CuuCiCa. And the word “zalzala” can be parsed as C1VC2C1VC2V or as C1aC2C1aC2a.
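The parse described above is mechanical enough to sketch in a few lines of Python. This is only an illustration: the function names are invented for this tutorial, and the code assumes the transliteration used here, in which every consonant is a single character and long vowels are written doubled.

```python
VOWELS = set("aiu")  # assumption: only a, i, u count as vowels; long vowels are doubled

def cv_skeleton(word):
    """Replace every vowel letter with V and every other letter with C."""
    return "".join("V" if ch in VOWELS else "C" for ch in word)

def indexed_skeleton(word):
    """Like cv_skeleton, but number the consonants so repeats are visible."""
    ids = {}   # consonant -> index of its first appearance
    out = []
    for ch in word:
        if ch in VOWELS:
            out.append("V")
        else:
            ids.setdefault(ch, len(ids) + 1)
            out.append(f"C{ids[ch]}")
    return "".join(out)

print(cv_skeleton("quutila"))       # CVVCVCV
print(indexed_skeleton("zalzala"))  # C1VC2C1VC2V
```

Unlike the convention above, indexed_skeleton numbers every consonant, not only the ones we want to emphasize as identical.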

The Syllabic Tier

Letters come together to form syllables. Syllables are grouped into one of three categories depending on their weight: light (L), heavy (H), or super-heavy (S). The inventory of syllables in Arabic is as follows:

 

Syllable   Weight        Symbol   Example   Frequency
CV         Light         L        أَ
CVV        Heavy         H        ها
CVC        Heavy         H        إِنْ
CVCC       Super-heavy   S        ضَرْبْ      Rare
CVVC       Super-heavy   S        قالْ       Rare
CVVCC      Super-heavy   S        جانّ       Rare
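As a rough illustration of the inventory above, the following Python sketch splits a C/V skeleton into syllables and looks up each weight. The names syllabify, weights, and WEIGHT are hypothetical, and the splitting rule (a consonant joins the previous syllable unless a vowel follows it) is an assumption that fits the six syllable types listed here.

```python
import re

# Weight of each syllable type in the inventory above.
WEIGHT = {"CV": "L", "CVV": "H", "CVC": "H",
          "CVCC": "S", "CVVC": "S", "CVVCC": "S"}

def syllabify(skeleton):
    """Split a skeleton such as 'CVCCVCV' into syllables.

    Each syllable is an onset C, one or two Vs, and then any consonants
    that are not themselves followed by a vowel.
    Assumes the skeleton starts with a consonant.
    """
    return re.findall(r"CV{1,2}(?:C(?!V))*", skeleton)

def weights(skeleton):
    """Return the weight string of a skeleton; '?' marks an unlisted syllable type."""
    return "".join(WEIGHT.get(s, "?") for s in syllabify(skeleton))

print(syllabify("CVCCVCV"))  # ['CVC', 'CV', 'CV']
print(weights("CVVCVCV"))    # HLL
print(weights("CVCC"))       # S
```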

The Prosodic Tier

We start parsing words from the left and look for one or two syllables in order to form feet. Consider two feet: the iamb and the moraic trochee. An iamb is a light syllable followed by a heavy, and a trochee is two lights or one heavy.

Iamb:                    LH                           CVCVV or CVCVC
Trochee:              LL or H                   CVCV, CVV, or CVC

Arabic is a trochee-based language, which means we parse words and look for trochees; whatever syllables do not form trochees are left as residue.

Examples: “kataba” → CVCVCV. Starting from the left, CV is not a foot. CVCV together, however, forms a trochee (LL). The final CV cannot form a foot so “ba” is left as residue.

“nafs” → CVCC. CVC forms a trochee (H) and the final “s” is residue. Similarly in “rajul” we get CVCVC, so “raju” is a trochee (LL) and “l” is residue.
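The parsing in these examples can be sketched as a small Python function. It is a simplification written for this tutorial, not a general foot parser: first_trochee is a hypothetical name, the function works letter by letter exactly as the examples do, and it assumes single-character consonants with long vowels written doubled.

```python
VOWELS = set("aiu")  # assumption: transliteration with doubled long vowels

def first_trochee(word):
    """Return (trochee, residue) for the first trochee at the left edge."""
    if len(word) < 2 or word[0] in VOWELS or word[1] not in VOWELS:
        raise ValueError("expected the word to begin with consonant + vowel")
    if len(word) > 2 and word[2] in VOWELS:
        cut = 3   # CVV: one heavy syllable with a long vowel
    elif len(word) > 3 and word[3] in VOWELS:
        cut = 4   # CVCV: two light syllables
    elif len(word) > 2:
        cut = 3   # CVC: one heavy syllable closed by a consonant
    else:
        raise ValueError("a lone light syllable does not form a trochee")
    return word[:cut], word[cut:]

for w in ("kataba", "nafs", "rajul"):
    print(w, first_trochee(w))
```

For the three words above this returns ("kata", "ba"), ("naf", "s"), and ("raju", "l"), matching the parses given in the text.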

Motivation

The traditional way to look at a word is to consider it as a one-dimensional string. This, however, limits our understanding of how a word changes from one form to another; we end up saying that the base letters move from one pattern to another.

Generative linguistics, however, looks at words from a multi-dimensional perspective, considering their templatic, syllabic, prosodic, and other tiers. When a word moves from one form to another, the change might not be occurring on the face of the word; it might be occurring on one of these other dimensions. With generative linguistics, we have tools that allow us to look at these other dimensions and analyze how words actually change.

Broken Plural Patterns

The figures that follow are based on McCarthy and Prince’s (1990) survey, and are approximate.

There are over 70 patterns for the broken plural but, apart from 31 of them, they are completely negligible; one can read entire libraries without ever coming across one of them.

These 31 (which include the metathesized versions; see below) are divided into four groups as follows.

Group A      Group B       Group C            Group D
CuC1C1aC     CuCC          CiCaC              CiCaaC
CuC1C1aaC    CiCC + at     CiCaC + at         CiCaaC + at [2]
             CiCC + aan    CaCaC [2]          CuCuuC
             CuCC + aan    CaCaC + at         CuCuuC + at [2]
             CaCC [1]      CuCaC              CaCaaC [1]
                           CuCaC + at         CaCiiC [2]
                           CuCaC + aa’        CawaaCiC
                           CaCuC [3]          CaCaa’iC
                           CaCiC + at [4]     CaCaaCiC
                           CaCiC + aa’ [4]    CaCaaCiiC
                           CuCuC

[1] May metathesize to ’aCCaaC; may also change due to glide phonology
[2] Rare
[3] May metathesize to ’aCCuC
[4] May metathesize to ’aCCiC

Group A

These plurals are used exclusively for the active participle on the pattern CaaCiC and no other nouns.

Group B

These patterns are quite rare; they account for the plurals of only 4% of all singulars.

The pattern CuCC is mostly reserved as the plural of adjectives for colours and bodily defects (whether masculine or feminine). It does occur elsewhere, as in “fulk”.

Group C

The patterns in this group are not rare, but there is no known algorithm that governs them all.

Group D

This group contains the most productive plural patterns in the language, by far. There is a single algorithm that takes a noun and returns its plural if that plural is among this group.

To get an idea of just how productive this group is, consider all triliteral nouns. They fall into one of the following templates: CVCC, CVCVC, CVCVVC, or CVVCVC. With the optional feminine suffix, this makes eight pattern groups. All singulars with three base letters fall into one of these eight patterns. The fact that the vast majority of them use group D pluralisation is evident from the following figures:

  • 83% of CVCC singulars use group D pluralisation
  • 81% of CVCVC singulars use it
  • 88% of CVCVVC+at singulars use it
  • 97% of CVVCVC+at singulars use it
  • group D is also significant for the CVCC+at, CVCVC+at, and CVVCVC singulars
  • group D is insignificant only for CVCVVC singulars (with 8% productivity)
  • 25-30% of all singulars with more than three letters (base or otherwise; not including long vowels) use group D pluralisation

A Probabilistic Pluralisation Algorithm

Given a singular noun, the following algorithm will return its most appropriate broken plural from group D. If the word happens not to use pluralisation patterns from group D, this algorithm will still return a plural, but it will obviously be incorrect (false positive).

  1. if the singular has a round Taa at the end, remove it
  2. look for the first trochee starting from the left edge and map it to a CVCVV iamb
       a. mapping is done by extending the vowels to fill the new vowel positions
       b. if there aren’t enough consonants, use “w” as a filler
  3. join the new iamb to the residue, keeping in mind morphophonemic rules
  4. change the vowel pattern to match the appropriate plural pattern
       a. which vowels are used needs to be memorized, but often “a, i” is used
       b. consult the examples to learn how to distribute these vowels
  5. perform metathesis if necessary
       a. which nouns undergo metathesis is something that needs to be memorized
  6. if the new plural is on CiCaaC or CuCuuC, it may require a feminine Taa at its end
       a. whether a noun needs this or not must be memorized or learned with experience
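The six steps above can be sketched in Python as follows. This is a rough sketch under several assumptions, not McCarthy and Prince’s own formulation: the names broken_plural and first_trochee are hypothetical, Hamza is written as an apostrophe, the round Taa is assumed to appear as a final “a” in transliteration, and everything the algorithm says must be memorized (the vowels, metathesis, the final Taa) is passed in by the caller. Because step 4 overwrites the vowels anyway, the sketch builds the iamb directly with the plural vowels and skips the intermediate form with extended vowels.

```python
VOWELS = set("aiu")   # assumption: long vowels are written doubled
HAMZA = "'"           # assumption: Hamza is written as an apostrophe

def first_trochee(word):
    """Return (trochee, residue): CVV, CVCV, or CVC at the left edge."""
    if len(word) < 3 or word[0] in VOWELS or word[1] not in VOWELS:
        raise ValueError("expected consonant + vowel + at least one more letter")
    if word[2] in VOWELS:
        cut = 3                      # CVV
    elif len(word) > 3 and word[3] in VOWELS:
        cut = 4                      # CVCV
    else:
        cut = 3                      # CVC
    return word[:cut], word[cut:]

def broken_plural(singular, v1="a", v2="a", v_rest="i",
                  strip_taa=False, metathesis=False, add_taa=False):
    """Sketch of the group D algorithm.

    v1 is the short vowel of the iamb, v2 its long vowel, and v_rest
    replaces every vowel in the residue (length is preserved).
    """
    word = singular
    if strip_taa and word.endswith("a"):          # step 1
        word = word[:-1]
    trochee, residue = first_trochee(word)        # step 2
    consonants = [ch for ch in trochee if ch not in VOWELS]
    if len(consonants) < 2:
        consonants.append("w")                    # step 2b: filler consonant
    c1, c2 = consonants
    if residue and residue[0] in VOWELS:          # step 3: keep the vowels apart
        residue = HAMZA + residue
    tail = "".join(v_rest if ch in VOWELS else ch for ch in residue)  # step 4
    if metathesis:                                # step 5: C1 v C2 -> ' v C1 C2
        head = HAMZA + v1 + c1 + c2
    else:
        head = c1 + v1 + c2
    plural = head + v2 * 2 + tail
    if add_taa:                                   # step 6
        plural += "a"
    return plural

print(broken_plural("nafs", v1="u", v2="u"))        # nufuus
print(broken_plural("hukm", metathesis=True))       # 'ahkaam
print(broken_plural("sahaaba", strip_taa=True))     # sahaa'ib
print(broken_plural("faakiha", strip_taa=True))     # fawaakih
```

The default vowels correspond to the “a, i” pattern; a word such as “nafs” needs its vowels supplied explicitly, which mirrors the memorization the algorithm still requires.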

Examples

The algorithm will make little sense without accompanying examples:

نَفْس
(soul)

  1. “naf” is the first trochee, “s” is residue
  2. “naf” transformed into an iamb becomes “nafaa” and the word becomes “nafaas”
  3. this word takes the “u” vowel pattern so it becomes “nufuus”

قِدْح
(arrow)

  1. “qid” is the first trochee, “h” is residue
  2. “qid” transformed into an iamb becomes “qidii” and the complete word is “qidiih”
  3. this word takes the “i, a” vowel pattern so it becomes “qidaah”

حُكْم
(judgement)

  1. “huk” is the first trochee, “m” is residue
  2. “huk” transformed into an iamb becomes “hukuu” and the complete word is “hukuum”
  3. this word takes the “a” vowel pattern so it becomes “hakaam”
  4. metathesis applies, changing “hakaam” to “ahkaam”

أَسَد
(lion)

  1. “asa” is the first trochee, “d” is residue
  2. “asa” transformed into an iamb becomes “asaa” and the word becomes “asaad”
  3. this word takes the “u” vowel pattern so it becomes “usuud”

رَجُل
(man)

  1. “raju” is the first trochee, “l” is residue
  2. “raju” transformed into an iamb becomes “rajuu” and the complete word is “rajuul”
  3. this word takes the “i, a” vowel pattern so it becomes “rijaal”

عِنَب
(grape)

  1. “cina” is the first trochee, “b” is residue
  2. “cina” transformed into an iamb becomes “cinaa” and the complete word is “cinaab”
  3. this word takes the “a” vowel pattern so it becomes “canaab”
  4. metathesis applies, changing “canaab” to “acnaab”

سَحابة
(cloud)

  1. removing the Taa gives “sahaab”
  2. “saha” is the first trochee, “ab” is residue
  3. “saha” transformed into an iamb becomes “sahaa” and the complete word should be “sahaa-ab”. But morphophonemic rules do not allow the two vowels to meet directly, so a Hamza is inserted between them. This gives “sahaa’ab”
  4. this word takes the “a, i” vowel pattern so it becomes “sahaa’ib”

جزيرة
(island)

  1. removing the Taa gives “jaziir”
  2. “jazi” is the first trochee, “ir” is residue
  3. “jazi” transformed into an iamb becomes “jazii” and the complete word should be “jazii-ir”. But a Hamza must be inserted to separate the two vowels. This gives “jazii’ir”
  4. this word takes the “a, i” vowel pattern so it becomes “jazaa’ir”

كريمة
(noble)

  1. removing the Taa gives “kariim”
  2. “kari” is the first trochee, “im” is residue
  3. “kari” transformed into an iamb becomes “karii” and the complete word should be “karii-im”. But a Hamza must be inserted to separate the two vowels. This gives “karii’im”
  4. this word takes the “a, i” vowel pattern so it becomes “karaa’im”

فاكِهة
(fruit)

  1. removing the Taa gives “faakih”
  2. “faa” is the first trochee, “kih” is the residue
  3. “faa” only has one consonant so we compensate with a “w” and map to an iamb. This gives “fawaa”. The entire word is “fawaakih”
  4. like most plurals, this takes the “a, i” vowel pattern so it remains “fawaakih”

آنِسة
(cheerful)

  1. removing the Taa gives “aanis”
  2. “aa” is the first trochee and “nis” is the residue
  3. “aa” has only one consonant (initial Hamza) so it maps to an iamb as “awaa”. The entire word is then “awaanis”
  4. this plural takes the “a, i” vowel pattern so it remains “awaanis”

جاموس
(buffalo)

  1. “jaa” is the first trochee and “muus” is the residue
  2. “jaa” mapped to an iamb becomes “jawaa” and the entire word is “jawaamuus”
  3. this plural takes pattern “a, i” resulting in “jawaamiis”

سُلْطان
(sultan)

  1. “sul” is the first trochee and “taan” is residue
  2. “sul” maps to “suluu” and the word becomes “suluutaan”
  3. this plural uses the “a, i” vowel pattern to give “salaatiin”

تِنّين
(sea monster)

  1. “tin” is the first trochee and “niin” is the residue
  2. “tin” maps to “tinii” and the word is “tiniiniin”
  3. this plural uses the “a, i” vowel pattern which gives “tanaaniin”

أَفْراق
(sects)

This is already the plural of “firaq”, which is the plural of “firqa”. We can turn this into a level-3 plural as follows:

  1. “af” is the first trochee (remember the Hamza at the beginning), “raaq” is the residue
  2. “af” maps to “afaa” and the word becomes “afaaraaq”
  3. this plural uses the “a, i” pattern which gives “afaariiq”

Abstract Examples

In the above examples, we were feeding individual words to this algorithm. But it stands to reason that singulars on the same pattern will usually yield the same types of plurals. When it comes to static nouns and gerunds, this is of course not always the case because of the vowel patterns (and other aspects of the algorithm that require memorization), but derived nouns are very well behaved.

So instead of feeding it words, let’s feed the algorithm entire templates.

مِفْعَل
(noun of usage)

  1. the first trochee is “mif” and “cal” is residual
  2. “mif” maps to “mifii” and the word becomes “mifiical”
  3. like almost all derived nouns, this takes vowel pattern “a, i” to give “mafaacil”

مِفْعَلة
(noun of usage)

  1. removing the Taa we get “mifcal”
  2. the first trochee is “mif” and “cal” is residual
  3. “mif” maps to “mifii” and the word becomes “mifiical”
  4. like almost all derived nouns, this takes vowel pattern “a, i” to give “mafaacil”

مِفْعال
(noun of usage)

  1. the first trochee is “mif” and “caal” is residual
  2. “mif” maps to “mifii” and the word becomes “mifiicaal”
  3. like almost all derived nouns, this takes vowel pattern “a, i” to give “mafaaciil”

Incredible! These are the plurals medieval Arab grammarians teach us to associate with these singulars. Yet one algorithm gives us all of them at once despite their varying singular forms. Below are more examples.

أَفْعَل
(superlative)

  1. the first trochee is “af” and “cal” is residual
  2. “af” maps to “afaa” and the word becomes “afaacal”
  3. using the vowel pattern “a, i” gives us “afaacil”

مَفْعول
(passive participle)

  1. the first trochee is “maf” and “cuul” is residual
  2. “maf” maps to “mafaa” and the word becomes “mafaacuul”
  3. applying the “a, i” vowels we get “mafaaciil”

مَفْعَل
(locative noun)

  1. the first trochee is “maf” and “cal” is residual
  2. “maf” maps to “mafaa” and the word becomes “mafaacal”
  3. applying the “a, i” vowels we get “mafaacil”

مَفْعِل
(locative noun)

  1. the first trochee is “maf” and “cil” is residual
  2. “maf” maps to “mafaa” and the word becomes “mafaacil”
  3. applying the “a, i” vowels, the word remains “mafaacil”
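To see how the templates above collapse onto a handful of plural patterns, here is a small Python sketch that feeds them all through a stripped-down version of the algorithm. It is an illustration only: pluralise_template is a hypothetical name, “c” stands for the middle letter of the template as in the transliteration above, Hamza is written as an apostrophe, and the function handles only templates whose first trochee is CVC and which take the “a, i” vowel pattern.

```python
from collections import defaultdict

VOWELS = set("aiu")  # assumption: long vowels are written doubled

def pluralise_template(template, has_taa=False):
    """Apply the 'a, i' pattern to a template that begins with a CVC trochee."""
    if has_taa:
        template = template[:-1]                  # drop the feminine ending
    c1, v, c2, residue = template[0], template[1], template[2], template[3:]
    if c1 in VOWELS or v not in VOWELS or c2 in VOWELS or residue[:1] in VOWELS:
        raise ValueError("this sketch only handles CVC-initial templates")
    # The trochee C1 v C2 becomes the iamb C1 a C2 aa; residue vowels become i.
    tail = "".join("i" if ch in VOWELS else ch for ch in residue)
    return c1 + "a" + c2 + "aa" + tail

templates = [("mifcal", False), ("mifcala", True), ("mifcaal", False),
             ("'afcal", False), ("mafcuul", False),
             ("mafcal", False), ("mafcil", False)]

# Group the singular templates by the plural they produce.
groups = defaultdict(list)
for template, has_taa in templates:
    groups[pluralise_template(template, has_taa)].append(template)

for plural, singulars in groups.items():
    print(plural, "<-", ", ".join(singulars))
```

Seven singular templates end up on just three plural templates, which is the regularity the abstract examples are meant to show.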
