Measuring Lexical Richness through the Type-Token Curve: A Corpus-Based Analysis of Arabic and English Texts

Khalid Shakir Hussein


WordSmith Tools (5.0) is used to analyze samples from texts of different genres written by eight different authors. These texts are grouped into two corpora: Arabic and English. The Arabic corpus includes textual samples from the Qur'an, Al-Sahifa al-Sajjadiyya (a prayer manual), Modern Standard Arabic (Mistaghanmy's novel Chaos of Sensations), and Imam Ali's Nahjul-Balagah (Peak of Eloquence). The English corpus comprises The New Testament, Conrad's Heart of Darkness, Dickens' David Copperfield, and Eliot's Adam Bede. Each textual sample is statistically analyzed to determine its lexical richness, or vocabulary size. The number of tokens (the total number of words) and the number of types (distinct vocabulary words) are counted for each sample, and the two counts are then plotted against each other using Microsoft Excel charts. The resulting curves in both corpora give a vivid picture of the lexical richness of each textual sample. They open an active avenue for comparing the different authors in terms of their vocabulary size and the point at which they begin to exhaust their linguistic repertoire through repetition. The curves for Imam Ali's Nahjul-Balagah (Arabic corpus) and Conrad's Heart of Darkness (English corpus) rise highest, reaching the maximum. By contrast, the Qur'anic verses and The New Testament show the lowest curves, owing to the ritualistic quality of their texts.
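The type-token curve described above can be sketched in a few lines of Python. This is only an illustrative reconstruction of the general method, not the internals of WordSmith Tools: as the text is read word by word, the cumulative token count and the cumulative count of distinct types are recorded at regular intervals, yielding the points that are then plotted (here, the plotting step is omitted).

```python
def type_token_curve(words, step=1000):
    """Return (token_count, type_count) pairs sampled every `step` tokens.

    A flatter curve means the author repeats vocabulary sooner; a steeper
    curve indicates greater lexical richness.
    """
    seen = set()            # distinct types encountered so far
    points = []
    for i, word in enumerate(words, start=1):
        seen.add(word.lower())          # case-insensitive type counting
        if i % step == 0:
            points.append((i, len(seen)))
    if words and len(words) % step != 0:
        points.append((len(words), len(seen)))  # final partial interval
    return points

# Toy example with a small step so the sampling is visible:
sample = "the quick brown fox jumps over the lazy dog the fox".split()
curve = type_token_curve(sample, step=4)
# curve → [(4, 4), (8, 7), (11, 8)]
```

The resulting list of points can be pasted into Excel (as the paper does) or passed to any charting library to reproduce the type-token curves compared across the two corpora.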

Keywords: corpus stylistics, type-token curve, lexical richness


ISSN (Paper) 2224-5766  ISSN (Online) 2225-0484
