Overview

The Qurantology Lexicon is grounded in classical Arabic lexicography, drawing on the same primary sources Islamic scholars have used for over a millennium. Every entry is cross-referenced against multiple authoritative classical works before being finalised.

Our computational pipeline processes the Uthmanic rasm across all 114 surahs and 6,236 āyāt, producing a fully parsed, lemmatised, and semantically tagged dataset. Each token is reduced to its root, classified by grammatical form, and linked to every other occurrence in the Quran.
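Linking each token to every other occurrence amounts to a concordance keyed by root. The sketch below is a minimal illustration and assumes that tokens arrive as (root, reference) pairs from the morphological stage. `Ref`, `build_concordance` and `other_occurrences` are hypothetical names, not part of any published Lexicon tooling.

```python
from collections import defaultdict
from typing import NamedTuple


class Ref(NamedTuple):
    """Location of a token: surah, āyah, and 1-based word position."""
    surah: int
    ayah: int
    position: int


def build_concordance(tokens):
    """Map each root to the sorted list of references where it occurs.

    `tokens` is an iterable of (root, Ref) pairs (a hypothetical input format).
    """
    index = defaultdict(list)
    for root, ref in tokens:
        index[root].append(ref)
    # NamedTuples sort field by field: surah, then āyah, then position.
    return {root: sorted(refs) for root, refs in index.items()}


def other_occurrences(index, root, ref):
    """Every occurrence of `root` except the token at `ref` itself."""
    return [r for r in index.get(root, []) if r != ref]


# Illustrative input: the four words of 1:1 with their roots.
tokens = [
    ("س-م-و", Ref(1, 1, 1)),
    ("أ-ل-ه", Ref(1, 1, 2)),
    ("ر-ح-م", Ref(1, 1, 3)),
    ("ر-ح-م", Ref(1, 1, 4)),
]
index = build_concordance(tokens)
print(other_occurrences(index, "ر-ح-م", Ref(1, 1, 3)))
# [Ref(surah=1, ayah=1, position=4)]
```

Because surah numbers follow muṣḥaf order, sorting the references also makes the first element of each list a first-occurrence lookup.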

Five-Stage Research Pipeline
01 · Corpus Extraction
Full tokenisation of the Uthmanic Quran corpus. Every word token is extracted, normalised, and catalogued with its surah, āyah, and position reference.
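A minimal sketch of this stage follows. It assumes whitespace-delimited input and takes normalisation to mean stripping combining marks (Unicode category Mn, which covers the harakat, shadda, sukun and superscript alef) and tatweel. The project's actual normalisation rules are not specified here, and `normalise` and `tokenise_ayah` are hypothetical names.

```python
import unicodedata

TATWEEL = "\u0640"  # Arabic kashida, a purely typographic elongation


def normalise(word: str) -> str:
    """Drop combining marks (Unicode category Mn) and tatweel.

    Base letters, including hamza-carrying forms such as أ and alef wasla ٱ,
    are kept unchanged because no Unicode decomposition is applied.
    """
    return "".join(
        ch for ch in word
        if ch != TATWEEL and unicodedata.category(ch) != "Mn"
    )


def tokenise_ayah(surah: int, ayah: int, text: str) -> list[dict]:
    """Split one āyah on whitespace and catalogue each token with its reference."""
    return [
        {
            "surah": surah,
            "ayah": ayah,
            "position": position,  # 1-based word index within the āyah
            "raw": raw,
            "normalised": normalise(raw),
        }
        for position, raw in enumerate(text.split(), start=1)
    ]


for record in tokenise_ayah(1, 1, "بِسْمِ ٱللَّهِ ٱلرَّحْمَٰنِ ٱلرَّحِيمِ"):
    print(record["position"], record["normalised"])
```

Keeping the raw token next to its normalised form preserves the Uthmanic spelling for display while giving later stages a consistent key to match on.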
02 · Morphological Analysis
Classical Arabic root extraction, pattern (wazn, وزن) identification, and grammatical parsing of every token in the corpus as noun, verb, particle, or pronoun.
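Pattern identification can be illustrated with naive template matching against the classical ف-ع-ل scheme, in which ف, ع and ل stand for the first, second and third radicals. This is a simplified sketch on undiacritised text, not the Lexicon's analyser: it ignores weak and doubled roots, assimilation and ambiguous matches, and the candidate pattern list is a hypothetical input.

```python
RADICALS = ("ف", "ع", "ل")  # placeholders for radicals 1, 2 and 3


def match_pattern(word: str, pattern: str):
    """Return the root as 'x-y-z' if `word` fits `pattern`, else None.

    Letters in `pattern` other than ف ع ل are affixes and must match exactly.
    """
    if len(word) != len(pattern):
        return None
    root = {}
    for letter, slot in zip(word, pattern):
        if slot in RADICALS:
            # A radical slot repeated in the template must hold the same letter.
            if root.setdefault(slot, letter) != letter:
                return None
        elif letter != slot:
            return None
    if len(root) != len(RADICALS):
        return None
    return "-".join(root[r] for r in RADICALS)


def extract_root(word: str, patterns: list[str]):
    """Try candidate patterns in order; return (pattern, root) for the first fit."""
    for pattern in patterns:
        root = match_pattern(word, pattern)
        if root is not None:
            return pattern, root
    return None


candidates = ["مفعول", "فاعل", "فعلة"]
print(extract_root("رحمة", candidates))   # ('فعلة', 'ر-ح-م')
print(extract_root("مكتوب", candidates))  # ('مفعول', 'ك-ت-ب')
```

Returning the matched pattern alongside the root means a single pass yields both the wazn and the root for a token.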
03 · Semantic Classification
Ontological categorisation of each lemma across 47 semantic domains. Definitions sourced from Lisan Al-Arab, Al-Mufradat, and Taj Al-Arus.
04 · Scholar Validation
Manual review by qualified Arabic linguists and Quranic scholars. Every entry requires sign-off from at least two independent reviewers before publication.
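The two-reviewer rule can be expressed as a publication gate. This sketch makes two assumptions. First, "independent" means distinct reviewer IDs, excluding the entry's author when one is recorded. Second, a single rejection blocks publication. `SignOff` and `can_publish` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SignOff:
    reviewer_id: str
    approved: bool


def can_publish(signoffs, author_id: Optional[str] = None, required: int = 2) -> bool:
    """True if at least `required` distinct, independent reviewers approved.

    Assumptions: any rejection blocks publication, and the entry's author
    (if known) does not count as an independent reviewer.
    """
    if any(not s.approved for s in signoffs):
        return False
    approvers = {s.reviewer_id for s in signoffs if s.reviewer_id != author_id}
    return len(approvers) >= required


print(can_publish([SignOff("r1", True), SignOff("r1", True)]))  # False: one reviewer twice
print(can_publish([SignOff("r1", True), SignOff("r2", True)]))  # True
```

The rule counts a set of reviewer IDs rather than sign-off records, so one reviewer approving twice cannot satisfy it.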
05 · Computational Verification
Automated cross-validation against the Quranic Arabic Corpus (University of Leeds) and Lane's Lexicon. Confidence scores are calculated per entry.
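The method for deriving confidence scores is not stated. The following sketch offers one plausible approach, purely as an assumption. It computes the fraction of field-level comparisons with the reference sources that agree with the entry. The source labels and field names are hypothetical.

```python
from typing import Optional


def confidence_score(entry: dict, references: dict) -> Optional[float]:
    """Share of (reference, field) comparisons that agree with `entry`.

    entry: field -> value, e.g. {"root": "ر-ح-م", "pos": "noun"}.
    references: source name -> {field -> value}. A source that lacks a field
    is skipped for that field rather than counted as a disagreement.
    Returns None when no comparison was possible.
    """
    agree = total = 0
    for ref_fields in references.values():
        for field, value in entry.items():
            if field in ref_fields:
                total += 1
                agree += ref_fields[field] == value
    return agree / total if total else None


entry = {"root": "ر-ح-م", "pos": "noun"}
references = {
    "qac": {"root": "ر-ح-م", "pos": "noun"},  # hypothetical source labels
    "lane": {"root": "ر-ح-م"},
}
print(confidence_score(entry, references))  # 1.0
```

Returning None instead of 0.0 keeps an entry that could not be checked distinct from one that every source contradicts.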
Classical Sources
Lisan Al-Arab — لسان العرب
Ibn Manzur's 13th-century Arabic lexicon. The foundational reference for every entry's primary definition, and among the most comprehensive classical Arabic dictionaries ever compiled.
Al-Mufradat fi Gharib Al-Quran — المفردات
Al-Raghib Al-Isfahani's Quranic vocabulary dictionary. The most authoritative Quran-specific classical lexicon, providing contextual semantic definitions for every entry.
Quranic Arabic Corpus — University of Leeds
Dr. Kais Dukes' computational Quranic morphological corpus. Used for independent cross-validation of grammatical classifications and root assignments.
Lane's Arabic-English Lexicon
Edward Lane's eight-volume 19th-century lexicon. Used for secondary verification of classical definitions, particularly for rare words and hapax legomena.
Classification Accuracy
Overall accuracy: 98.7%
Root identification: 99.4%
Semantic classification: 97.8%
Grammatical parsing: 99.1%
Difficulty grading: 96.3%
Frequency count: 100%

The remaining 1.3% of entries are in active review and will be published only once consensus is reached across all validation layers.

Corpus Coverage
Corpus coverage: 98.7%
Word tokens: 77,429
Unique lemmas: 10,702
Distinct roots (ج-ذ-ر): 2,847
Semantic domains: 47
Surahs processed: 114
Āyāt analysed: 6,236
Current version: v6.0
Semantic Domains — Sample
Divine Attributes: 284 entries (names, epithets and qualities of Allah)
Creation & Nature: 1,247 entries (celestial bodies, the earth, flora and fauna)
Human Psychology: 2,156 entries (soul, emotion, consciousness)
Law & Justice: 834 entries (commands, rights, duties, contracts)
Worship & Ritual: 512 entries (prayer, fasting, pilgrimage, zakat)
Narratives & Parables: 1,893 entries (prophetic stories, historical accounts)
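Per-domain entry counts like these can be produced by aggregating each lemma's domain tags. This is a minimal sketch. It assumes a lemma may carry several domain tags, although the page does not say whether domains overlap. The tag data is hypothetical.

```python
from collections import Counter


def domain_counts(tags: dict) -> Counter:
    """Count lemmas per semantic domain.

    tags: lemma -> set of domain names. A lemma with several domains is
    counted once in each, so domain totals may exceed the number of lemmas.
    """
    counts = Counter()
    for domains in tags.values():
        counts.update(domains)
    return counts


def unknown_domains(tags: dict, allowed: set) -> set:
    """Domain names used in `tags` that are not in the controlled vocabulary."""
    used = {d for domains in tags.values() for d in domains}
    return used - allowed


# Hypothetical tags; `allowed` holds only the six sample domains listed above.
tags = {
    "رحمة": {"Divine Attributes", "Human Psychology"},
    "صلاة": {"Worship & Ritual"},
    "سماء": {"Creation & Nature"},
}
allowed = {"Divine Attributes", "Creation & Nature", "Human Psychology",
           "Law & Justice", "Worship & Ritual", "Narratives & Parables"}
print(domain_counts(tags).most_common())
print(unknown_domains(tags, allowed))  # set()
```

Checking tags against a fixed vocabulary catches misspelled or retired domain names before they create spurious categories in the counts.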
Sample Lexicon Entry
رَحْمَة
raḥmah
Mercy · Compassion · Divine Grace
Noun · Feminine verbal noun (maṣdar) · Root: ر-ح-م · Semantic domain: Divine Attributes · Core Vocabulary
Quranic frequency: 339 occurrences
Morphological forms: 8 distinct forms
First occurrence: Al-Fatiha 1:1
Difficulty grade: Beginner · Core
Confidence score: 99.2%
Primary source: Al-Mufradat
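The entry card maps naturally onto a typed record. This sketch assumes its own field names, a (surah, āyah) tuple for the first occurrence and a 0 to 1 confidence scale. None of these are documented as the Lexicon's actual schema.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class LexiconEntry:
    headword: str
    transliteration: str
    glosses: list[str]
    part_of_speech: str
    root: str
    semantic_domain: str
    quranic_frequency: int
    morphological_forms: int
    first_occurrence: tuple[int, int]  # (surah, āyah)
    difficulty: str
    confidence: float  # stored as a fraction, so 99.2% becomes 0.992
    primary_source: str

    def __post_init__(self):
        # Reject scores outside the assumed 0 to 1 scale.
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0 and 1")

    def first_occurrence_ref(self) -> str:
        surah, ayah = self.first_occurrence
        return f"{surah}:{ayah}"


rahmah = LexiconEntry(
    headword="رَحْمَة",
    transliteration="raḥmah",
    glosses=["mercy", "compassion", "divine grace"],
    part_of_speech="noun (feminine maṣdar)",
    root="ر-ح-م",
    semantic_domain="Divine Attributes",
    quranic_frequency=339,
    morphological_forms=8,
    first_occurrence=(1, 1),
    difficulty="Beginner · Core",
    confidence=0.992,
    primary_source="Al-Mufradat",
)
print(rahmah.first_occurrence_ref())  # 1:1
print(json.dumps(asdict(rahmah), ensure_ascii=False))
```

Passing `ensure_ascii=False` keeps the Arabic headword and root readable in the serialised JSON instead of escaping them.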
Version History & Updates

Major version updates are released annually and include new entries, corrections, and expanded semantic tagging. Updates apply automatically for all users, so word definitions and entries always reflect the current version. Version changelogs are published on the research blog.

Community scholars can submit corrections via any entry's "Suggest Correction" button. Submissions are reviewed within 14 days, and verified contributors are credited in the version changelog.