How the Qurantology Lexicon v6.0 was built: classical Arabic scholarship combined with computational linguistics to produce the most accurate Quranic vocabulary dataset available.
The Qurantology Lexicon is grounded in classical Arabic lexicography, drawing on the same primary sources Islamic scholars have used for over a millennium. Every entry is cross-referenced against multiple authoritative classical works before being finalised.
Our computational pipeline processes the Uthmanic rasm across all 114 surahs and 6,236 āyāt, producing a fully parsed, lemmatised, and semantically tagged dataset. Each token is reduced to its root, classified by grammatical form, and linked to every other occurrence in the Quran.
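As a rough illustration of what "linked to every other occurrence" can mean in practice, the sketch below groups parsed tokens by root so that any token can look up its siblings. It is not the Qurantology pipeline itself: the `Token` fields, the tag names, and the choice to link by shared root (rather than by lemma) are assumptions made for the example, and the five sample tokens from Sūrat al-Fātiḥah were entered by hand rather than taken from the lexicon.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """One parsed word. Field names are hypothetical, not the lexicon's schema."""
    surah: int
    ayah: int
    position: int  # 1-based word position within the āyah
    surface: str   # transliterated surface form
    lemma: str
    root: str
    form: str      # coarse grammatical label (illustrative tag set)


# Hand-entered sample from Sūrat al-Fātiḥah (1:1-1:3), for illustration only.
TOKENS = [
    Token(1, 1, 3, "al-raḥmāni", "raḥmān", "r-ḥ-m", "adjective"),
    Token(1, 1, 4, "al-raḥīmi", "raḥīm", "r-ḥ-m", "adjective"),
    Token(1, 2, 1, "al-ḥamdu", "ḥamd", "ḥ-m-d", "noun"),
    Token(1, 3, 1, "al-raḥmāni", "raḥmān", "r-ḥ-m", "adjective"),
    Token(1, 3, 2, "al-raḥīmi", "raḥīm", "r-ḥ-m", "adjective"),
]


def build_root_index(tokens):
    """Map each root to the list of tokens derived from it, in input order."""
    index = defaultdict(list)
    for tok in tokens:
        index[tok.root].append(tok)
    return index


def other_occurrences(token, index):
    """Return every token sharing this token's root, excluding the token itself."""
    return [t for t in index[token.root] if t != token]


index = build_root_index(TOKENS)
first = TOKENS[0]
refs = [(t.surah, t.ayah, t.position) for t in other_occurrences(first, index)]
print(refs)  # → [(1, 1, 4), (1, 3, 1), (1, 3, 2)]
```

Indexing by root keeps each lookup to a single dictionary access. A production dataset would more likely key on both root and lemma, since the two answer different questions: all words from r-ḥ-m versus all instances of raḥmān specifically.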
The remaining 1.3% of entries are in active review and are published only once consensus is reached across all validation layers.
Major version updates are released annually and include new entries, corrections, and expanded semantic tagging. Updates reach all users automatically, so word definitions and entries always reflect the most current version. Version changelogs are published on the research blog.
Community scholars can submit corrections via any entry's "Suggest Correction" button. Submissions are reviewed within 14 days, and verified contributors are credited in the version changelog.