
Matthew Thomas Miller


Assistant Professor of Persian Literature and Digital Humanities at the Roshan Institute for Persian Studies, University of Maryland, College Park; Director, Roshan Initiative in Persian Digital Humanities; and an affiliate of the Maryland Institute for Technology in the Humanities



OpenITI Receives $800,000 Grant from the Mellon Foundation for Persian and Arabic OCR

CorpusBuilder Screenshot

With generous funding from The Andrew W. Mellon Foundation, OpenITI AOCP will create a new digital text production pipeline for Persian and Arabic texts.

In June 2019, the Mellon Foundation generously awarded the University of Maryland, College Park (UMD) an $800,000 grant for the Open Islamicate Texts Initiative’s Arabic-script Optical Character Recognition Project (OpenITI AOCP).

The project is led by Matthew Thomas Miller (Roshan Institute for Persian Studies at UMD), Maxim Romanov (University of Vienna), Sarah Bowen Savant (Aga Khan University), David Smith (Northeastern University), and Raffaele Viglianti (Maryland Institute for Technology in the Humanities at UMD). SHARIAsource, a project of the Program in Islamic Law (PIL) at Harvard Law School (both led by Intisar Rabb), provided significant support for the initial technical infrastructure on which this project will build (i.e., CorpusBuilder 1.0), and it will also play a leading role in the technical development portion of OpenITI AOCP.

OpenITI AOCP will catalyze the digitization of the Persian and Arabic written traditions by addressing the central technical and organizational impediments that have stymied the development of improved OCR for Arabic-script languages. Through a unique interdisciplinary collaboration among humanities scholars, computer scientists, developers, library scientists, and digital humanists, OpenITI AOCP will forge CorpusBuilder 1.0, an OCR pipeline and post-correction interface, into a user-friendly digital text production pipeline with a wide range of new OCR enhancements and expanded text export functionality. The project will also include a series of workshops, a full corpus development pilot, and a Persian and Arabic typeface inventory, all of which will inform the development of the technical components.

For full details on OpenITI AOCP, see the [project’s full overview](https://medium.com/@openiti/openiti-aocp-9802865a6586).

Other notable press coverage:

New Software to Digitize Persian and Arabic Materials

Mellon Foundation Awards $2.8M for Research, Digitization in Humanities (also reprinted elsewhere)

The Open Islamicate Texts Initiative Arabic-script OCR Catalyst Project (OpenITI AOCP)

SHARIAsource Partners at UMD & OpenITI Receive $800k Mellon Grant to Create Arabic OCR Tool