Printed Arabic text database for automatic recognition systems

Hassina Bouressace, J. Csirik

Research output: Contribution to conference (Paper)

2 Citations (Scopus)

Abstract

Document image analysis and recognition are important topics in artificial intelligence because they are necessary for document retrieval. The availability of a database with good script samples is therefore a key requirement for machine-learning processes. Good printed-text databases exist for Latin languages, but there is a lack of databases with Arabic samples. This paper presents a new comprehensive database called PATD (Printed Arabic Text Database), which contains 810 images scanned in grayscale at different resolutions, leading to 2,954 smartphone-captured images taken under varying capture conditions (blurred, at different angles, and in different lighting). It is based on ten newspapers with different structures and contains open-vocabulary, multi-font, multi-size, and multi-style text. The database is described in detail and is intended for the research community.

Original language: English
Pages: 107-111
Number of pages: 5
DOI: https://doi.org/10.1145/3323933.3324082
Publication status: Published - Jan 1 2019
Event: 5th International Conference on Computer and Technology Applications, ICCTA 2019 - Istanbul, Turkey
Duration: Apr 16 2019 to Apr 17 2019

Conference

Conference: 5th International Conference on Computer and Technology Applications, ICCTA 2019
Country: Turkey
City: Istanbul
Period: 4/16/19 to 4/17/19

Keywords

  • Arabic language
  • Arabic Printed Text Database
  • Arabic Text Recognition system
  • Database
  • Document images

ASJC Scopus subject areas

  • Human-Computer Interaction
  • Computer Networks and Communications
  • Computer Vision and Pattern Recognition
  • Software

Cite this

Bouressace, H., & Csirik, J. (2019). Printed Arabic text database for automatic recognition systems. 107-111. Paper presented at 5th International Conference on Computer and Technology Applications, ICCTA 2019, Istanbul, Turkey. https://doi.org/10.1145/3323933.3324082