Printed Arabic text database for automatic recognition systems

Hassina Bouressace, J. Csirik

Research output: Contribution to conference, Paper

Abstract

Document image analysis and recognition are important topics in artificial intelligence because they are necessary for document retrieval. The availability of a database with good script samples is therefore a key requirement for machine-learning processes. Good printed-text databases exist for Latin languages, but databases with Arabic samples are lacking. This paper presents a new comprehensive database called PATD (Printed Arabic Text Database), which contains 810 images scanned in grayscale at different resolutions, leading to 2,954 smartphone-captured images taken under varying capture conditions (blurred, at different angles and under different lighting). It is based on ten newspapers with different structures and contains open-vocabulary, multi-font, multi-size and multi-style text. The database is described in detail and is intended for use by the research community.

Original language: English
Pages: 107-111
Number of pages: 5
DOIs: https://doi.org/10.1145/3323933.3324082
Publication status: Published - Jan 1 2019
Event: 5th International Conference on Computer and Technology Applications, ICCTA 2019 - Istanbul, Turkey
Duration: Apr 16 2019 to Apr 17 2019

Conference

Conference: 5th International Conference on Computer and Technology Applications, ICCTA 2019
Country: Turkey
City: Istanbul
Period: 4/16/19 to 4/17/19

Fingerprint

Image recognition
Smartphones
Image analysis
Artificial intelligence
Learning systems
Availability

Keywords

  • Arabic language
  • Arabic Printed Text Database
  • Arabic Text Recognition system
  • Database
  • Document images

ASJC Scopus subject areas

  • Human-Computer Interaction
  • Computer Networks and Communications
  • Computer Vision and Pattern Recognition
  • Software

Cite this

Bouressace, H., & Csirik, J. (2019). Printed Arabic text database for automatic recognition systems. 107-111. Paper presented at 5th International Conference on Computer and Technology Applications, ICCTA 2019, Istanbul, Turkey. https://doi.org/10.1145/3323933.3324082

@conference{cdcc6c7bb9c24b4c9ce4fff6debc806c,
title = "Printed Arabic text database for automatic recognition systems",
keywords = "Arabic language, Arabic Printed Text Database, Arabic Text Recognition system, Database, Document images",
author = "Hassina Bouressace and J. Csirik",
year = "2019",
month = "1",
day = "1",
doi = "10.1145/3323933.3324082",
language = "English",
pages = "107--111",
note = "5th International Conference on Computer and Technology Applications, ICCTA 2019 ; Conference date: 16-04-2019 Through 17-04-2019",

}

TY - CONF
T1 - Printed Arabic text database for automatic recognition systems
AU - Bouressace, Hassina
AU - Csirik, J.
PY - 2019/1/1
Y1 - 2019/1/1
KW - Arabic language
KW - Arabic Printed Text Database
KW - Arabic Text Recognition system
KW - Database
KW - Document images
UR - http://www.scopus.com/inward/record.url?scp=85066778763&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85066778763&partnerID=8YFLogxK
U2 - 10.1145/3323933.3324082
DO - 10.1145/3323933.3324082
M3 - Paper
SP - 107
EP - 111
ER -