Compacting XML documents

Miklós Kálmán, Ferenc Havasi, T. Gyimóthy

Research output: Contribution to journal › Article

1 Citation (Scopus)

Abstract

XML is one of the most common formats for storing information. Its biggest drawback is that documents are rather large compared to the information they store. XML documents may contain redundant attributes, that is, attributes whose values can be calculated from others. These redundant attributes can be deleted from the original document, provided the calculation rules are stored. In an attribute grammar environment, semantic rules provide an analogous description of such calculations. To use this technique in an XML environment, we defined a new metalanguage called SRML and developed a method that uses SRML to compact XML documents. After compaction, XML compressors can be applied to make the compacted document much smaller. With this combined approach we achieved a significant size reduction compared to the compressed size produced by XML-specific compressors alone. This article extends the previously published method with automatic rule generation using machine learning techniques, which can find relationships between attributes that the user might not have noticed beforehand.
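The idea of dropping attributes that can be recomputed from others may be easier to see in a small sketch. The Python below is only an illustration under stated assumptions: the `RULES` table, the `item`/`price`/`qty`/`total` names and the `compact`/`expand` helpers are all hypothetical, and the rules are written as plain Python functions rather than in SRML, whose syntax is not shown in this abstract.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule table: (element tag, attribute name) -> function that
# computes the attribute from the element's other attributes.
# This stands in for stored calculation rules; it is NOT SRML syntax.
RULES = {
    ("item", "total"): lambda e: str(int(e.get("price")) * int(e.get("qty"))),
}


def compact(root):
    """Delete each attribute whose stored value equals what its rule computes."""
    for elem in root.iter():
        for (tag, attr), rule in RULES.items():
            if elem.tag == tag and attr in elem.attrib:
                # Keep the attribute if the rule would not reproduce it exactly.
                if elem.get(attr) == rule(elem):
                    del elem.attrib[attr]
    return root


def expand(root):
    """Recompute each rule-covered attribute that is missing from an element."""
    for elem in root.iter():
        for (tag, attr), rule in RULES.items():
            if elem.tag == tag and attr not in elem.attrib:
                elem.set(attr, rule(elem))
    return root


SAMPLE = (
    '<order>'
    '<item price="4" qty="3" total="12"/>'
    '<item price="5" qty="2" total="10"/>'
    '</order>'
)

original = ET.fromstring(SAMPLE)
compacted = compact(ET.fromstring(SAMPLE))
# Round trip: serialize the compacted tree, parse it again, restore attributes.
restored = expand(ET.fromstring(ET.tostring(compacted)))
```

In this sketch `compacted` no longer carries the `total` attributes, and `restored` has them back. The sketch assumes every `item` originally had a `total` attribute and the inputs its rule needs; a real scheme would have to record elements that lack them. The serialized compacted tree could then be handed to any compressor as a second step.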

Original language: English
Pages (from-to): 90-106
Number of pages: 17
Journal: Information and Software Technology
Volume: 48
Issue number: 2
DOI: 10.1016/j.infsof.2005.03.001
Publication status: Published - Feb 2006

Fingerprint

XML
Compressors
Learning systems
Compaction
Semantics

Keywords

  • SRML
  • XML
  • XML compaction
  • XML semantics

ASJC Scopus subject areas

  • Information Systems
  • Software

Cite this

Compacting XML documents. / Kálmán, Miklós; Havasi, Ferenc; Gyimóthy, T.

In: Information and Software Technology, Vol. 48, No. 2, 02.2006, p. 90-106.

@article{dddfe01d450f4d3b945fc6ee93981d6e,
title = "Compacting XML documents",
abstract = "XML is one of the most common formats for storing information. Its biggest drawback is that documents are rather large compared to the information they store. XML documents may contain redundant attributes, that is, attributes whose values can be calculated from others. These redundant attributes can be deleted from the original document, provided the calculation rules are stored. In an attribute grammar environment, semantic rules provide an analogous description of such calculations. To use this technique in an XML environment, we defined a new metalanguage called SRML and developed a method that uses SRML to compact XML documents. After compaction, XML compressors can be applied to make the compacted document much smaller. With this combined approach we achieved a significant size reduction compared to the compressed size produced by XML-specific compressors alone. This article extends the previously published method with automatic rule generation using machine learning techniques, which can find relationships between attributes that the user might not have noticed beforehand.",
keywords = "SRML, XML, XML compaction, XML semantics",
author = "Mikl{\'o}s K{\'a}lm{\'a}n and Ferenc Havasi and T. Gyim{\'o}thy",
year = "2006",
month = "2",
doi = "10.1016/j.infsof.2005.03.001",
language = "English",
volume = "48",
pages = "90--106",
journal = "Information and Software Technology",
issn = "0950-5849",
publisher = "Elsevier",
number = "2",

}

TY  - JOUR
T1  - Compacting XML documents
AU  - Kálmán, Miklós
AU  - Havasi, Ferenc
AU  - Gyimóthy, T.
PY  - 2006/2
Y1  - 2006/2
N2  - XML is one of the most common formats for storing information. Its biggest drawback is that documents are rather large compared to the information they store. XML documents may contain redundant attributes, that is, attributes whose values can be calculated from others. These redundant attributes can be deleted from the original document, provided the calculation rules are stored. In an attribute grammar environment, semantic rules provide an analogous description of such calculations. To use this technique in an XML environment, we defined a new metalanguage called SRML and developed a method that uses SRML to compact XML documents. After compaction, XML compressors can be applied to make the compacted document much smaller. With this combined approach we achieved a significant size reduction compared to the compressed size produced by XML-specific compressors alone. This article extends the previously published method with automatic rule generation using machine learning techniques, which can find relationships between attributes that the user might not have noticed beforehand.
AB  - XML is one of the most common formats for storing information. Its biggest drawback is that documents are rather large compared to the information they store. XML documents may contain redundant attributes, that is, attributes whose values can be calculated from others. These redundant attributes can be deleted from the original document, provided the calculation rules are stored. In an attribute grammar environment, semantic rules provide an analogous description of such calculations. To use this technique in an XML environment, we defined a new metalanguage called SRML and developed a method that uses SRML to compact XML documents. After compaction, XML compressors can be applied to make the compacted document much smaller. With this combined approach we achieved a significant size reduction compared to the compressed size produced by XML-specific compressors alone. This article extends the previously published method with automatic rule generation using machine learning techniques, which can find relationships between attributes that the user might not have noticed beforehand.
KW  - SRML
KW  - XML
KW  - XML compaction
KW  - XML semantics
UR  - http://www.scopus.com/inward/record.url?scp=28844468880&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=28844468880&partnerID=8YFLogxK
U2  - 10.1016/j.infsof.2005.03.001
DO  - 10.1016/j.infsof.2005.03.001
M3  - Article
AN  - SCOPUS:28844468880
VL  - 48
SP  - 90
EP  - 106
JO  - Information and Software Technology
JF  - Information and Software Technology
SN  - 0950-5849
IS  - 2
ER  -