Two new approaches for compressing XML
LE3 .A278 2005
2005
Muldner, Tomasz; Diamond, Jim
Acadia University
Master of Science
Masters
Computer Science
The Extensible Markup Language (XML) has emerged as a popular format for associating semantics with data. While XML offers the advantages of flexibility and extensibility, its expressiveness leads to greater verbosity. It is not uncommon for an XML representation of a data set to be five to ten times larger than the same data in an alternative encoding format. The goal of XML compression is to reduce this verbosity without sacrificing the benefits of using an XML representation. This thesis provides an overview of existing XML compressors before introducing two new approaches to XML-conscious compression. The first, called AXECHOP, uses a grammar-based method for compressing XML structure. It is intended for use in archiving applications, where reducing disk storage requirements is a top priority. The second, TREECHOP, performs an online compression of XML data and supports querying of the compressed data without first decompressing it. These features make it particularly well suited for use in XML messaging applications, where key concerns are the efficient transfer and processing of XML-encoded messages between networked systems.
The author retains copyright in this thesis. Any substantial copying or any other actions that exceed fair dealing or other exceptions in the Copyright Act require the permission of the author.
https://scholar.acadiau.ca/islandora/object/theses:110