On mining XML structures based on statistics

Hiroshi Ishikawa, Shohei Yokoyama, Manabu Ohta, Kaoru Katayama

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Citations (Scopus)

Abstract

We propose an approach for dynamically generating database schemas for well-formed XML data. The approach uses statistics of the XML data to control how many tables the data is divided into, so that the total cost of processing queries is reduced. To reduce the number of tables, we devise schemas suited to complex data such as text formatting and child elements with a small maximum number of occurrences. To this end, we define three functions, NULL expectation, Large Leaf Fields, and Large Child Fields, that control how tables are divided. We evaluated typical XML queries over both the generated schemas and normalized schemas and compared the measured costs. The comparison validated our approach.
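The abstract does not define the three functions, so the following is only a rough sketch of the general idea of letting XML statistics decide how many tables to create. It assumes, hypothetically, that a child element is stored inline in its parent's table when it rarely repeats and is usually present, and gets its own table otherwise. The names `child_occurrence_stats`, `plan_storage`, `missing_ratio`, and both thresholds are illustrative assumptions, not the paper's NULL expectation, Large Leaf Fields, or Large Child Fields.

```python
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict


def child_occurrence_stats(xml_text):
    """Collect occurrence statistics for every (parent tag, child tag) pair."""
    root = ET.fromstring(xml_text)
    parent_count = Counter()       # number of instances of each parent tag
    max_occurs = defaultdict(int)  # most children of one tag under a single parent
    present_in = Counter()         # parent instances having at least one such child
    for parent in root.iter():
        parent_count[parent.tag] += 1
        per_instance = Counter(child.tag for child in parent)
        for child_tag, n in per_instance.items():
            key = (parent.tag, child_tag)
            max_occurs[key] = max(max_occurs[key], n)
            present_in[key] += 1
    stats = {}
    for key, max_n in max_occurs.items():
        # Share of parent instances that would hold NULLs if the child were inlined.
        missing = 1 - present_in[key] / parent_count[key[0]]
        stats[key] = {"max_occurs": max_n, "missing_ratio": missing}
    return stats


def plan_storage(stats, max_inline=2, max_missing=0.5):
    """Hypothetical rule (an assumption, not the paper's): inline a child when it
    repeats at most `max_inline` times and is absent from at most `max_missing`
    of its parents; otherwise give it a separate table."""
    plan = {}
    for key, s in stats.items():
        inline = s["max_occurs"] <= max_inline and s["missing_ratio"] <= max_missing
        plan[key] = "inline" if inline else "separate table"
    return plan


if __name__ == "__main__":
    sample = (
        "<lib>"
        "<book><title>A</title><author>x</author><author>y</author><author>z</author></book>"
        "<book><title>B</title><author>x</author><note>n</note></book>"
        "<book><title>C</title></book>"
        "<book><title>D</title></book>"
        "</lib>"
    )
    for pair, decision in plan_storage(child_occurrence_stats(sample)).items():
        print(pair, decision)
```

Raising `max_inline` or `max_missing` inlines more children and so produces fewer tables, at the price of wider rows with more NULLs. That is the kind of trade-off the abstract describes, though the paper's actual cost functions may differ.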

Original language: English
Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Pages: 379-390
Number of pages: 12
Volume: 3681 LNAI
Publication status: Published - 2005
Externally published: Yes
Event: 9th International Conference on Knowledge-Based Intelligent Information and Engineering Systems, KES 2005 - Melbourne, Australia
Duration: Sep 14 2005 to Sep 16 2005

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 3681 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 9th International Conference on Knowledge-Based Intelligent Information and Engineering Systems, KES 2005
Country: Australia
City: Melbourne
Period: 9/14/05 to 9/16/05

Fingerprint

XML
Schema
Mining
Statistics
Tables
Costs and Cost Analysis
Query processing
Databases
Costs
Query
Children

ASJC Scopus subject areas

  • Computer Science (all)
  • Biochemistry, Genetics and Molecular Biology (all)
  • Theoretical Computer Science

Cite this

Ishikawa, H., Yokoyama, S., Ohta, M., & Katayama, K. (2005). On mining XML structures based on statistics. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 3681 LNAI, pp. 379-390). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 3681 LNAI).

@inproceedings{f169ac2b0c31460fae18f0eaba3110eb,
title = "On mining XML structures based on statistics",
abstract = "We propose an approach to dynamically generate database schemas for well-formed XML data. Our approach controls the number of tables to be divided based on statistics of XML so that the total cost of processing queries is reduced. We devise schemas appropriate for complex data such as text formatting and child elements with the small maximum number of occurrences in order to reduce the number of tables. To this end, we define three functions NULL expectation, Large Leaf Fields, and Large Child Fields for controlling the tables to be divided. We evaluated typical XML queries over the generated schemas and normalized schemas and measured and compared both of the costs. Through this, we successfully validated our approach.",
author = "Hiroshi Ishikawa and Shohei Yokoyama and Manabu Ohta and Kaoru Katayama",
year = "2005",
language = "English",
isbn = "3540288945",
volume = "3681 LNAI",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
pages = "379--390",
booktitle = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

}

TY - GEN

T1 - On mining XML structures based on statistics

AU - Ishikawa, Hiroshi

AU - Yokoyama, Shohei

AU - Ohta, Manabu

AU - Katayama, Kaoru

PY - 2005

Y1 - 2005

AB - We propose an approach to dynamically generate database schemas for well-formed XML data. Our approach controls the number of tables to be divided based on statistics of XML so that the total cost of processing queries is reduced. We devise schemas appropriate for complex data such as text formatting and child elements with the small maximum number of occurrences in order to reduce the number of tables. To this end, we define three functions NULL expectation, Large Leaf Fields, and Large Child Fields for controlling the tables to be divided. We evaluated typical XML queries over the generated schemas and normalized schemas and measured and compared both of the costs. Through this, we successfully validated our approach.

UR - http://www.scopus.com/inward/record.url?scp=33745324045&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=33745324045&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:33745324045

SN - 3540288945

SN - 9783540288947

VL - 3681 LNAI

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 379

EP - 390

BT - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

ER -