Incremental physical data clustering

Researcher:
Prof. Oded Shmueli | Computer Science

Categories:

Information and Computer Science

The Technology

XML is becoming widely used as a primary encoding scheme for data and knowledge, and the number of XML-based applications grows steadily. XML is also used extensively to encode databases (DBs). In practice, such a DB may be stored as a single XML file, which can grow to an immense size. Since at any given time most of the DB resides in external memory (a hard disk), accessing specific records can be very time-consuming.

In this method, XML database clustering is treated as a partitioning problem on a tree augmented with sibling edges. We propose the PIXSAR (Practical Incremental XML Sibling Augmented Re-clustering) algorithm for incrementally clustering XML documents, turning the database into a workload-driven, dynamically rearranging storage engine. PIXSAR substantially reduces access time by reorganizing the XML data according to changes in the popularity of nodes (records). Experiments on a real disk show that, over a workload of 400,000 queries, PIXSAR reduces query time by more than 60% compared with an existing method.
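The description above does not spell out PIXSAR's internals. As a rough illustration of the general idea only (workload-driven clustering of a sibling-augmented tree into disk pages), the sketch below uses a simple greedy heuristic. Everything in it is a hypothetical assumption and not taken from PIXSAR: the function names, the decayed access counters used to track changing popularity, the unit size of every node, and the capacity-bounded greedy merging along the most heavily traversed edges.

```python
from collections import Counter


def augmented_edges(parent, children):
    """Tree edges (parent-child) plus sibling edges between consecutive children."""
    edges = set()
    for child, par in parent.items():
        edges.add(frozenset((par, child)))
    for kids in children.values():
        for left, right in zip(kids, kids[1:]):
            edges.add(frozenset((left, right)))
    return edges


def edge_weights(edges, traces, old=None, decay=0.5):
    """Count how often the workload traverses each augmented edge.

    Hypothetical popularity model: previous counts are scaled by `decay`,
    so recent queries dominate and the layout can adapt incrementally.
    """
    weights = Counter({e: w * decay for e, w in (old or {}).items()})
    for trace in traces:
        for a, b in zip(trace, trace[1:]):
            edge = frozenset((a, b))
            if edge in edges:
                weights[edge] += 1
    return weights


def cluster(nodes, weights, page_capacity):
    """Greedily merge clusters along the heaviest edges while they fit on a page."""
    leader = {n: n for n in nodes}
    size = {n: 1 for n in nodes}  # assumption: every node has unit size

    def find(n):
        while leader[n] != n:
            leader[n] = leader[leader[n]]  # path halving
            n = leader[n]
        return n

    for edge, w in sorted(weights.items(), key=lambda kv: -kv[1]):
        if w <= 0:
            break
        a, b = (find(x) for x in edge)
        if a != b and size[a] + size[b] <= page_capacity:
            leader[b] = a
            size[a] += size[b]
    return {n: find(n) for n in nodes}


def pages(assignment):
    """Group nodes by cluster and return the groups in a canonical order."""
    groups = {}
    for node, lead in assignment.items():
        groups.setdefault(lead, []).append(node)
    return sorted(sorted(g) for g in groups.values())


if __name__ == "__main__":
    parent = {1: 0, 2: 0, 3: 0, 4: 1, 5: 1, 6: 3}
    children = {0: [1, 2, 3], 1: [4, 5], 3: [6]}
    edges = augmented_edges(parent, children)
    w1 = edge_weights(edges, [[0, 3, 6]] * 5 + [[0, 1, 4, 5]] * 2)
    print(pages(cluster(range(7), w1, 3)))  # [[0, 3, 6], [1, 4, 5], [2]]
    # Popularity shifts towards the path 0 -> 1 -> 4; re-cluster incrementally.
    w2 = edge_weights(edges, [[0, 1, 4]] * 10, old=w1)
    print(pages(cluster(range(7), w2, 3)))  # [[0, 1, 4], [2], [3, 6], [5]]
```

In the toy run, nodes that queries visit together end up on the same page, and when the workload shifts the layout follows it. A real engine would also weigh the cost of physically moving records, which this sketch ignores.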

In addition, a supplementary method (iPIXSAR) was devised to handle the same problem in multi-index DBs; for this type of DB, iPIXSAR outperforms PIXSAR.

Advantages

  • Improved query performance – reduces the number of page faults while querying
  • Customizable method – for use in specific fields and for specific targets
  • Multiple index re-ordering capability
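Why clustering reduces page faults can be seen in a small simulation. This is a hypothetical model, not the evaluation setup behind the results quoted above: nodes map to pages through an assumed `page_of` table, the buffer holds a fixed number of pages with LRU replacement, and each query is a sequence of node visits. The same workload is replayed against a document-order layout and against a layout that keeps co-accessed nodes on one page.

```python
from collections import OrderedDict


def count_page_faults(traces, page_of, buffer_pages):
    """Count page faults when replaying node-visit traces through an LRU buffer."""
    buffer = OrderedDict()  # page id -> True, least recently used first
    faults = 0
    for trace in traces:
        for node in trace:
            page = page_of[node]
            if page in buffer:
                buffer.move_to_end(page)  # hit: mark page as most recently used
            else:
                faults += 1
                buffer[page] = True
                if len(buffer) > buffer_pages:
                    buffer.popitem(last=False)  # evict least recently used page
    return faults


if __name__ == "__main__":
    traces = [[0, 3, 6]] * 5  # a popular path visited repeatedly
    doc_order = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1, 6: 2}
    clustered = {0: 0, 3: 0, 6: 0, 1: 1, 4: 1, 5: 1, 2: 2}
    print(count_page_faults(traces, doc_order, 1))  # 15
    print(count_page_faults(traces, clustered, 1))  # 1
```

With a one-page buffer, the document-order layout faults on every step of every query, while the clustered layout faults only once.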

Applications and Opportunities

  • This method is applicable to efficient, adaptive storage of arbitrary XML files, including those in databases. It can be applied both at the DBMS level and at the disk-management level.
Business Development Contacts
Shikma Litmanovitz
Director of Business Development, Physical Science