Incremental physical data clustering

Researcher:

Prof. Emeritus Oded Shmueli | Computer Science

Categories:

Computer Science & Electrical Engineering

The Technology

XML is becoming widely used as a primary encoding scheme for data and knowledge, and the number of XML-based applications is growing steadily. XML is also used extensively for encoding databases (DBs). In practice, the DB is stored as a single XML file, which may grow very large. Given that most of the DB resides in external memory (hard disk) at any given time, accessing specific records is very time-consuming.

The method treats the clustering of XML database data as a tree partitioning problem in which the tree is augmented with sibling edges. We propose the PIXSAR (Practical Incremental XML Sibling Augmented Re-clustering) algorithm for incrementally clustering XML documents, turning the database into a workload-driven, dynamically rearranging storage engine. PIXSAR substantially reduces access time by reorganizing the XML data as the popularity of certain nodes (records) changes. Experiments on a real disk show that, when handling a set of 400,000 queries, PIXSAR saves more than 60% of query time compared to a current method.
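To make the idea of workload-driven partitioning concrete, the following is a minimal sketch and not the PIXSAR algorithm itself. It assumes a simple greedy heuristic: count how often each parent-child or sibling edge is traversed, then merge the most heavily traversed edges into the same disk page while the page capacity allows. The function names, the toy tree, the workload, and the page capacity are all hypothetical, and the incremental re-clustering step is not modeled.

```python
from collections import Counter


def cluster_by_edge_weight(nodes, edge_weights, page_capacity):
    """Greedy sketch (an assumption, not PIXSAR): merge the heaviest
    edges first, as long as the merged cluster still fits in one page."""
    parent = {n: n for n in nodes}   # union-find forest
    size = {n: 1 for n in nodes}     # cluster size, valid at each root

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    # Heaviest edges first; ties are broken by edge name for determinism.
    for (a, b), _weight in sorted(edge_weights.items(),
                                  key=lambda kv: (-kv[1], kv[0])):
        ra, rb = find(a), find(b)
        if ra != rb and size[ra] + size[rb] <= page_capacity:
            parent[rb] = ra
            size[ra] += size[rb]

    # Each node is mapped to its cluster representative, used as a page id.
    return {n: find(n) for n in nodes}


def cross_page_traversals(workload, page_of):
    """Count traversals whose two endpoints are stored on different pages."""
    return sum(1 for a, b in workload if page_of[a] != page_of[b])


# Hypothetical tree: root has children a, b, c, d (siblings in that order).
nodes = ["root", "a", "b", "c", "d"]

# Hypothetical workload: each entry is one traversal of a parent-child
# edge ("root", "a") or a sibling edge such as ("a", "b").
workload = ([("a", "b")] * 4 + [("c", "d")] * 5
            + [("root", "a")] * 2 + [("b", "c")] * 1)
edge_weights = Counter(workload)

# Baseline: pack nodes into pages of two in document order.
document_order = {"root": 0, "a": 0, "b": 1, "c": 1, "d": 2}
clustered = cluster_by_edge_weight(nodes, edge_weights, page_capacity=2)

baseline_cost = cross_page_traversals(workload, document_order)
clustered_cost = cross_page_traversals(workload, clustered)
print(baseline_cost, clustered_cost)
```

In this toy setting the workload-aware layout places the frequently co-accessed siblings on the same page, so fewer traversals cross a page boundary than with the document-order layout. A real system would also have to weigh the cost of moving data against the expected savings.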

In addition, a supplementary method (iPIXSAR) was devised to handle similar issues in multi-index DBs. For this type of DB, iPIXSAR is superior to PIXSAR.

Advantages

Improved query performance – reduces the number of page faults while querying
Customizable method – for use in specific fields and for specific targets
Multiple index re-ordering capability

Applications and Opportunities

This method is applicable to the efficient, adaptive storage of arbitrary XML files, including those in databases. It may be applied at both the DBMS level and the disk management level.

Business Development Contacts

Dr. Arkadiy Morgenshtein

Director of Business Development, ICT
