Currently, PREDATOR does not have a serious query optimizer. Instead, it just uses the join order provided in the query. This project involves incorporating the OPT++ optimizer into PREDATOR. OPT++ is an independent library which can be used to customize and design a query optimizer. Work will involve finding out about OPT++, integrating it with PREDATOR query evaluation, and demonstrating query optimization on simple join queries.
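To see why the join order matters, here is a minimal Python sketch. It is not OPT++ and not PREDATOR code: the table sizes, selectivities, and cost model (sum of intermediate result sizes over left-deep plans) are all hypothetical, and the search simply enumerates every permutation.

```python
from itertools import permutations

# Hypothetical table cardinalities (rows); not taken from PREDATOR.
CARD = {"A": 1000, "B": 100, "C": 10}
# Hypothetical selectivities for the join predicates A-B and B-C.
SEL = {frozenset(("A", "B")): 0.01, frozenset(("B", "C")): 0.1}


def selectivity(t1, t2):
    # Pairs without a join predicate form a cross product (selectivity 1.0).
    return SEL.get(frozenset((t1, t2)), 1.0)


def plan_cost(order):
    """Assumed cost of a left-deep plan: the sum of intermediate result sizes."""
    joined = [order[0]]
    rows = CARD[order[0]]
    cost = 0.0
    for table in order[1:]:
        sel = 1.0
        for earlier in joined:
            sel *= selectivity(earlier, table)
        rows = rows * CARD[table] * sel  # estimated size of the new intermediate
        cost += rows
        joined.append(table)
    return cost


def best_order(tables):
    """Exhaustively try every join order and keep the cheapest one."""
    return min(permutations(tables), key=plan_cost)
```

Under these made-up numbers, the order written in the query (A, B, C) is estimated at roughly twice the cost of the order that joins the two small tables first, which is the kind of gap a real optimizer is meant to close.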
The purpose of this project is to build a graphical tool that displays a query plan (the result of query optimization), and also displays the execution of that plan (possibly by displaying how the computation is proceeding).
This project involves getting a good understanding of the way indexes are used in query processing. Path indexes are complex indexes, which can be implemented on top of the existing simple index functionality in PREDATOR. They are very important in object-oriented and object-relational database systems. In this project, you will need to provide fully working path index capability (specifying an index in SQL, recording its presence in the catalogs, using the index if applicable in query optimization, and actually retrieving from the index at run-time). This project will give you a very good grasp of the internals of query processing engines.
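As a rough illustration of the idea, a path index can be thought of as a mapping from the value at the end of a path expression (for example, an employee's department's city) back to the root objects that reach it. The Python sketch below assumes a toy in-memory object store; the object layout and function names are hypothetical and unrelated to PREDATOR's actual index interface.

```python
from collections import defaultdict

# Hypothetical object store: each object is a dict, and references are object ids.
OBJECTS = {
    "e1": {"name": "Ann", "dept": "d1"},
    "e2": {"name": "Bob", "dept": "d2"},
    "e3": {"name": "Cy", "dept": "d1"},
    "d1": {"name": "Sales", "city": "Ithaca"},
    "d2": {"name": "R&D", "city": "Madison"},
}


def follow(oid, path):
    """Follow a path such as ['dept', 'city'] from a root object.

    Every step except the last is assumed to be a reference to another object.
    """
    obj = OBJECTS[oid]
    for attr in path[:-1]:
        obj = OBJECTS[obj[attr]]
    return obj[path[-1]]


def build_path_index(roots, path):
    """Map each terminal value of the path to the set of root object ids."""
    index = defaultdict(set)
    for oid in roots:
        index[follow(oid, path)].add(oid)
    return index


# Usage: find employees whose department is in Ithaca without touching
# the department objects at query time.
index = build_path_index(["e1", "e2", "e3"], ["dept", "city"])
ithaca_employees = index["Ithaca"]
```

A real implementation would also have to keep the index consistent when an intermediate object changes (for example, when a department moves to another city), which this sketch ignores.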
The Wisconsin benchmark is an industry-standard benchmark used to measure the performance of a relational DBMS. The project has two parts. First, implement the GroupBy and OrderBy features, which are currently not supported in PREDATOR. Second, execute the benchmark and try to enhance the performance to whatever extent is possible. This is invaluable experience if you plan to work on performance-related issues in a real DBMS.
The TPC-D benchmark is an advanced query processing benchmark, and some of the functionality for this benchmark is not yet in PREDATOR. This project will involve a balance of adding some functionality (so that some of the benchmark queries run), and improving the performance of those queries. Again, like the previous project, this is good exposure to practical benchmarks that people care about.
PREDATOR already has a very elementary image data type. This project will implement a large part of the support for images found in RIVL (Brian Smith's multi-media system), with operations to rotate, clip, and overlap images, among others.
PREDATOR already has a very elementary image data type. This project will involve interacting with Ramin Zabih to incorporate his feature extraction algorithms, and use these extracted image features to index the image data.
Add a video data type with support for the various operations defined in RIVL (Brian Smith's multi-media system).
This requires some knowledge of audio data, and the likely operations on audio. Audio data needs to be added as a data type, along with manipulation functions.
Add a document data type, along with NLP operations on the document (based on Claire Cardie's work). This will require interaction with the NLP people.
Pharmaceutical companies have huge databases of chemical molecular structures, and much of their research involves searching these databases for 3-D spatial matches of molecules. This project will try to support a molecule structure as a data type in the database. Operations on the molecule will be based on research that Paul Schuh and others have done in this area. The project will involve interactions with that group.
Any commercial SQL system allows queries to be embedded inside a host language (like C, C++, COBOL, etc). This project will build a C++ embedding of PREDATOR SQL.
PREDATOR has an extensibility mechanism that allows new query processing engines to be incorporated into the system. This project will extend this mechanism to integrate external databases (for instance, an Informix server) into PREDATOR.
PREDATOR is a client-server system, implemented with a multi-threaded server. However, the multi-user nature of the system has not been tested, and there are several problems. This project will fix all the problems and demonstrate multi-user capability.
The version of SQL currently supported by PREDATOR is a small subset of the ANSI standard. This project will make sure that the ANSI standard SQL-92 is implemented to the extent of parsing and type checking. If two people work on this, some query transformations will also be required in this project.
For quite a while, researchers have suggested that the results of queries can be cached for later use in executing another query. This project will provide some portion of this functionality. Since this is an ongoing topic of research, this project must go along with a paper survey of this topic.
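The simplest form of this idea can be sketched as follows. This Python example only reuses a result when exactly the same query text is seen again, and it drops cached results when a table they depend on is updated; using a cached result to answer a different query is the harder research problem and is not attempted here. The class and method names are hypothetical.

```python
class QueryCache:
    """Exact-match query result cache (illustrative sketch only)."""

    def __init__(self):
        self._results = {}  # normalized query text -> (tables used, rows)
        self.hits = 0

    @staticmethod
    def normalize(sql):
        # Naive normalization: collapse runs of whitespace. A real system
        # would have to avoid altering whitespace inside string literals.
        return " ".join(sql.split())

    def get_or_run(self, sql, tables, run):
        """Return cached rows for sql, or call run(sql) and cache the result."""
        key = self.normalize(sql)
        if key in self._results:
            self.hits += 1
            return self._results[key][1]
        rows = run(sql)
        self._results[key] = (frozenset(tables), rows)
        return rows

    def invalidate(self, table):
        """Drop every cached result that depends on the updated table."""
        self._results = {
            key: entry
            for key, entry in self._results.items()
            if table not in entry[0]
        }
```

Here `run` stands in for whatever function actually executes the query, and the caller is assumed to know which tables the query reads.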
Data mining is an exciting new area that blends AI with databases. The idea is that there is information or patterns hidden in a database that are not very evident. For instance, from medical databases, various statistical patterns or empirical cause-effect rules can be extracted. This project must go along with a paper survey on data mining. The purpose of the project is to implement some of the algorithms suggested in the literature, and see how they perform.
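One well-known family of such algorithms looks for sets of items that frequently occur together. The Python sketch below is a minimal level-wise search in the spirit of Apriori, offered only as an example of the kind of algorithm you might implement; the function name and the sample data are hypothetical, and no attempt is made at the efficiency a real implementation would need.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: count} for itemsets found in at least min_support transactions."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1: the candidates are all single items.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    size = 1
    while candidates:
        # Count how many transactions contain each candidate itemset.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        size += 1
        # Join step: combine frequent itemsets into candidates one item larger.
        joined = {a | b for a in level for b in level if len(a | b) == size}
        # Prune step: every (size - 1)-subset of a candidate must be frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, size - 1))
        }
    return frequent


# Usage on a tiny, made-up set of shopping baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]
result = frequent_itemsets(baskets, min_support=3)
```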
This is another aspect of data mining (see above). Here we are looking to classify a large amount of data into a few groups or clusters based on some properties. The point is to do this efficiently. Several algorithms have been proposed in the literature. The project will involve implementing a few of them.
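As a baseline to compare against, here is a minimal Python sketch of k-means, a classic clustering method. It is not necessarily one of the algorithms this project has in mind, and it makes no attempt to handle data that does not fit in memory, which is where the efficiency question becomes interesting. The function names and the choice of the first k points as initial centers are assumptions made for the example.

```python
def dist2(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean(cluster):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(cluster) for coords in zip(*cluster))


def kmeans(points, k, iterations=20):
    """Return (centers, clusters); stops early once the centers stop moving."""
    centers = [points[i] for i in range(k)]  # deterministic init: first k points
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: dist2(p, centers[c]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster.
        new_centers = [
            mean(cluster) if cluster else centers[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters


# Usage on two well-separated, made-up groups of 2-D points.
points = [(0, 0), (10, 10), (0, 1), (1, 0), (10, 11), (11, 10)]
centers, clusters = kmeans(points, k=2)
```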
Query optimization is (and has been) a very important topic in database query processing. In this project, you will build a stand-alone query optimizer using two or three different approaches. The purpose is to compare the alternatives suggested in the literature. This project must go along with a paper survey on query optimization.
OLAP is a very refined form of query processing that involves a large amount of precomputation of answers. The queries typically involve several aggregates, and the answers are presented graphically.
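To give a flavor of what precomputation means here, the Python sketch below computes a sum for every combination of grouping attributes up front, so that later aggregate queries become simple lookups. This is only an illustrative toy: the function name, the row layout, and the choice of SUM as the aggregate are assumptions, and a real system could not afford to materialize every combination naively.

```python
from collections import defaultdict
from itertools import combinations


def build_cube(rows, dims, measure):
    """Precompute SUM(measure) for every subset of the grouping dimensions.

    Returns {group_by_columns: {group_values: total}}.
    """
    cube = {}
    for r in range(len(dims) + 1):
        for group in combinations(dims, r):
            totals = defaultdict(int)
            for row in rows:
                key = tuple(row[d] for d in group)
                totals[key] += row[measure]
            cube[group] = dict(totals)
    return cube


# Usage on a tiny, made-up sales table.
sales = [
    {"region": "east", "product": "pen", "sales": 10},
    {"region": "east", "product": "ink", "sales": 5},
    {"region": "west", "product": "pen", "sales": 7},
]
cube = build_cube(sales, ["region", "product"], "sales")
total_east = cube[("region",)][("east",)]  # answered by lookup, no scan
```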
OLE-DB is a protocol that Microsoft has developed on top of OLE/COM to allow multiple databases to connect and interoperate. This project will involve using this protocol to build OLE-DB compliant database components, and will be built on Windows NT using Visual C++. Since you will not have a lot of help on Visual C++ from me or the TA, you should already be familiar with this environment if you plan to take on this project.