The "fragment and replicate" strategy has been used to process distributed queries. One of the relations referenced in a query is horizontally partitioned into fragments, which are distributed to a set of processing sites. After the other relations are replicated at these sites, the query is processed in parallel, and the query answer is the union of the results produced at each processing site. To process a query, we must determine which relation to partition, how to partition it, and which sites will serve as processing sites. In this paper, we extend this strategy by replicating portions of the relations rather than entire relations to improve system performance. Based on the characteristics of the semijoin, a min-max method and a hash-based method are designed to partition relations. A general algorithm based on these new processing methods is then given, which determines the relations to partition, the manner of partitioning, and the sites used to process the query. Further, since the replicated relations may be useful for future queries, we discuss how to manage them. Finally, we show how the algorithm can be used in a heterogeneous database system.
Journal of Information Science and Engineering, 12(1), 79-99
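
As an illustration of the idea summarized in the abstract, the following minimal sketch contrasts classic fragment-and-replicate (every site receives all of the second relation) with one possible hash-based form of partial replication (each site receives only the tuples whose join key hashes to it). It is not the paper's algorithm: the function names, the in-memory list-of-dicts representation of relations, the two-relation equijoin, and the "shipped tuples" cost measure are all assumptions made for this example, and the min-max method is not shown.

```python
from collections import defaultdict


def local_join(r_frag, s_part, key):
    """Equijoin executed at a single (simulated) processing site."""
    index = defaultdict(list)
    for s_row in s_part:
        index[s_row[key]].append(s_row)
    return [{**r_row, **s_row}
            for r_row in r_frag
            for s_row in index.get(r_row[key], ())]


def hash_partition(relation, key, n_sites):
    """Horizontally partition a relation by hashing the join attribute."""
    fragments = [[] for _ in range(n_sites)]
    for row in relation:
        fragments[hash(row[key]) % n_sites].append(row)
    return fragments


def full_replication(r, s, key, n_sites):
    """Classic strategy: fragment r, replicate all of s at every site."""
    answer, shipped = [], 0
    for frag in hash_partition(r, key, n_sites):
        shipped += len(s)                      # whole relation sent to this site
        answer.extend(local_join(frag, s, key))
    return answer, shipped                     # answer is the union of site results


def partial_replication(r, s, key, n_sites):
    """Assumed hash-based variant: send each site only the matching part of s."""
    r_frags = hash_partition(r, key, n_sites)
    s_parts = hash_partition(s, key, n_sites)  # same hash, so join partners co-locate
    answer, shipped = [], 0
    for frag, part in zip(r_frags, s_parts):
        shipped += len(part)                   # only a portion of s is sent
        answer.extend(local_join(frag, part, key))
    return answer, shipped
```

Under these assumptions both functions return the same join result, but the partial variant ships each tuple of the second relation to one site instead of to all of them.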