当前位置:首页 >> >> Electronic Notes in Theoretical Computer Science 78

Electronic Notes in Theoretical Computer Science 78


???

? ???? ??? × ? ?

??? ?????? ?× ? ??????? ? ? ?? ×??????

?? ? ?

????? ? ? ?

??????

? ???

?

?

×

Three approaches to partiality in the sketch data model
Michael Johnson
Departments of Mathematics and Computing Macquarie University Sydney, Australia

Robert Rosebrugh
Department of Mathematics and Computer Science Mount Allison University Sackville, NB, Canada

Abstract Partial information is common in real-world databases. Yet the theoretical foundations of data models are not designed to support references to missing data (often termed nulls). Instead, we usually analyse a clean data model based on assumptions about complete information, and later retro?t support for nulls. The sketch data model is a recently developed approach to database speci?cation based on category theory. The sketch data model is general enough to support references to missing information within itself (rather than by retro?tting). In this paper we explore three approaches to incorporating partial information in the sketch data model. The approaches, while fundamentally different, are closely related, and we show that under certain fairly strong hypotheses they are Morita equivalent (that is they have the same categories of models, up to equivalence). Despite this equivalence, the query languages arising from the three approaches are subtly different, and we explore some of these differences. Key words: Database, semantic data model, null, category theory.

1 Introduction
Partial information is common in real-world databases. To offer a banal example, most address books contain many entries (records) with an empty address ?eld. In database terms, we sometimes say that the address ?eld is null.
Research partially supported by the Australian Research Council and NSERC Canada.

c ?2002 Published by Elsevier Science B. V.

?? ?×?? ? ??× ??

The theoretical foundations of databases are most frequently based upon the mathematical study of relations. Relations, formally subsets of cartesian products, cannot have ?elds which are null, and attempts to extend the notion of relation to permit nulls have led to many anomalies (for a textbook treatment of these problems see [18], chapter 18). The problem is essentially that the current foundations depend upon a simple data model based on assumptions about the completeness of information, and later attempts have been made to retro?t support for nulls. The sketch data model is a recently developed approach to database speci?cation based on category theory. The sketch data model is general enough to support references to missing information within itself rather than by retro?tting. In this paper we explore three approaches to incorporating partial information in database speci?cations structured as sketch data models. For the purposes of this analysis in all three cases we begin with a speci?cation which would correspond to the database with incomplete information prohibited, along with a speci?cation of which attribute values might be allowed to be null. The approaches differ in how they encode the information about permitted nulls. Our main results show that under certain fairly strong hypotheses the three approaches are Morita equivalent (that is they have the same categories of models, up to equivalence). The three approaches are well understood by category theorists and have been applied in a range of areas of theoretical computer science. Their application to data modelling is, to the authors’ knowledge, entirely new, and the delicacies of their interaction with the database speci?cation are somewhat surprising. In the sketch data model the function which assigns to a given entity instance a certain attribute value is speci?ed by an arrow in a directed graph, which we will call here an attribute arrow. The approach that we will take is to include with a speci?cation a subset R of its attribute arrows. The arrows in R will be the ones for which null values are permitted (all other attribute functions are required to be fully de?ned). To simplify these initial explorations of the three approaches we will put some limitations on allowable elements of R . These are the “fairly strong hypotheses” referred to above. The limitations ensure that permitting partiality does not interact too much with other aspects of the speci?cation. Relaxing these limitations will be the subject of future work. It is worth noting that there are several inequivalent approaches to relaxing the limitations, so including them here would necessitate comparing approximately eight approaches to partiality — far too many to treat with any rigour in one paper. The eight approaches divide into three closely related groups, so we have chosen to explore here the three groups under hypotheses that eliminate the other differences. We conclude this introduction with a brief look at related research. There has been considerable work on the incorporation of nulls into implementations of standard data models. For example outer joins and their optimization for query processing are considered in [22] and [23]. More theoretical treatments involving extensions to the relational model include [19] which introduces the “probabilistic relational model” and [33] which extends the relational model to incorporate “maybe information”. Probably the most theoretical treatment of partial 2

?? ?×?? ? ??× ??

information and its incorporation into classical data models is given by Date. He is also the most proli?c author in this ?eld [17], [5], [13], [14], [15] and [16], and he is generally scathing about the theoretical foundations so far provided for partial information. Apart from the authors’ own work there has been considerable use of sketches to support data modelling. Piessens and Steegmans developed a notion of data speci?cation including sketches. They have obtained results on the algorithmic determination of equivalences of model categories [34] and [35] which were intended to support plans for view integration. Diskin and Cadish have used sketches for a variety of modelling purposes including for example [20] and [21]. They have concentrated on developing the diagrammatic language of “diagram operations”. Several others, including Lippe and ter Hofstede [32], Islam and Phoa [25], Tuijn and Gyssens [38], Rosebrugh and Wood [36] and Baclawski et al. [2], have been applying category theory to data modelling. None of this work has so far considered modelling partial information and its interaction with sketches. The remainder of the paper is organised as follows. In Section 2 we brie?y review the de?nitions required by the sketch data model and refer readers if required to other papers where a more gentle introduction of the model along with appropriate motivation is provided. In Section 3 we indicate the general nature of the three approaches and make precise the hypotheses we put upon R . In Sections 4, 5 and 6 we develop the three approaches in detail, and prove the main results, the two Morita equivalence theorems. Finally Section 7 considers the resulting query languages and Section 8 concludes.

2 Background
A review of the basic de?nitions follows. General introductions to the sketch data model can be found in for example [27], [28] and [30]. The basic idea is to use the sketches of categorical universal algebra as a database speci?cation tool. The sketches are considerably more powerful than standard entity relationship (ER) models [4] [24] supporting as they do constraint information [26] that would normally be recorded outside of the standard ER framework. Nevertheless, they are suf?ciently like ER models that practitioners can work with them, and they have already been valuable in large scale consultancies [6], [7], [8], [10], [37] and [12]. The extra constraint information has proved valuable theoretically too, leading to a new treatment of the view update problem [11] and to new techniques for database interoperability [29] and [9]. For background material on the theory of sketches the reader can consult [3] or [1].
/ G pb De?nition 2.1 A cone C ?vC IC : BC b??BC ?0 ? in a graph G consists of / G (the base diagram a node vC of G (the vertex of C), a graph morphism IC : BC / IC b. Cocones are dual (that of C), and, for each node b in BC , an edge pb : vC is we reverse all the edges of G which occur in the de?nition, so the new de?nition

3

?? ?×?? ? ??× ??
/ vC ). The edges p is the same except that the last phrase requires edges jb : IC b b in a cone (respectively cocone) are called projections (respectively coprojections or injections).
?G D L C ? consists of a directed graph G, a set D of De?nition 2.2 A sketch pairs of directed paths in G with common source and target (called the commutative diagrams) and sets of cones L and cocones C in G. The category generated by the graph G modulo the commutative diagrams in D is denoted C? ?.
? ?G D L C ? and ?G? D? L ? C ? ? be sketches. A sketch De?nition 2.3 Let ? / G? which carries, by com/ morphism H : is a graph morphism H : G position, commutative diagrams in D, cones in L and cocones in C to respectively commutative diagrams in D? , cones in L ? and cocones in C ? .

De?nition 2.4 A model M of a sketch in a category S is a graph morphism from G to the underlying graph of the category S such that the images of pairs of paths in D have equal composites in S and cones (respectively cocones) in L (respectively in C ) have images which are limit cones (respectively colimit cocones) in S. That is, a model is precisely a sketch morphism from to the underlying sketch of the category S. Equivalently, we can also express models in terms of functors as /S follows. A model of in S is exactly the same thing as a functor M : C? ? such that the cones and cocones in L and C are sent to limit cones and colimit cocones in S. Thus, the models are some of the objects of the category C? ? S? of functors from C? ? to S. It is important to note that the category S need not satisfy any particular exactness conditions, though lack of exactness will clearly reduce the number of potential models. De?nition 2.5 If M and M ? are models of (viewed as functors) a homomorphism / M ? is a natural transformation from M to M ? . φ:M Models and their homomorphisms form the category of models of in S deS? noted by Mod? S?, namely the full subcategory of the functor category determined by those functors which are indeed models. Frequently we will write simply Mod? ? when their is unlikely to be confusion about the identity of S. The following de?nes the class of sketches which we use for the sketch data model. We generally work in model categories Mod? S? where S is a lextensive category, that is S has ?nite limits and disjoint universal ?nite sums. De?nition 2.6 An EA sketch sketch such that
? ? ?

?G

D L C ? is a (?nite limit, ?nite coproduct)

There is a speci?ed cone with empty base in L . Its vertex will be called 1. Arrows with domain 1 are called elements. Nodes which are vertices of cocones all of whose injections are elements are called attributes. The graph of G is ?nite. 4

?? ?×?? ? ??× ??

Nodes which are neither attributes, nor 1, are called entities. We say that the EA sketch is keyed if for each entity E there is a speci?ed / AE attribute AE called its key attribute and a chosen monic speci?cation kE : E / from the entity to the speci?ed attribute. This is essentially the requirement of entity integrity, for it means that there is a chosen primary key. Notice that such primary keys cannot be composite since they are monic speci?cations to a single attribute. Nevertheless, there is nothing in the model which forbids other possibly composite candidate keys — they would be speci?ed by a monic speci?cation to a product of attributes.

3 Approaches to incomplete information
We propose three ways to consider incomplete information, or null values in attributes, in the sketch data model. Note that the semantics of the sketch data model requires that the value of an entity in any model can never be null. Moreover, the value of any arrow between entities in any model can also never be only partially de?ned. Suppose that we are given an EA sketch and we want to add a treatment for nulls in Mod? S?. The question arising is how to change the sketch , the modelling category S, or possibly the notion of model in order to allow for unknown values at speci?ed attributes of speci?ed entities. In the ?rst two approaches there is no change to the entities in the sketch. In the ?rst technique, only attribute cocones are modi?ed, essentially by adding a null value to the attribute. Thus we change the EA sketch to explicitly include null values wherever they are to be permitted. In the second technique, we change the modelling category, normally taken to be S set0 , the category of ?nite sets. The idea here is to keep the EA sketch the same, but to allow it to take values in a category of “lifted sets” — sets which already incorporate a special value which will stand for null. The change of the model category in this case necessitates a change of the notion of model because lifted sets do not form a lextensive category. So we introduce a new notion called here R p-model. This approach may be extended to allow more general ordered sets (as used in domain theory) to model complex partial information, but the treatment of R p-models is delicate. The third approach involves considerable modi?cation of the EA sketch, including the introduction of new entities, though the modelling category S and the de?nition of model do not change. In the third approach each attribute arrow E ? A which is allowed to be null is replaced by a new entity E ? and a span of arrows E ? E ? ? A. In addition we require E ? to be complemented, so we add another entity E ?? and a coproduct speci?cation that ensures that in a model M, ME ME ? · ME ?? with the injection ME ? ? ME given by the image of the added arrow E ? ? E. Incidentally, this implies that ME ? ? ME is mono, and the idea here is that ME ? is the subobject of ME for which the attribute is fully de?ned 5

?? ?×?? ? ??× ??

(non-null). In all three cases we will suppose that the attribute arrows for which null values are permitted are given in a set R . Throughout this paper we will assume that ?G D L C ? with domain an entity and arrows in R are arrows from a sketch codomain an attribute such that
? ? ? ?

no arrow in R occurs in a diagram in D no arrow in R occurs in a cone in L no arrow in R occurs in a cocone in C the codomain (attribute) of each arrow in R is not the domain of any arrow of G.

Such a set R is called -independent. Of course, if is keyed, we will expect key attributes to never take null values, so key attribute arrows cannot occur in R .

4 Attributes with null
Suppose given a sketch and an -independent set R . The apparently simplest approach is to suppose that each attribute A which occurs in the codomain of an arrow of R has added to it a speci?ed element called null. However, since the same attribute A might occur as the codomain of an arrow in R and another arrow not in R we will modify the EA sketch by adding for each f : E ? A ? R a new attribute A f whose elements are those of A and an extra speci?ed element called null f , and the arrow f : E ? A will be replaced by an arrow, called f · : E ? A f . We denote the result of this change of by · . R More precisely, De?nition 4.1 Let be an EA sketch and R an -independent set. De?ne · R ?G· D· L · C · ? as follows:
?

the nodes of G· are the nodes of G together with, for each arrow E a new node A f
f· iA f

f

/ A in R ,
f

?

the edges of G· are the edges of G not in R together with, for each E

/A

in

R , a new edge E
null f / Af
? ? ?

/ Af ,

and for each new node A f , edges A

/ Af

and 1



D

L

·

L
f

C · is the union of C and for each arrow E
A
if

/A

in R , a new cocone

/ Af o

null f

1

Notice that we have made no change to the sketch data model methodology as described elsewhere. Indeed the sketch · is just an EA sketch. What we are doing R 6

?? ?×?? ? ??× ??

here amounts to the “special values” idea described by Date [18], Chapter 18. Notice an advantage of this ?rst approach: We could specify more than one type of incomplete information simply by analogously adding more than one null value to an attribute. For example, the Phone attribute of a Person entity may be unknown because that information is not yet available, or because the person refuses to provide the information. Such a distinction could be encoded with this ?rst approach by adding elements that indicate the type of incomplete information.

5 The lift monad and sketch data models
A commonly used construction for dealing with partially de?ned functions is the lift monad. The natural domain of this monad is the (2-)category of ordered objects and order-preserving arrows. When the base category is set, to an ordered set X , the lift monad construction assigns the ordered set X whose elements are those of X together with a new bottom element satisfying x for every element x of X . The (order-preserving) inclusion of X in X and the collapse of two bottom elements provide a monad structure on ??? : ord ? ord whose algebras ord are ordered sets with a bottom element. Morphisms preserve the bottom element. is often “undeIn applications in computer science, the interpretation of ?ned”. With this in mind we call an arrow in ord fully de?ned if the inverse image of is , that is, no non-bottom element goes to the bottom. When the base category is a topos other than set, it is appropriate to consider the “partial map classi?er” and the resulting “constructive lift monad” studied by Kock [31]. When restricted to the discrete order on a set X , X may be thought of as a “?at” order with a bottom element adjoined. We denote the category of such orders by set and call its objects “lifted sets”. Note that set is the full subcategory determined by objects in the image of the functor set ? ord ? ord

where the value of the functor from set to ord at a set X is X with the discrete order. We denote this image by L : set ? set . An arrow in set is fully de?ned if and / set the functor only if it is L f for some arrow f ? set. We denote by V : set whose value at a lifted set X is the set of elements of X (including ). The set of non-bottom elements of a lifted set X is denoted φX . For a fully de?ned arrow f : X ? Y of set we denote by φ f the restriction of V f to φX so φ f : φX ? φY . Thus φ is functorial and colimit preserving on the image of L in set0 and inverse to L there. Write set0 for the full subcategory of set determined by set0 . Our second approach to nulls envisages taking certain “models” in S set0 . Some care is required here as set0 is not a lextensive category. Indeed, there is a zero object / 0 L0, so any sum of terminal objects in set0 is 0. However set0 does have sums. They are obtained by summing the “non-bottom” elements and identifying bottoms, so any ?nite lifted set is isomorphic to a sum of the object L1. 7

?? ?×?? ? ??× ??

As above, we want nulls to have no effect on entities in models. Let C ?vC IC : BC ? G? be a cone in an EA sketch and let M be a functor C? ? ? set0 . Then MIC is a diagram BC ? set0 . If all the arrows MIC f : MIC B ? MIC B? are fully de?ned then we denote by ξ?C M ? the limit of the diagram φMIC in set0 (including the projections). Now we make the following de?nition.
?G D L C ? be an EA sketch and R an -independent set. De?nition 5.1 Let An R p-model M of is a graph morphism from G to the underlying graph of the category set0 such that

R p-1 R p-2 R p-3 R p-4 R p-5

the images of pairs of paths in D have equal composites in set0 ; the image of any non-R arrow is fully de?ned; the image M1 of the vertex 1 of the empty cone is L1; each cone C in L has image Lξ?C M ?; ?nite sum cocones in C have images which are ?nite sum cocones in set0 .

The idea here is that, as noted in Section 3, the value of an entity to attribute arrow in R may be null (or unde?ned), but other arrows and exactness requirements among them should be fully de?ned. By the ?rst item, R p-models are functors / set , and by the third and fourth items they are not models of in M : C? ? 0 set0 . For example, the lift of 1 is not the terminal object in set0 , and for sets A B the product of L?A? and L?B? in set0 is not generally L?A ? B?. De?nition 5.2 A homomorphism of R p-models is a natural transformation, all of whose components are fully de?ned. We denote the category of R p-models of an EA sketch by R p-Mod? ?. Notice the requirement that the components be fully de?ned. We show that in appropriate circumstances the ?rst two approaches yield the same models. Theorem 5.3 Let be an EA sketch and let R be an -independent set. There is an equivalence of categories Mod? · ? ? R p-Mod? R
?

/ R p-Mod? ?. Let M be Proof. We begin by de?ning a functor Ψ : Mod? · ? R in Mod? · ?. On nodes we set ?ΨM ??X ? L?M ?X ??. On edges f not in R we set R ΨM ? f ? LM ? f ?. On edges f : E ? A in R we set

ΨM ? f ??x?

LM ? f

·

??x?

if x if x ? M ?E ? and M ? f · ??x? if x ? M ?E ? and M ? f · ??x?

null f null f

We need to show that ΨM is functorial, that it is in R p-Mod? ? and that Ψ is functorial. To see that ΨM is functorial we need to show that it respects the commutative diagrams. Because L is functorial this is clear for commutative diagrams involving 8

?? ?×?? ? ??× ??

only non-R arrows. This follows by functoriality of L and M, since the assumptions on R ensure that only non-R arrows occur in diagrams of D. To see that ΨM is in R p-Mod? ?, note that R p-1 has just been shown and R p2 and R p-3 follow by the de?nition of ΨM. R p-5 follows by the de?nition of ΨM · and the fact that L preserves sums. Every cone C in L appears unchanged in R , so M carries them to limit cones in set0 . Furthermore, because each cone C contains no R arrows, the de?nition of ΨM ensures that ΨM carries it to Lξ?C M ?. Thus ΨM is in R p-Mod? ?. · Next suppose α : M ? N is a morphism of Mod? R ?. We de?ne the natural transformation Ψα by ΨαX LαX . To check for naturality, suppose f is an arrow of . If f is not in R then naturality is immediate since L is a functor. If on the other hand f : E ? A is an element of R then consider LME
?ΨM ? f

LαE

/ LNE
?ΨN ? f



LMA

LαA

 / LNA

and check for commutativity under each of the three cases that appear in the de?then the square commutes since ΨM ? f ?, nition of ΨM ? f ?. Let x ? LME. If x ΨN ? f ?, LαE and LαA all preserve . Similarly, in the third case M ? f · ??x? Mnull f and the naturality of α ensures that N ? f · ??αE ?x?? Nnull f and again is preserved. For the remaining (second) case of the de?nition ΨM ? f ??x? LM ? f · ??x?, and naturality follows since L is functorial. Finally, note that all components are fully de?ned and so Ψα is indeed a morphism in R p-Mod? ? and Ψ preserves composition of natural transformations, so it is a functor. Next we de?ne Φ : R p-Mod? ? ? Mod? · ?. Let N be in R p-Mod? ?. We R need to de?ne a model ΦN : · ? set0 . On nodes we set R
?ΦN ??X ?

φN ?X ? if X is a node in V NA if X A f for some f in R

On edges f we set φN f
?ΦN ?? f ?

if f is non R V NE V NA / V NA
VN f

φNE ? φNA ? 1

/ V NA

if f if f if f

g· : E ? Ag for g in R ig for g in R nullg for g in R

· We need to show that ΦN is functorial, that it is a model of R and that Φ is functorial. Commutative diagrams in D· are preserved by ΦN since only non R arrows are involved, N is functorial and φ takes a fully de?ned commutative diagram in

9

?? ?×?? ? ??× ??

set0 to a commutative diagram of functions in set0 . Cones are preserved because none of their arrows are from R , condition R p-4 ensures that N sends them to L of a limit cone in set0 and ΦN is de?ned by applying φ which returns them to the limit cones in set0 . Similarly, N sends cocones in to sum diagrams in set0 and by the construction of sums in set0 , φ carries them to sum diagrams in set0 . Furthermore, the de?nition of ΦN on their injections guarantees that the cocones A if / A f o null f 1 are preserved. Thus ΦN is a model of · . R Next suppose α : M ? N is a morphism of R p-Mod? ?. We de?ne the natural transformation Φα by ΦαX φαX if X is a node of V αA if X A f for f in R

Note that the ?rst case is well-de?ned by the fully-de?nedness condition on morphisms of R p-Mod? ?. To check for naturality, suppose f is an arrow of · . If R f is in then naturality is immediate since the naturality square is φ applied to a naturality square for α. If f g· for g : E ? A in R , consider φME _ V ME
V Mg φαE / φNE

_



 V αE / V NE
V Ng



V MA

V αA

/ V NA



Both squares in the diagram clearly commute and the vertical composites de?ne ΦMg and ΦNg. Finally, since αA is fully de?ned and being an arrow in set0 preserves , the naturality squares for ig and nullg commute for g in R . Clearly Φ preserves composition of natural transformations, so it is a functor. Finally, we show that ΨΦ?N ? N and ΦΨ?M ? M. On nodes both are straightforward from the de?nitions of Ψ and Φ. Indeed, for nodes of Ψ adds a bottom element to the value while Φ strips this away. Similarly on R attributes, null elements are exchanged for bottom elements. Extending this to arrows is straightforward. ? It might seem that De?nition 5.1 is rather ad hoc, so we provide an alternative de?nition as follows. The new de?nition is motivated by the fact that the fully-de?nedness of the components of morphisms of R p-models suggests that they might be viewed as set0 valued models of some subsketch. We denote by -R the EA sketch resulting when the arrows in R are deleted from , and by J the inclusion sketch morphism -R J / . De?nition 5.4 An R -partial model of C? -R ?
J

is a functor M : C?
/ C? ?
M

?

/ set

0

such that

/ set

0

10

?? ?×?? ? ??× ??

factors through L as a model of -R in set0 . Proposition 5.5 If R is -independent then a functor M : C? model if and only if it is an R -partial model.
?

/ set0

is an R p-

Proof. Suppose M is an R -partial model. Since MJ factors through L as C? -R ?
M

/ set0

L

/ set

0

say, and L preserves ?nite sums and ?nite connected limits we see that for M:

R p-1 holds since M is a functor and there are no R arrows in D R p-2 holds by the factorization using M. Indeed M of any non-R arrow is fully de?ned, being L of a set0 arrow

R p-3 holds since M1

MJ1 LM1 L1, the last equality because M is a model and the ?rst since J is a morphism of sketches

R p-4 For any cone in L , R p-4 holds trivially using that M is a model. R p-5 holds since L preserves sums (being a left adjoint). In the other direction, let M be an R p-model and γ be a non-R arrow in G. Then by R p-2, Mγ MJγ L f where f is some set0 arrow, and de?ne Mγ f .
For nodes X of G de?ne MX to be the unique K in set0 such that LK MX. Thus MJX MX LMX. Notice that L is bijective on objects and faithful since set0 is de?ned to be the free ??? · 1 algebras. Thus MJ LM and it remains to show that M is a model. That M preserves commutative diagrams and sums follows since L is faithful and preserves sums. Preservation of limit cones follows by R p-3 and ? R p-4. In conclusion we remark at this point that while adding bottom values at ?rst seems an attractive addition, the modi?cation to the notion of model that is introduced in order to provide a reasonable semantics is undesirable. Moreover, there is, for the basic case of merely lifted sets, no change in the expressive capacity of the models from that obtained by simply adding null elements to attributes. Recall that the query language for a sketch data model is the classifying category, usually denoted Q? ?. Notice that the query language Q? ? obtained from is all that we naturally have available in the case of lifted set models, but Q? ? need not have the universal property with respect to R p-models that it has as a classifying category with respect to models, and consequently there may be no canonical evaluation of a query in Q? ? for an R p-model M.

6 Implementing partial arrows in the sketch data model
Our third approach requires more serious modi?cation of the basic EA sketch. Like the ?rst approach it does not require variation of the notion of model. It shares with the lifted set approach the idea that nulls are missing information rather than special values. 11

?? ?×?? ? ??× ??

The idea here is that entity to attribute arrows with missing information should be implemented as partial arrows. Recall that a partial arrow from X to Y is a / X is a part of X (a monic arrow with codomain X ), and pair i f where i : X0 / Y is arbitrary. f : X0 An appropriate construction for a sketch that implements this idea follows: De?nition 6.1 Let be an EA sketch and R a set of -independent arrows. De?ne ?G D L C ? as follows: R
?

the nodes of G are the nodes of G and for each E and E f
mf if jf

f

/

A in R , new nodes E f
f

?

the edges of G are the edges of G not in R , and for each E edges E f
/ A,

/

A in R , new

Ef

/E

and E f

/E

? ? ?

D

D

L

L
f if

C is the union of C and for each E
Ef

/A /E o

in R , a new cocone
jf

Ef

Theorem 6.2 Let be an EA sketch. Let R be an -independent set. There is an equivalence of categories: Mod? · ? ? Mod? R ? R
· / Mod? Proof. We begin by de?ning a functor Ψ : Mod? R ? R ?. Let M be in · Mod? R ?. On nodes we de?ne ?ΨM ?X MX if X is a node of , and if X E f or X E f we de?ne ?ΨM ??X ? so that both squares in the following diagram are pullbacks, and thus the rows are sum diagrams:

ΨM ?E f ?
ΨM ?m f ?

ΨM ?i f ?

/ M ?E ? o
M? f · ?

ΨM ? j f ?

ΨM ?E f ?



MA

M ?iA f ?

 / M ?A f ? o


M ?nullA ?

M1

On edges f of G that are not in R we de?ne ΨM ? f ? to be M ? f ?. On the edges
/ E and E f / E the effect of ΨM is de?ned by the diagram / A, E f Ef above. We need to show that ΨM is functorial, that it is an R -model and that Ψ is functorial. To see that ΨM is functorial we need to show that it respects the commutative diagrams of R . Because M is functorial this is clear for commutative diagrams
mf if jf

12

?? ?×?? ? ??× ??

involving only non-R arrows, but the assumptions on R and the construction of R ensure that only such diagrams occur in R . Trivially ΨM send cones in L to limit cones (after all the cones are the same in · all three of R and R ). Similarly for cocones in C . The extra cocones in C are sent to sum diagrams in set0 as noted above. Thus ΨM is a model. · Next suppose α : M ? N is a morphism of Mod? R ?. We de?ne the natural transformation Ψα on nodes of by ΨαX αX . For the nodes E f , ΨαE f is de?ned as shown in the diagram, noting that the front face is a pullback and that the commutativity of the back, right and bottom faces ensures that the outside of the diagram commutes.
?ΨM ?E f
?ΨM ?i f

MM M MΨα?E f ? MM M
?ΨM ?m f

?ΨN ?E f

M&

/ ME HH HH HH HH E α HH HH HH H$ ?ΨN ?i f / NE
M f·



MA MM

M ?iA f ?

 / MA f

N f·

MMM ?ΨN ?m f MMM MMM αA MMM MMM & 

NA

NiA f

II II IIαA f II II II I$  / NA f

For the nodes E f a similar argument de?nes ΨαE f . These de?nitions make Ψα a natural transformation since, for arrows in , α is already natural, while for the arrows m f and i f the naturality squares are the left and top faces of the diagram above. Similarly for j f . Furthermore, Ψ preserves composition of natural transformations, so it is a functor. / Mod? · ?. Let N be in Mod? Next we de?ne Φ : Mod? R ? R ?. On nodes R we set N ?X ? if X is a node of ?ΦN ??X ? N ?A? · 1 if X A f is a new attribute On edges f of G that are not in R we de?ne ΦN ? f ? to be N ? f ?. Let the sum in ΦN ?iA f ? / ΦN ?A f ? o ΦN ?nullA ? ΦN ?1?. If the edge f the second case be ΦN ?A? is in R , we use the fact that N ?E f ?
N ?m f ? N ?i f ?

/ N ?E ? o
ΦN ?iA f ?

N? j f ?

N ?E f ? is a sum so that
/1

/ N ?A? ΦN ?A? the arrows N ?E f ? ΦN ?nullA ? / ΦN ?1? ΦN ?A f ? de?ne

/ ΦN ?A f ? and N ?E f ?

ΦN ? f · ? : ΦN ?E ?

N ?E f ? · N ?E f ?

/ ΦN ?A f ?

· We need to show that ΦN is functorial, that it is a model of R and that Φ is 13

?? ?×?? ? ??× ??

functorial. Commutative diagrams in D· are preserved by ΦN since only non R arrows are involved and N is functorial. Cones in L · are sent to limit cones in set0 because none of their arrows are from R , and N is a model. Similarly, N sends cocones in
/ Af to sum diagrams in set0 . The only remaining cocones are of the form A null f o 1 and the de?nition of ΦN ?A f ? shows that this is sent to a sum diagram as above. Thus ΦN is a model of · . R Next suppose α : M ? N is a morphism of Mod? R ?. We de?ne the natural transformation Φα on nodes of by ΦαX αX . For the nodes A f , ΦαA f is de?ned as shown on the bottom faces of the diagram below, that is ?Φα?A f αA · α1 .
iA f

ME f J
Mm f

Mi f

JJ αE JJ f JJ JJ J$

NE f

M / ΦME o LL LLL ? LΦα?E LLL LLL & Ni f / ΦNE o
?ΦM ? f ·

ME f j fL LLL LL αE LLLL f % N jf

NE f



MA KK

?ΦM ??iA ? f

KKK Nm f KK αA KKK K% 

 / ΦMA f o M

ΦM ?null f ?



NA

ΦNiA f

M ?Φα?A f ?ΦN ? f · M M M&  / ΦNA f o

M1 MM

MMM α1 MMM MMM M& 

ΦN ?null f ?

N1

These de?nitions make Φα a natural transformation since, for arrows in , α is already natural, while for the arrows iA f and null f the bottom squares of the diagram above show naturality, remembering that ?Φα?A αA and ?Φα?1 α1 . The only remaining arrows are of the form f · and the middle vertical square of the diagram is the required naturality square. It commutes because all outside faces of the diagram commute, so the two composites from ΦME to ΦNA f are both the unique arrow determined by the coproduct in the top line of the diagram. Furthermore, Φ preserves composition of natural transformations, so it is a functor. Finally, we need that ΨΦ?N ? N and ΦΨ?M ? M. On nodes both are straightforward from the de?nitions of Ψ and Φ. Indeed, for nodes and arrows of neither Φ nor Ψ makes any change. Ψ adds new entities that model null values in models with partial arrows while Φ models the partial arrows with null values. Similarly on R attributes, null elements are exchanged for partial arrow speci?cations. ?

7 Effects on queries
As the previous sections have shown, three constructions that extend an EA sketch or its models in order to introduce incomplete information result in equivalent model categories. For example, while the sketches · and R are different, they R are Morita equivalent. In the second approach, the case of the category of R pmodels, a different notion of model is used to obtain a category equivalent to the 14

?? ?×?? ? ??× ??

model categories for · and R . However, the classifying categories for · and R R are clearly different from each other and from the classifying category for . R As noted above, this last may fail to satisfy the classifying category property for R p-models. Hence the query languages that arise depend on the construction used and even to speak formally of the query language for the R p-models may require a new notion which we call the lift-model classifying category. The detailed analysis of differences between the classifying categories will be deferred to a future paper. In the meantime we record here by way of illustration a few examples. We begin by comparing the classifying category Q? ? for with the classifying · category Q? · ? for R . R
· We consider ?rst selection queries. The added null elements for R attributes mean that we can add queries which refer to null values. For example, we can ask for the Persons whose phone attribute is null. All of the nulls we have introduced are essentially typed by their attribute, so one classical dif?culty with null values, that they are equal independent of type does not arise. As we mentioned earlier, augmenting attributes by additional new values corresponding to different types of unknown information is handled smoothly by · . R The case of join queries is not so favourable. For example, Office entities might have a phone attribute and a join with the data in the previous paragraph should indicate pairs Person, Office where the Person’s phone number is that of the Office. Unfortunately in this case, there may be many spurious pairs in the join — all those Person, Office pairs where the Person and the Office have unknown phone attributes. It is, of course possible using ?nite sums (allowed by Q? · ?) to obtain the non-null phones with a query result called phone-nn for R example, then join both Person and Office with this result, and ?nally join those results with each other. Projection queries do not apparently pose any particular dif?culty in this case. The construction of · from is done without adding any new nodes to the R graph G of : only a new element per attribute is added. By contrast, the construction of R adds many new entities, arrows involving them and monic speci?cations. Thus we expect stronger effects on the expressivity of the query language. This is indeed the case. First for selection queries and referring to our examples above. Suppose that the Person and Office entities have their phone attributes expressed by arrows PP and OP respectively. The construction of R adds new entities called PersonPP and OfficeOP together with new arrows and monic speci?cations. When these entities are modelled they provide the Persons and Offices with known phone attributes.The join of the new entities over the phone attribute now computes the expected pairs, at least of PersonPP and OfficeOP entities, but this result is easily seen as Person, Office pairs via the new monic speci?cations. Fortunately the construction of R requires that entities like PersonPP be complemented. Had this not been required, as is usually the case in categories of partial

15

?? ?×?? ? ??× ??

morphisms, there would be no way of querying on nulls at all.

8 Conclusion
The most important conclusion to draw from this work is that, unlike the relational data model, there is, as shown by the ?rst and third approaches to partiality, no need to alter the foundations of the sketch data model in any way in order to support partiality. Notice that both · and R are standard sketch data models. R On the other hand, interestingly, the second approach does require substantial modi?cation of the foundations, needing as it does a new de?nition of model because set0 is not lextensive, and the appropriate notion of classifying category for that approach remains to be developed. It is likely to be some time before a detailed treatment of sketch data models valued in ordered sets (including information systems in the sense of Scott) can be worked out. This is contrary to the expectations of a number of workers who predicted it would be a routine extension of the theory. The most important results of the paper are the two Morita equivalence theorems. These theorems align precisely the three approaches, under the hypotheses on R , and show that they have equivalent expressive power. It is also worth noting two smaller points of interest: the requirement that morphisms of R p-models be fully-de?ned was unexpected, but can be justi?ed in retrospect, and the need to complement the subobject of domain of de?nition of partial functions was also unexpected. This latter could be avoided, but only by restricting the morphisms of models of the third approach by requiring them to be “cartesianly partial” by which we mean that the naturality squares involving the inclusions of the domains of de?nition must all be pullbacks.

References
[1] M. Barr and C. Wells. Category theory for computing science. Prentice-Hall, second edition, 1995. [2] K. Baclawski, D. Simovici and W. White. A categorical approach to database semantics. Mathematical Structures in Computer Science 4, 147–183, 1994. [3] F. Borceux. Handbook of Categorical Algebra 3. Cambridge University Press, 1994. [4] P. P. -S. Chen. The Entity-Relationship Model—Toward a Uni?ed View of Data. ACM TODS 2, 9–36, 1976. [5] E. F. Codd and C. J. Date Much ado about nothing. in Date Relational Database Writings 1991–1994, Addison-Wesley, 1995. [6] C. N. G. Dampney and Michael Johnson. TIME Compliant Corporate Data Model Validation. Consultants’ report to Telecom Australia, 1991. [7] C. N. G. Dampney and Michael Johnson. Fibrations and the DoH Data Model. Consultants’ report to NSW Department of Health, 1999.

16

?? ?×?? ? ??× ??
[8] C. N. G. Dampney and Michael Johnson. A formal method for enterprise interoperability: A case study in a major health informatics information system. Proceedings of the Thirteenth International Conference on Software and Systems Engineering, CNAM Paris, vol 3, 12-5, 1–6, 2000. [9] C. N. G. Dampney and Michael Johnson. Half-duplex interoperations for cooperating information systems. In Advances in Concurrent Engineering, IN-1, 7pp, International Institute of Concurrent Engineering, ISBN 09710461-0-7, 2001. [10] C. N. G. Dampney, Michael Johnson and G. M. McGrath. Audit and Enhancement of the Caltex Information Strategy Planning (CISP) Project. Consultants’ report to Caltex Oil Australia, 1994. [11] C. N. G. Dampney, Michael Johnson, and Robert Rosebrugh. View Updates in a Semantic Data Model Paradigm. Proceedings of the Twelfth Australasian Database Conference ADC2001, 29–36, IEEE Press, 2001. [12] C.N.G Dampney, Graham Pegler and Michael Johnson. Harmonising Health Information Models - a critical analysis of current practice. Proceedings of the Health Informatics Conference 2001, 8pp. ISBN: 0 9585370 8 9 [13] C. J. Date. NOT is not “Not”! in Date Relational Database Writings 1985–1989, Addison-Wesley, 1990. [14] C. J. Date. EXISTS is not “Exists”! (Some logical ?aws in SQL). in Date Relational Database Writings 1985–1989, Addison-Wesley, 1990. [15] C. J. Date. Watch out for outer join. in Date and Darwen Relational Database Writings 1989–1991, Addison-Wesley, 1992. [16] C. J. Date. Oh no not Nulls again. in Date and Darwen Relational Database Writings 1989–1991, Addison-Wesley, 1992. [17] C. J. Date. Faults and defaults (in ?ve parts). in Date, Darwen and McGoveran Relational Database Writings 1994–1997, Addison-Wesley, 1998. [18] C. J. Date. Introduction to Database Systems. Addison-Wesley, seventh edition, 2000. [19] D. Dey and S. Sarkar. A probabilistic relational model and algebra. ACM TODS 21, 1996. [20] Zinovy Diskin and Boris Cadish. Algebraic graph-based approach to management of multidatabase systems. In Proceedings of The Second International Workshop on Next Generation Information Technologies and Systems (NGITS ’95), 1995. [21] Zinovy Diskin and Boris Cadish. Variable set semantics for generalised sketches: Why ER is more object oriented than OO. In Data and Knowledge Engineering, 2000. [22] C. Galindo-Legaria and A. Rosenthal. Outerjoin simpli?cation and reordering for query optimization. ACM TODS, 22, 1997. [23] P. Goel and B. Iyer. SQL query optimization reordering for a general class of queries. In Proceedings of ADDM SIGMOD, 1996.

17

?? ?×?? ? ??× ??
[24] M. Gogolla and U. Hohenstein. Towards a semantic view of an extended entityrelationship model. ACM TODS, 16, 369–416, 1991. [25] A. Islam and W. Phoa. Categorical models of relational databases I: Fibrational formulation, schema integration. Proceedings of TACS94. Eds M. Hagiya and J. C. Mitchell. Lecture Notes in Computer Science 789, 618–641, 1994. [26] Michael Johnson and C. N. G. Dampney. On the value of commutative diagrams in information modelling. In Algebraic Methodology and Software Technology, Springer Workshops in Computing, 1994. [27] Michael Johnson and Robert Rosebrugh. View updatability based on the models of a formal speci?cation. Proceedings of FME 2001, 534–549, Lecture Notes in Computer Science 2021, 2001. [28] Michael Johnson and Robert Rosebrugh. Sketch Data Models, Relational Schema and Data Speci?cations. ENTCS, Volume 61, 6, 1–13, 2002. [29] Michael Johnson and Robert Rosebrugh. Database interoperability through state based logical data independence. International Journal of Computer Applications in Technology, 16, number 2-3, (2003) 97-102. [30] Michael Johnson, Robert Rosebrugh, and R. J. Wood. Entity-relationship models and sketches. Theory and Applications of Categories 10(2002), 94–112. [31] Anders Kock. Algebras for the partial map classi?er monad. Mathematics 1488(1991), 262–278.

Lecture Notes in

[32] E. Lippe and A ter Hofstede. A category theoretical approach to conceptual data modelling. RAIRO Theoretical Informatics and Applications, 30:31–79, 1996. [33] K. Liu and R. Sunderraman. Inde?nite and maybe information in relational databases. ACM TODS, 15, 1990. [34] F. Piessens and Eric Steegmans. Categorical data speci?cations. Applications of Categories 1, 156–173, 1995. Theory and

[35] F. Piessens and Eric Steegmans. Selective Attribute Elimination for Categorical Data Speci?cations. Proceedings of the 6th International AMAST 424-436, Lecture Notes in Computer Science 1349, 1997. [36] Robert Rosebrugh and R. J. Wood. Relational databases and indexed categories. In Proceedings of the International Category Theory Meeting 1991, CMS Conference Proceedings 13, 391–407, American Mathematical Society, 1992. [37] G. Southon, C. Sauer, and C. N. G. Dampney. Lessons from a failed information systems initiative: issues for complex organisations. International Journal of Medical Informatics, Elsevier Science, 1999. [38] C. Tuijn and M. Gyssens. CGOOD, a categorical graph-oriented object data model. it Theoretical Computer Science 160, 217-239, 1996.

18


赞助商链接
更多相关文档:
更多相关标签:
网站地图

文档资料共享网 nexoncn.com copyright ©right 2010-2020。
文档资料共享网内容来自网络,如有侵犯请联系客服。email:zhit325@126.com