TALN 2003, Batz-sur-Mer, 11-14 juin 2003

Amharic verb lexicon in the context of Machine Translation

Sisay Fissaha and Johann Haller
Institute for Applied Information Sciences, University of Saarland
Martin-Luther-Str. 14, D-66111 Saarbrücken, Germany
Tel +49-681-3895126, Fax +49-681-3895140
{sisay, hans}@iai.uni-sb.de
http://www.iai.uni-sb.de

Abstract
This paper discusses three related issues concerning the morphology of Amharic verbs in the context of Machine Translation. Amharic, like other Semitic languages, exhibits complex morphological phenomena that defy a unified description of their important characteristics. A brief assessment of the different proposals made concerning the organization of Amharic verbs indicates that an amalgamation of the important characteristics of the different approaches better meets the requirements of the application at hand. Furthermore, analysis of the different derivational phenomena suggests that a lexeme consisting only of root consonants is well suited for the specification of lexical transfer. Finally, although most of the complexities of Amharic morphology can be handled using the machinery of finite-state and two-level morphology, reduplicative derivation, which involves a change in vowel melody and gemination pattern, poses difficulties that force the stipulation of additional template forms.

Keywords
Amharic; lexicon; two-level morphology; machine translation; finite-state morphology; Semitic languages.


1 Introduction
Amharic, which belongs to the Semitic family of languages, is one of the most widely spoken languages in Ethiopia. It has a complex morphology which makes a complete listing of all surface word forms in the lexicon impossible. The need for analysing Amharic words has therefore long been recognized, as reflected in some recent attempts such as Amharic word processing (Daniel and Yonas, 1994), a stemming algorithm (Nega, 1999), a word parser (Abiyot, 2000), and a part-of-speech tagger (Mesfin, 2001). All these works may generally be characterized as providing a shallow analysis which does not satisfy the requirements posed by machine translation systems.

The current study is part of a wider project that attempts to integrate Amharic into the CAT2 machine translation system. Section 2 presents some of the observations made regarding the organization of Amharic verbs in the course of developing an Amharic morphological analyzer. In machine translation systems, the lexicon plays a significant role in the analysis and generation of text by providing the different levels of processing with morphological, syntactic and semantic information. In Section 3, we focus on one particular aspect of the lexical entry, namely the specification of the base lexeme in a way which allows compositionality of translation. Finite-state techniques and two-level morphology have been the main computational models for Arabic and other Semitic languages (Kay, 1987; Beesley, 1996). Section 4 reports on the implementation of Amharic morphology under these frameworks.

2 Amharic Verb Classification
Amharic, like other Semitic languages such as Arabic, exhibits root-pattern morphology. This is especially true of Amharic verbs, which rely heavily on the arrangement of consonants and vowels to encode different morphosyntactic properties. Identification of the most frequently occurring consonant and vowel patterns has therefore been a logical starting point in most attempts to organize Amharic verbs into classes. For example, Bender and Fulas (1978) identified 11 major classes for simple Amharic verbs, each consisting of a different number of subclasses (42 subclasses in total), using the following criteria: consonantal skeleton, gemination pattern, occurrence of a vowel other than ?, occurrence of initial a or t, presence of w, y, h among the root consonants, and identical consonants in sequence. Leslau (1995), Cohen (1978) and Dawkins (1969) also provide classifications of Amharic verbs on the basis of some of these criteria. One common characteristic of these approaches is that they propose a relatively large set of classes.

Baye (1999) questions the above ways of organizing Amharic verbs. Specifically, he challenges the widely accepted belief that the number of consonantal radicals ranges from one to six. He claims that the base form of Amharic verbs consists of three radicals, and that any deviation from this is to be accounted for through a process of root reduction or extension.

Although previous studies have been instrumental in identifying the generalizations underlying Amharic morphology, very little work has so far been done on the implementation aspect. In the current work, we point out the implications of the different proposals for implementation, which we present in the next few paragraphs. The formulation of the criteria mentioned above relies to some extent on observations made on the surface forms, as shown in Table 1 (Baye 1999).
A root form takes different patterns for different morphosyntactic categories, of which the most common are perfect, imperfect, imperative, jussive, gerundive, and verbal. We also see that the words differ with respect to their surface realizations. The verb sbr is a tri-radical verb. All three consonants appear in almost all morphological derivations of the verb. The verb smA is also postulated as a tri-radical verb consisting of three consonants, of which only the first two appear in the surface form of the verb. The last consonant is a placeholder for a lost laryngeal consonant whose absence is indicated on the surface by the change of vocalic pattern in the second radical, i.e. ? -> a, and also by the introduction of a new radical t in the verbal form which does not appear in the base form. The third verb, AyY, has lost the first and third radicals and hence exhibits an idiosyncratic surface pattern far different from those of the first and second verbs. The verb mnzr is a quadriradical verb consisting of four consonants which appear in almost all surface forms of the verb, resulting in different surface patterns. Although the last one, trgwm, is also a quadriradical verb exhibiting morphological processes similar to those of mnzr, it has been put in a different subclass because of the labiovelar consonant gw, which effects some idiosyncratic vocalic changes in some derivations of the surface forms. On the basis of their idiosyncratic properties, all these verbs are grouped into different classes.

Radicals¹                           Perfect     Imperfect    Jussive    Gerund    Verbal
                                    CVCXVC      CVCC         CCVC       CVCC      CCVC
sbr ‘break’              Underly.   s?bX?r      -s?br        -sb?r      s?br      -sb?r
                         Surface    s?bb?r      -s?br        -sb?r      s?br      -sb?r
smA ‘hear’               Underly.   s?mX?A      -s?mA        -sm?A      s?mA      -sm?A
                         Surface    s?mma       -s?ma        -sma       s?mt      -smat
qrY ‘remain’             Underly.   q?rX?Y      -q?rY        -qr?Y      q?rY      -qr?Y
                         Surface    q?rr        -q?r         -qr        q?rt      -qr?t
AyY ‘see’                Underly.   A?yX?Y      -A?yY        -Ay?Y      A?yY      -Ay?Y
                         Surface    Ayy?        -ay          -y         Ayt       -ay?t
mnzr ‘change (money)’    Underly.   m?n?zX?r    -m?n?zX∧r    -m?nz∧r    m?nz∧r    -m?nz?r
                         Surface    m?n?zz?r    -m?n?zz∧r    -m?nz∧r    m?nz∧r    -m?nz?r
trgwm ‘translate’        Underly.   t?r?gwX?m   -t?r?gwX∧m   -t?rgw?m   t?rgw∧m   -t?rgw?m
                         Surface    t?r?ggom    -t?r?ggum    -t?rgum    t?rgum    -t?rgom

Table 1: Bender and Fulas (1978): Simple verb forms

However, looking at the different underlying forms and the derivation process, one can see some regularities and relationships between the different verbs. The underlying stem form, e.g. s?bX?r-, is obtained by intercalating the root consonants sbr and the vowel pattern ? into the respective slots of the template. Note that the vowel ∧ is not part of the vowel pattern; rather, it is inserted by the general epenthesis rule (in Xerox rule format) given in (1). The rule encodes the fact that Amharic does not allow consonant clusters at the beginning of a word.

(1) Cons = Consonant set
    [..] -> ∧ || .#. Cons _ Cons ;

The underlying stem forms still contain abstract morphophonological elements (e.g. X for gemination) that need to be realized through the application of alternation rules. Gemination, for example, is handled using (2), which simply spreads the geminated consonant to the neighbouring slot X, resulting in a sequence of two identical consonants.
1 The symbols A and Y function as placeholders for lost radicals. The symbols C, V and X in the template forms indicate a consonant, a vowel, and gemination of the previous consonant, respectively.

(2) Cons = Consonant set
    X:C <=> C _ : ; where C in Cons ;

The verb s?mma (s?mX?A-), which has lost the final radical A, introduces the consonant t in its verbal and gerund derivations and changes the quality of the vowel of the penultimate radical. This is taken care of using (3).

(3) A -> t || _ +Gerund
    ? -> a || _ A +Verbal
    A -> t || _ +Verbal
    ? -> a || _ A +Perf
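The intercalation step and rules (1) and (2) can be illustrated with a minimal Python sketch. It is not the Xerox implementation: the function names and the small consonant inventory are assumptions made for illustration, and the stem vowel is written ? exactly as it appears in this text.

```python
# Hypothetical consonant inventory, sufficient for the examples only.
CONSONANTS = set("bdghklmnqrstwyz")
VOWEL = "?"        # the stem vowel as it is written in this text
EPENTHETIC = "∧"   # the epenthetic vowel of rule (1)


def interdigitate(root, template, vowel=VOWEL):
    """Fill C slots with successive root radicals and V slots with the vowel."""
    radicals = iter(root)
    filled = []
    for slot in template:
        if slot == "C":
            filled.append(next(radicals))
        elif slot == "V":
            filled.append(vowel)
        else:
            # Abstract symbols such as X are kept for the alternation rules.
            filled.append(slot)
    return "".join(filled)


def geminate(stem):
    """Rule (2): realize X as a copy of the consonant immediately before it."""
    out = []
    for ch in stem:
        out.append(out[-1] if ch == "X" and out else ch)
    return "".join(out)


def epenthesize(word):
    """Rule (1): break up a word-initial consonant cluster."""
    if len(word) >= 2 and word[0] in CONSONANTS and word[1] in CONSONANTS:
        return word[0] + EPENTHETIC + word[1:]
    return word


underlying = interdigitate("sbr", "CVCXVC")
print(underlying, geminate(underlying), epenthesize("sb?r"))
```

The sketch treats each character of the root as one radical, so a labiovelar such as gw would have to be passed as a separate segment; that case is left out here.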

The final radicals of AyY and qrY are mapped into the corresponding surface forms using (4).

(4) Y -> t || _ +Gerund
    Y -> t || _ +Verbal

Rules (3) and (4) do not apply to the verb sbr, as it does not have any lost radicals. However, because of its underlying pattern, it is related to verbs like AyY and qrY. The above sample alternation rules in turn show that these verbs have more features in common. The apparent differences that one may observe on the surface can be eliminated by using a few sets of rules and by postulating the right root forms.

Word forms like mnzr² do not have a related tri-radical form with a semantically sound derivation. It has therefore been found necessary to postulate quadriradical template forms. The idiosyncratic vocalic pattern appearing in the surface forms of trgwm (due to the labiovelar gw) can be generated using (5). The variable LabioCons represents all the labiovelar consonants, whereas SimpCons represents the corresponding simple forms.

(5) LabioCons = {lw, mw, rw …}
    SimpCons = {l, m, r….}
    LabioCons -> SimpCons o || _ ? ;
    LabioCons -> SimpCons u || _ Cons ;

We have also postulated template forms containing five radicals. There are very few verbs with five distinct radicals; most verbs of this class contain duplicate radicals. As a result, one is tempted to derive these verbs from the corresponding tri- or quadriradical forms. bl?l? (‘glitter’) is an example verb of this class. Its perfect form is given by -bl???ll??-. Assuming a stem template form of -bl??-, the perfect form may be thought of as being derived from bl??- by copying the second and third radicals. Here we face some problems. On the one hand, -bl??- does not have any meaning, as this form does not exist in the language. On the other hand, this derivational process seems to lack the sound semantic basis which one usually finds in other similar derivational phenomena such as reduplicative derivations, e.g. s?bb?r? (‘he broke’) and s?babb?r? (‘he broke into pieces’). Due to the rarity of these verbs, and for some practical reasons to be discussed in the next sections, we define five-radical template forms.
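As a rough illustration of how the alternation rules (3) and (4) act on underlying stems, the following Python sketch applies them with regular expressions. It is an approximation under stated assumptions: the function name is hypothetical, the feature tag is passed as a separate argument rather than as a symbol in the string, and the vowel rule is simply ordered before the consonant rule.

```python
import re


def apply_lost_radical_rules(stem, feature):
    """Apply rules (3) and (4) to an underlying stem with one feature tag."""
    # ? -> a || _ A +Verbal   and   ? -> a || _ A +Perf
    if feature in ("Verbal", "Perf"):
        stem = re.sub(r"\?(?=A$)", "a", stem)
    # A -> t || _ +Gerund / +Verbal   and   Y -> t || _ +Gerund / +Verbal
    if feature in ("Gerund", "Verbal"):
        stem = re.sub(r"[AY]$", "t", stem)
    return stem


for stem, feature in [("s?mA", "Gerund"), ("sm?A", "Verbal"),
                      ("q?rY", "Gerund"), ("s?br", "Gerund")]:
    print(stem, feature, apply_lost_radical_rules(stem, feature))
```

Only the rules quoted in (3) and (4) are covered. In the perfect, the placeholder A is left in the string, because the rule that finally removes it is not given in the text.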

2 Baye (1999) claims that the second radicals of most quadriradical verbs have the feature [+continuant] and hence are predictable. However, this rule has some exceptions, e.g. b???rr?q

In general, all of the above suggests that some works on Amharic morphology (Bender and Fulas 1978; Leslau 1995) tend to provide a fine-grained classification, whereas others (Baye 1999) postulate a highly abstract generalization about Amharic verb base forms. The position taken in this paper is an intermediate one. Some level of abstraction has been introduced in order to capture the generalizations relating to commonly occurring lexical items, while higher-order template forms are postulated to avoid the specification of highly abstract forms.

3 Lexical entry specification for lexical transfer
CAT2 is a transfer-based machine translation system in which the lexical transfer component constitutes the core of the transfer module. Lexical transfer is defined on a base lexeme specified in the lexicon. The fact that one cannot list all word forms in the lexicon means that one needs some recursive means of expressing meaning, i.e. compositional translation of words along the lines of the compositional treatment of sentences. This raises an important question about the structure of the lexicon in general, and about the form of the lexical units which serve as the basis for lexical transfer in particular.

A base form of an Amharic verb may be postulated at different levels of abstraction. However, the most plausible one, which is in line with the analysis provided in the previous section, is the one which considers the root consonants as the base form of an Amharic verb. Hence, assuming that the base form of Amharic verbs consists only of root consonants, an example lexical entry for the verb sbr would look like {lex=sbr, ...}. Some of the simple and complex derived forms of this verb are given in Table 4.

Perfect         s?bb?r?      he broke
Imperfect       y∧s?br       he was breaking or he will break
Causative       ass?bb?r?    he made someone break
Passive         tes?bb?r?    it was broken
Reduplicative   s?babb?r?    he broke into pieces

Table 4: Derivational paradigm of sbr

The surface form s?bb?r? consists of the template form (CVCXVC), the root consonants (sbr), the vowel pattern (?), and the suffix morpheme (?). While the root sbr determines the basic meaning of the word, the template form CVCXVC together with the vowel pattern provides additional morphosyntactic information such as aspect and tense. The suffix morpheme corresponds to the third-person singular masculine suffix pronoun. Compositional translation of the above verb would then mean translating each of these morphemes into the equivalent form of the target language. Using our definition of a lexical entry, the base form would be translated using a transfer rule of the form {lex=sbr} → {lex=’break’}.

The lexeme sbr is stripped of all grammatical oppositions, which makes it a good candidate for the specification of lexical transfer. It is abstract enough to include all concepts having to do with breaking. This analysis is in line with the idea introduced by Streiter (1996) that only those lexemes which correspond to the notional domain should appear in lexical transfer. However, reducing all the surface forms to the single canonical form sbr results in a significant loss of information. To avoid that, we should be able to encode their differences in some systematic way (e.g. by the use of semantic features). Such a process of abstraction should be carried out with care, however; otherwise we may run into the problem of overtranslation, in which compositional translation of the word results in an awkward translation which does not correspond to the meaning of the whole (Streiter 1996).
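The idea that lexical transfer operates only on the root lexeme, while the remaining morphemes are carried over as features, can be sketched as follows. This is a Python illustration and not CAT2 notation; the feature names and the one-entry transfer lexicon are assumptions based on the sbr example.

```python
# Hypothetical transfer lexicon with one entry: {lex=sbr} -> {lex='break'}.
TRANSFER_LEXICON = {"sbr": "break"}


def transfer(analysis):
    """Translate the base lexeme and carry all other features over unchanged."""
    target = dict(analysis)  # copy, so the source analysis is left intact
    target["lex"] = TRANSFER_LEXICON[analysis["lex"]]
    return target


# Assumed feature structure for the perfect form of sbr (third person
# singular masculine); the feature names are illustrative only.
source = {"lex": "sbr", "aspect": "perf", "tense": "past",
          "person": 3, "number": "sing", "gender": "masc"}
print(transfer(source))
```

Generating the target word form from the transferred features is left to the synthesis component of the target language and is not modelled here.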

The suffix morpheme can be featurised and transferred into the target language, whose synthesis component would then generate the required pronoun. Translation of the remaining morpheme (C?CX?C), however, is not straightforward. Direct transfer does not seem plausible, as the morpheme does not have an equivalent form in the target language. Another option is to encode the syntactico-semantic information expressed by this morpheme. The perfect form is used to render a wide variety of meanings. Typically it is used to express past tense, as in (6).

(6) l?-u      m?staw?t-u-n     s?bb?r-?-w
    boy-DEF   mirror-DEF-ACC   breakPAST-SUBJ-OBJ
    ‘The boy broke the mirror’.

The tense may also vary depending on the context in which the perfect form is used. In a conjunctive construction, its tense usually depends on the tense of the main clause (7).

(7) ∧yebella anebbebe
    ‘He read while he ate’.
    ∧yebella yan?bbal
    ‘He reads while he eats’.

The imperfect form commonly expresses present and future tenses (8).

(8) l?-u      m?staw?t-u-n     y∧-s?br-al
    boy-DEF   mirror-DEF-ACC   SUBJ-breakPRES-AUX
    ‘The boy will break the mirror’.

With the verb n?bb?r?, it can also be used to express habitual or durative action, as in (9).

(9) l?-u      m?staw?t-u-n     y∧-s?br          n?bb?r
    boy-DEF   mirror-DEF-ACC   SUBJ-breakPRES   AUX
    ‘The boy was breaking the mirror’.

In some cases, it may also deviate from the meaning licensed by the root form, e.g. msl (‘resemble’), y∧m?sl (‘as if’).

(10) leba    y∧-m?sl
     thief   SUBJ-resembleIMPERF
     ‘as if he were a thief’

A similar conclusion may be drawn for the jussive, gerund, imperative, and verbal forms. The template forms tend to behave like inflectional affixes, with a basic meaning which occurs most frequently and some exceptional deviations.

Complex derived forms include, among others, the passive, causative, reduplicative, and reciprocal. The derivational processes are either internal, in which CV patterns are changed, or external, where derivational affixes are attached to the simple derived forms discussed above. They may also involve a combination of internal and external changes. Some of the derivations in this class serve to express adverbial functions, as the language has few lexicalized adverbs. Others introduce a wide variety of modifications to the meaning expressed by the root.

The passive, for example, is formed by prefixing the morpheme t? to the passive templates. Note that the passive template forms show some deviations from the template forms we saw earlier. In addition to the usual passive meaning, the passive template forms are also used to turn transitive verbs (11) into intransitive verbs (12), and to express reflexive meaning (13).

(11) l?-u      m?staw?t-u-n     m?ll?s-?
     boy-DEF   mirror-DEF-ACC   returnPAST-SUB
     ‘The boy returned the mirror.’

(12) l?-u      k?-t?mari      b?t     t?-m?ll?s-?
     boy-DEF   from-student   house   PASS-returnPAST-SUB
     ‘The boy returned from school.’

(13) l?-u      t?-a???b-?
     boy-DEF   REF-washPAST-SUB
     ‘The boy washed himself.’

There are a number of verbs which have t? as part of the basic stem (14).

(14) l?-u      abat-u-n     t?k?tt?l-?
     boy-DEF   his father   followPAST-SUB
     ‘The boy followed his father.’

There are two causative derivational morphemes, a and as. As with the passive derivation, each of the causative derivational morphemes takes an overlapping set of template forms. The causative derivations affect the meaning of the basic verb by adding one or more elements to the argument structure of the verb.

(15) l?-u      l?bb?s?
     boy-DEF   dressPAST-SUB
     ‘The boy got dressed.’

     l?-u      a-l?bb?s?
     boy-DEF   caus-dressPAST-SUB
     ‘The boy made (someone) dress.’

The reduplicative derivation mainly expresses adverbial functions such as intensity, reduplication, and repetition of action. It involves mainly internal change: it introduces a new set of template forms in which the second radical is duplicated in tri-radical verbs. Table 5 summarizes the different verb template forms for sbr.

Verb forms      Perfect    Imperfect   Gerund   Imperative   Verbal
Active          CVCXVC     CVCC        CVCC     CCVC         CCVC
Passive         CVCXVC     CVCXVC      CVCC     CVCVC        CVCVC
a-causative     CVCXVC     CVCC        CCC      CCC          CCVC
as-causative    CVCXVC     CVCXC       CVCXC    CVCXC        CVCXVC
Reduplicative   CVCVXXVC   CVCVXXC     CVCVXC   CVCVXC       CVCVXVC

Table 5. Template forms for derivational paradigm of sbr

These derivational forms are by no means exhaustive and may take different forms for other verb types. But they suffice to show several instances of syncretism between the forms: the same template form (e.g. CVCXVC) serves different functions. In addition, the different derivational paradigms allow different degrees of formalization. The simple derived forms may generally be associated with the expression of aspect and tense, which is a relatively closed domain and amenable to formalization in terms of semantic features. The meanings expressed by the passive and causative forms show some degree of variation. However, because the basic meaning occurs regularly, we can still benefit from positing a canonical form corresponding to the most typical meaning and handling exceptions with special rules. Reduplication, on the other hand, introduces quite a range of adverbial functions into the basic meaning of the root, resisting any form of abstraction.

In general, the possibility of decomposing verbs into constituent morphemes which obey a compositional treatment of the meaning of the word suggests that a lexeme consisting of root consonants is a good candidate for defining the lexical entries on which lexical transfer operates.
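The syncretism visible in Table 5 can be made explicit by indexing the table by template form. The Python sketch below encodes Table 5 as a nested dictionary and inverts it. The data structure and function name are illustrative assumptions, and the first reduplicative cell is read as CVCVXXVC.

```python
from collections import defaultdict

# Table 5 as data: derivation -> verb form -> template.
TEMPLATES = {
    "Active": {"Perfect": "CVCXVC", "Imperfect": "CVCC", "Gerund": "CVCC",
               "Imperative": "CCVC", "Verbal": "CCVC"},
    "Passive": {"Perfect": "CVCXVC", "Imperfect": "CVCXVC", "Gerund": "CVCC",
                "Imperative": "CVCVC", "Verbal": "CVCVC"},
    "a-causative": {"Perfect": "CVCXVC", "Imperfect": "CVCC", "Gerund": "CCC",
                    "Imperative": "CCC", "Verbal": "CCVC"},
    "as-causative": {"Perfect": "CVCXVC", "Imperfect": "CVCXC",
                     "Gerund": "CVCXC", "Imperative": "CVCXC",
                     "Verbal": "CVCXVC"},
    "Reduplicative": {"Perfect": "CVCVXXVC", "Imperfect": "CVCVXXC",
                      "Gerund": "CVCVXC", "Imperative": "CVCVXC",
                      "Verbal": "CVCVXVC"},
}


def functions_by_template(templates):
    """Invert the table: template -> list of (derivation, verb form) pairs."""
    index = defaultdict(list)
    for derivation, forms in templates.items():
        for form, template in forms.items():
            index[template].append((derivation, form))
    return dict(index)


index = functions_by_template(TEMPLATES)
for derivation, form in index["CVCXVC"]:
    print("CVCXVC:", derivation, form)
```

A template that maps to more than one (derivation, form) pair cannot be disambiguated from its shape alone, which is why prefixes such as t?, a and as have to be taken into account during analysis.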

4 Implementation
Two-level morphology has been the main computational framework in the field of computational morphology. Its dependence on the concatenation operation created, in its early stages, some difficulties in handling non-concatenative phenomena such as reduplication and Semitic stem interdigitation. A number of attempts have been made to overcome these problems (Kay, 1987; Beesley, 1996). The Xerox finite-state implementation carries a number of innovative ideas which circumvent these problems without deviating much from the basic underlying principles (Beesley, 1996). The basic idea originates from the autosegmental approach to Semitic languages as formulated by McCarthy (1985). This is especially true of the treatment of consonantal roots, vowel melodies and template forms as separate but interrelated entities. However, unlike the original work, which represents each of these components on a different tier (multiple dimensions), it makes use of a linear representation.

As the name implies, two-level morphology involves the specification of two levels (base form and surface form) that are related through rules. For Amharic, as for any Semitic language, the stipulation of base forms involves the specification of the consonantal roots, template forms, vocalic patterns and the associated morphosyntactic features. Figure 1 shows the representation of the Amharic verb s?bb?r- in the Xerox finite-state regular expression language.

    ... ^[ sbr .m>. CVCXVC .<m. ?* ^] ...
                Compile_replace
    ... s?bX?r ...

Figure 1 Regular expression for stem interdigitation

The compile-replace algorithm of the Xerox finite-state tool conflates the three morphemes into one form, giving the stem of the verb. While derivation involving only stem interdigitation can be handled using this regular expression, internal change still poses a problem. One verb derivational process involving internal change is reduplication, as in s?babb?r?w. The regular expression for the generation of a reduplicated stem is given in Figure 2.

    ... [abc]^2 ...
        Compile_replace
    ... abcabc ...

Figure 2 Regular expression for reduplication

In s?babb?r?, the second radical is duplicated and the copy is geminated. This in turn requires the application of two regular expressions, stem interdigitation and reduplication. One may be tempted to formulate a nested regular expression similar to the following.

(16) -^[sbr .m>. CV[CV]^2C .<m. [e a e]^] .o. Compile-Replace
     >> -^[sbr .m>. CVCVCVC .<m. [e a e]^] .o. Compile-Replace
     >> -s?bab?r-

Unfortunately, such a nested formulation of regular expressions is not possible under the current implementation. Moreover, the replica is not an exact match of the original, which makes it difficult to apply the mechanism proposed for reduplication. Therefore, a separate template form for reduplication has been defined.

s?bb?r?      sbr+CVCXVC+?+Active+Perf+Past+Sing+3rd+Masc
y∧s?br       3rd+Masc+Sing+Active+sbr+CVCC+?+Verb+Imperf+3P+Masc+Sing
ass?bb?r?    Causative+sbr+CVCXVC+?+Verb+Perf+3rd+Masc+Sing
tes?bb?r?    Passive+sbr+CVCXVC+?+Verb+Perf+3rd+Masc+Sing
s?babb?r?    Active+sbr+CVCVXXVC+eae+Verb+Perf+ITERT+3P+Masc+Sing

Table 6. Template forms for reduplicative derivations

The output of the above analysis is then a sequence of morphemes and morphosyntactic features, as shown in Table 6. However, since higher levels of processing, such as parsing or generation, require more detailed lexical information, the output should be restructured and augmented with more detailed syntactico-semantic information, e.g. argument structure.
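A first step in restructuring such output could be to split an analysis string into its root, template, vowel pattern and feature tags. The Python sketch below assumes the format seen in Table 6, where the template is the only token made up solely of C, V and X and is flanked by the root and the vowel pattern. The function name and the dictionary keys are hypothetical.

```python
TEMPLATE_SYMBOLS = set("CVX")


def parse_analysis(analysis):
    """Split a '+'-separated analysis string into its components."""
    tokens = [token.strip() for token in analysis.split("+")]
    # The template is the first non-initial token built only from C, V and X.
    idx = next(i for i, token in enumerate(tokens)
               if i > 0 and token and set(token) <= TEMPLATE_SYMBOLS)
    return {
        "root": tokens[idx - 1],
        "template": tokens[idx],
        "vowels": tokens[idx + 1],
        # Everything before the root and after the vowel pattern is a tag.
        "features": tokens[:idx - 1] + tokens[idx + 2:],
    }


print(parse_analysis("sbr+CVCXVC+?+Active+Perf+Past+Sing+3rd+Masc"))
print(parse_analysis("Active+sbr+CVCVXXVC+eae+Verb+Perf+ITERT+3P+Masc + Sing"))
```

The resulting dictionary is only a flat intermediate structure; argument structure and other syntactico-semantic information would still have to be added from the lexicon.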

4.1 CAT2 Lexical entries
An attempt has also been made to model Amharic verb morphology using the string unification facility of the CAT2 morphological component. (17) shows a partial specification of a derivational rule for the perfect-active form of a tri-radical verb stem.

(17) Cons={h, l, m, s, r, ….}
     {stem=C1+V+C2+C2+V+C3, voice=act, aspect=perf, tense=past, …}.[{lex=C1+C2+C3, C1=Cons, C2=Cons, C3=Cons, V=?, ….}].

Together with the lexical entry shown in Figure 3, it is possible to model a simple derivation of s?bb?r? from its root form sbr.
    lex=sbr, cat=v, type=a,
    frame = { arg1 = { role=agent,   head={cat=n} },
              arg2 = { role=patient, head={cat=n} } }

Figure 3 Example lexical entry in CAT2

This approach has some important advantages. First, it provides a sophisticated means of specifying lexical entries using features and the unification operation. Second, it is possible to order the rules, which may be useful in determining the order in which the different affixal elements combine with the root form. It also allows the argument structure of a lexeme to be modified in some derivational processes. However, run-time unification of characters and patterns has been found to be very slow. Furthermore, some alternation rules are very difficult to model using this mechanism. Hence, this approach has been abandoned, and a strategy is now being devised for integrating the two components. One possible strategy is to use the Xerox tool to analyse word forms and extract all the morphosyntactic features derivable from the surface form, and to use CAT2 to complete the morphological descriptions by augmenting the input with more detailed lexical information, e.g. subcategorization and selectional restrictions.
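The relation that rule (17) states between a perfect-active stem and its root can be approximated with a regular expression with a back-reference. The Python sketch below is an analogy to the string unification rule, not CAT2 code. The consonant set is a hypothetical stand-in for the elided set Cons, and the function names are assumptions.

```python
import re

# Hypothetical stand-in for Cons; the full set is not listed in the rule.
CONS = "hlmsrbtn"

# stem = C1 + V + C2 + C2 + V + C3 with V = ?  (the perfect-active pattern)
PERFECT_ACTIVE = re.compile(rf"^([{CONS}])\?([{CONS}])\2\?([{CONS}])$")


def analyse_perfect_active(stem):
    """Recover the root lexeme and features from a perfect-active stem."""
    match = PERFECT_ACTIVE.match(stem)
    if match is None:
        return None
    c1, c2, c3 = match.groups()
    return {"lex": c1 + c2 + c3, "voice": "act",
            "aspect": "perf", "tense": "past"}


def generate_perfect_active(lex, vowel="?"):
    """Build the perfect-active stem from a tri-radical root."""
    c1, c2, c3 = lex
    return f"{c1}{vowel}{c2}{c2}{vowel}{c3}"


print(analyse_perfect_active("s?bb?r"))
print(generate_perfect_active("sbr"))
```

Unlike unification, the regular expression is compiled once and then matched directly, which hints at why moving surface analysis to a finite-state tool addresses the run-time cost noted above.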

5 Conclusion
This paper addressed three related issues: the classification of Amharic verbs, aspects of lexical entries for lexical transfer, and the implementation of a morphological analyser. There are a number of proposed frameworks for classifying Amharic verbs. Some provide a detailed classification scheme, making lexical specification relatively redundant, while others postulate abstract generalizations requiring a relatively complex implementation strategy. The strategy adopted in this work is to restrict the process of abstraction to frequently occurring phenomena and classes of verbs, while introducing some diversity, based on practical considerations, to account for idiosyncratic properties.

The form of lexical entries is another question raised in this paper. The complexity of Amharic morphology makes it difficult to decide on the form of the lexeme to which lexical transfer rules apply. Although it is difficult to come up with a proposal which is applicable to all situations, examination of the different derivational processes shows that a lexeme consisting of root consonants seems a good candidate for lexical transfer.

Finally, the implementation of Amharic morphological analysis using Xerox finite-state tools shows that most of the morphological phenomena can be handled using finite-state operations. However, some derivational processes involve the simultaneous application of both stem interdigitation and reduplication, which cannot be accommodated by the current system. This requires the stipulation of additional template forms for the derived verbs.

References
Abiyot Bayou (2000). Developing automatic word parser for Amharic verbs and their derivation. Master's thesis, Addis Ababa University.
Baye Yimam (1999). Roots reductions and extensions in Amharic. Ethiopian Journal of Language Studies, no. 9, pp. 56-88.
Beesley, Kenneth R. (1996). Arabic finite-state morphological analysis and generation. In COLING’96, vol. 1, pp. 89-94.
Bender, M.L. and Hailu Fulas (1978). Amharic Verb Morphology. Michigan State University.
Dawkins, C.H. (1969). The Fundamentals of Amharic. Addis Ababa: Sudan Interior Mission.
Kay, Martin (1987). Nonconcatenative finite-state morphology. In EACL’87, pp. 2-10.
Leslau, Wolf (1995). Reference Grammar of Amharic. Wiesbaden: Otto Harrassowitz.
McCarthy, John J. (1985). Formal Problems in Semitic Phonology and Morphology. New York.
Mesfin Getachew (2001). Automatic part of speech tagging for Amharic language: An experiment using stochastic HMM. Master's thesis, Addis Ababa University.
Nega Alemayehu (1999). Development of stemming algorithm for Amharic text retrieval. PhD thesis, University of Sheffield.
Streiter, Oliver (1996). Linguistic modeling for Multilingual Machine Translation. PhD thesis, University of Saarland, Shaker Verlag.

