Mohammed A. Attia
Ph.D. thesis title: Handling Arabic Morphological and Syntactic Ambiguity within the LFG Framework with a View to Machine Translation
Description:
This research investigates different methodologies for managing morphological and syntactic ambiguity in Arabic. Morphological ambiguity in Arabic is a notorious problem due to the richness and complexity of Arabic morphology. We show how an ambiguity-controlled morphological analyzer can be built as a rule-based system that uses finite state technology and takes the stem as the base form. We point out sources of legal and illegal morphological ambiguities in Arabic and show how ambiguity in our system is reduced without compromising precision.
Syntactic ambiguity is also a major problem for large-scale computational grammars that cover a realistic and representative portion of a natural language. We identify sources of syntactic ambiguity in Arabic, focusing on the four ambiguity-generating areas that have the greatest impact: the pro-drop nature of the language, word order flexibility, lack of diacritics, and the metamorphosis of Arabic nouns. We deal with ambiguity not as one big problem, but as a number of divisible problems spread over all levels of the analysis: the pre-parsing, parsing and post-parsing stages. The pre-parsing stage contains all the processes that feed into the parser, whether by splitting running text into manageable components (tokenization), analyzing words (morphological analysis) or tagging the text. These processes sit at the bottom of the parsing system, and the effect of ambiguity at this stage is tremendous because it propagates exponentially into the higher levels. The parsing stage is where the syntactic rules and constraints are applied to a text and the subcategorization frames are specified. The post-parsing stage has no effect on the number of solutions already produced by the parser; it only controls the selection and ranking of these solutions.
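The three stages described above can be illustrated with a minimal Python sketch. It is a toy under stated assumptions, not the thesis implementation: the lexicon, the transliterated tokens, the function names, and the scoring function are all hypothetical, and the "parser" simply enumerates every combination of analyses instead of applying grammar rules. It only shows why ambiguity introduced before parsing multiplies across tokens, and why a ranking step leaves the number of solutions unchanged.

```python
from itertools import product

# Hypothetical toy lexicon: each undiacritized token maps to its candidate
# morphological analyses. The entries are illustrative only.
TOY_ANALYSES = {
    "ktb": ["kataba/VERB", "kutub/NOUN", "kutiba/VERB.PASS"],
    "Alwld": ["al-walad/NOUN"],
    "Elm": ["Ealima/VERB", "Eilm/NOUN", "Ealam/NOUN"],
}


def pre_parse(text):
    """Pre-parsing: tokenize on whitespace and look up analyses per token."""
    return [TOY_ANALYSES.get(tok, [tok + "/UNKNOWN"]) for tok in text.split()]


def parse(lattice):
    """Stand-in for the parser: enumerate every combination of analyses.

    A real grammar would reject many combinations; this version keeps them
    all, so the count is the product of the per-token ambiguities.
    """
    return [list(combo) for combo in product(*lattice)]


def post_parse(solutions, score):
    """Post-parsing: rank the solutions without adding or removing any."""
    return sorted(solutions, key=score, reverse=True)


lattice = pre_parse("ktb Alwld Elm")
solutions = parse(lattice)

# Arbitrary illustrative preference: favor solutions with more nouns.
ranked = post_parse(solutions, score=lambda s: sum(a.endswith("/NOUN") for a in s))

print(len(solutions), len(ranked))
```

With three, one, and three analyses for the three tokens, the enumeration yields 3 × 1 × 3 = 9 candidate solutions; removing a single analysis in the pre-parsing stage would remove every solution built on it, which is why controlling ambiguity early pays off most.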
We build an Arabic parser using XLE (Xerox Linguistics Environment), which allows writing grammar rules and notations that follow the LFG formalism. XLE includes parser, transfer and generator components, which makes it suitable for Machine Translation. We also formulate a description of the main syntactic structures in Arabic within the LFG framework.
During the course of my study I developed a number of useful finite state tools for processing Arabic texts. Read more and download
Publications
Papers:
- Mohammed Attia. (2007) 'Arabic Tokenization System'. ACL-Workshop on Computational Approaches to Semitic Languages, Prague.
[pdf version]
- Mohammed Attia. (2006) 'An Ambiguity-Controlled Morphological Analyzer for Modern Standard Arabic Modelling Finite State Networks'. The Challenge of Arabic for NLP/MT Conference, October 2006. The British Computer Society, London.
[pdf version]
- Mohammed Attia. (2006) 'Accommodating Multiword Expressions in an Arabic LFG Grammar'. In T. Salakoski et al. (Eds.): FinTAL 2006, Lecture Notes in Computer Science, Vol. 4139, pp. 87-98. Springer-Verlag, Berlin Heidelberg.
[pdf version]
- Mohammed Attia. (2005) 'Developing a Robust Arabic Morphological Transducer Using Finite State Technology'. 8th Annual CLUK Research Colloquium, Manchester.
[pdf version]
Presentations:
- Mohammed Attia. (2005) 'Functional and Anaphoric Control in Arabic'. A presentation at ParGram Fall Meeting, Gotemba, Japan.
[Slides available]
- Mohammed Attia. (2005) 'Accommodating Multiword Expressions in an LFG Grammar'. A presentation at ParGram Fall Meeting, Gotemba, Japan.
[Slides available]
- Mohammed Attia. (2005) 'Developing a Robust Arabic Morphological Transducer/Tokenizer, and Integration with XLE'. Presented on my behalf at the ParGram Spring Meeting, PARC, Palo Alto, USA.
[Slides available]
- Mohammed Attia. (2004) 'Report on the Introduction of Arabic to ParGram'. Presented at ParGram Fall Meeting, Dublin, Ireland.
[pdf version]
E-Books:
- Mohammed Attia. (2003) 'Implications of the Agreement Features in Machine Translation'.
M.A. Thesis.
- Mohammed Attia. (2004) 'Common English Proverbs'.
E-Books.
- Mohammed Attia. (2007) 'Common English Expressions'.
E-Books.