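|Abstract: ||In a global village long past the point of information explosion, the ability to find relevant information in many languages on the Internet is becoming an important national strength and a key personal skill; cross-language information retrieval is therefore an important research direction. At the same time, the ability to use one or more foreign languages is also becoming an important basic skill. How to use information technology to make people more efficient at retrieving multilingual information and at learning foreign languages draws on expertise in information technology, linguistics, cognitive science, and education. This project builds on the applicant's research results of the past several years. We plan to integrate and develop artificial intelligence and natural language processing techniques, with machine translation at the core, to build software that provides the following functions: (1) limited, assistive machine translation for specific domains (information technology and social sciences); (2) cross-language information retrieval among English, Chinese, and European languages for specific domains; (3) construction of computer-assisted Chinese learning software; (4) study of the cognitive processes of human reading, in collaboration with experts in psychology and linguistics, to further support language learning; (5) analysis of the readability of short English texts. Artificial intelligence, machine learning, and natural language processing are important foundations for achieving these goals. We have previously used natural language processing to assist in composing English cloze test items and in editing Chinese test items of various types. We have since applied more sophisticated linguistic analysis to assess the reading difficulty of short English texts. We have also accumulated groundwork and experience in computer-assisted translation of English test items into Chinese: collecting parallel corpora, sentence alignment, word alignment, and building transfer mechanisms. In addition, the ongoing project has made some progress on cognitive models of reading, and preliminary results have been published for all of the work above. Under the ongoing project, we have submitted more than three years of results on Chinese character errors to ACM TALIP and have built a game for learning Chinese characters. By the close of the new project, we hope to achieve the following goals: (1) provide a basic prototype for cross-language (English, Chinese, and European languages) information retrieval, particularly for computer science papers and some social-science digital library materials (in cooperation with the National Chengchi University library); (2) complete several computer-assisted games and a test-item authoring environment for learning Chinese; (3) better understand the cognitive processes of Chinese users reading Chinese text; (4) gain a firmer grasp of the factors related to the readability of short texts.|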
The explosive growth of communication technologies and Internet-based applications has made competence in two or more languages increasingly important. Tracking relevant information across a wide range of languages and becoming fluent in foreign languages now matter more and more for competing in the job market. By applying machine-translation technologies, computer science researchers can offer cross-language information retrieval and, with additional expertise from linguistics, cognitive science, and education, can use computing technologies to help people learn foreign languages more efficiently. Building on our previous work in artificial intelligence, machine learning, text processing, and technology-enhanced language learning, we would like to integrate and develop artificial intelligence and natural language processing techniques to construct software that offers the following: (1) limited machine translation for specific domains (computer science and some social sciences), (2) cross-language information retrieval for specific domains (computer science and some social sciences) among English, Chinese, and, if possible, one European language, and (3) computer-assisted language learning tools and games for Chinese. In addition, we would like to work with psycholinguists to study the cognitive processes by which Chinese users understand text, using information gathered through eye tracking, EEG, and fMRI. We would also further analyze the factors that influence the readability of short essays in English. Techniques of artificial intelligence and natural language processing have proven instrumental in achieving the goals listed above.
Many years ago, we applied information about word collocation and selectional preference to construct test items for English cloze tests, and we have built prototypes for computer-assisted language learning for Chinese. We have also applied more complex linguistic features to analyze the readability of English text in a published paper, and have built a prototype for computer-assisted translation of test items. To understand the human cognitive processes involved in reading text, we collected and analyzed eye-movement data from about 50 human subjects and reported the results at TAAI 2010. By the end of this two-year project, we hope to accomplish the following goals. First, we would like to build tools for cross-language information retrieval (CLIR) for technical documents in computer science. If appropriately funded, we would also like to work with the library of the National Chengchi University to include a European language in the CLIR service. Second, we will build more tools and games for learning Chinese. Third, we will accumulate more experience in building computational models of the cognitive processes by which humans process Chinese text. Fourth, we will come up with a more complete list of factors that influence the readability of reasonably long essays.