    Title: Reading Order Detection in Optical Character Recognition for Historical Chinese Documents
    Authors: Ma, Hsing-Yuan (馬行遠)
    Contributors: Liu, Chao-Lin (劉昭麟)
    Huang, Hen-Hsen
    Ma, Hsing-Yuan
    Keywords: Reading Order
    Reading Order Detection
    Pairwise Learning-to-Rank
    Multimodal Representation
    Archival Document Processing
    Date: 2023
    Abstract: Optical character recognition (OCR) and document layout analysis (DLA) have been studied and developed for many years, yet reading order detection (ROD) remains an open problem.
    ROD plays an important role in preserving a document's original structure and in post-OCR correction.
    Most current ROD tools rely on rule-based algorithms to arrange the coordinates of detected text in order.
    These approaches can work well for simple modern documents, whose text is well aligned and evenly spaced.
    However, they fall short on handwritten and historical documents, which have complex layouts and curved text edges, so a method that can accurately recover the reading order of historical Chinese documents with complex layouts is needed.
    In this thesis, we propose a multimodal approach to ROD that formulates the task as pairwise learning-to-rank.
    We evaluate our approach on the MTHv2 dataset.
    Experimental results show that, compared with previous methods, our model reduces the page error rate by 25%.
    It also performs well when training data is limited and when text detection information is incomplete, demonstrating the robustness and practical value of this work.
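The abstract states that ROD is formulated as pairwise learning-to-rank but does not describe the model or how pairwise decisions become a full order. The sketch below illustrates that formulation under explicit assumptions. All names (`TextBox`, `pair_features`, `WEIGHTS`, `prob_before`, `make_training_pairs`, `order_by_pairwise_scores`) are hypothetical. A two-feature logistic scorer with hand-set weights stands in for the thesis's trained multimodal model. Boxes are ranked by summing pairwise "reads before" probabilities, which is one simple aggregation and may differ from the thesis's actual decoding step.

```python
import math
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class TextBox:
    """A detected text line (hypothetical): x is its horizontal centre, y its top edge, in pixels."""
    name: str
    x: float
    y: float


# Hypothetical hand-set weights standing in for a trained multimodal pairwise model.
# The large weight on the x-gap encodes "the column further right is read first"
# (vertical, right-to-left layout); the small weight on the y-gap encodes
# "within a column, the higher line is read first".
WEIGHTS = (0.5, 0.05)


def pair_features(a: TextBox, b: TextBox) -> tuple[float, float]:
    """Hypothetical geometric features for the ordered pair (a, b)."""
    return (a.x - b.x, b.y - a.y)


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def prob_before(a: TextBox, b: TextBox) -> float:
    """Estimated probability that box a is read before box b."""
    z = sum(w * f for w, f in zip(WEIGHTS, pair_features(a, b)))
    return sigmoid(z)


def make_training_pairs(gold_order: list[TextBox]) -> list[tuple[TextBox, TextBox, int]]:
    """Turn one page's gold reading order into labelled ordered pairs (1 = first box precedes second)."""
    return [
        (gold_order[i], gold_order[j], 1 if i < j else 0)
        for i, j in permutations(range(len(gold_order)), 2)
    ]


def order_by_pairwise_scores(boxes: list[TextBox]) -> list[TextBox]:
    """Rank boxes by their summed probability of preceding every other box on the page."""
    scores = {
        box: sum(prob_before(box, other) for other in boxes if other is not box)
        for box in boxes
    }
    return sorted(boxes, key=lambda box: scores[box], reverse=True)


if __name__ == "__main__":
    # Toy page: three columns read right to left; the middle column holds two lines.
    page = [TextBox("A", 300, 20), TextBox("B", 200, 20),
            TextBox("C", 200, 400), TextBox("D", 100, 20)]
    shuffled = [page[2], page[3], page[0], page[1]]
    print([box.name for box in order_by_pairwise_scores(shuffled)])
    print(len(make_training_pairs(page)))
```

On the toy page, the shuffled boxes come back as A, B, C, D, and one page of four boxes yields 12 labelled ordered pairs. Summing soft pairwise preferences tolerates an occasional inconsistent triple (A before B, B before C, C before A) instead of requiring the pairwise model to be perfectly transitive; in a real system, `pair_features` and `WEIGHTS` would be replaced by learned multimodal representations.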
