Parse-realize based paraphrasing and SMT corpus enriching
Affiliation:

(School of Computer Science and Technology, Harbin Institute of Technology, 150001 Harbin, China)

    Abstract:

    To address the low-coverage problem of statistical machine translation (SMT) training corpora, a paraphrasing method based on dependency parsing and sentence realization is proposed. The input sentence is first parsed into a dependency tree, and the tree is then realized into multiple natural language sentences. The generated sentences contain the same lexical words as the input, but their word order is rearranged. Experiments show that the paraphrasing method can be used to enlarge the bilingual corpus for SMT and that it effectively relieves the low-coverage problem of training corpora without any extra resources, which in turn improves translation quality.
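
The parse-then-realize idea can be illustrated with a minimal sketch. It is not the authors' implementation: the toy tree `TREE`, the example sentence, and the functions `realize` and `paraphrases` are all hypothetical, and the tree is written by hand where a dependency parser would normally produce it. The sketch enumerates every projective word order of the tree, so each output uses the same words as the input and differs only in order.

```python
from itertools import permutations, product

# Hypothetical toy dependency tree: head word -> list of dependent words.
# In a real pipeline this tree would be produced by a dependency parser.
# Assumes every word in the sentence is unique (words serve as node ids).
TREE = {
    "gave": ["she", "book", "yesterday"],
    "book": ["the"],
}

def realize(head, tree):
    """Yield every projective word order of the subtree rooted at `head`."""
    children = tree.get(head, [])
    # Each child subtree can itself be realized in several ways.
    child_options = [list(realize(child, tree)) for child in children]
    for choice in product(*child_options):
        # The head is one more unit to be ordered among its dependents;
        # each dependent subtree stays contiguous (projectivity).
        units = [(head,)] + list(choice)
        for order in permutations(units):
            yield tuple(word for unit in order for word in unit)

def paraphrases(root, tree):
    """Return the distinct candidate sentences, sorted for stable output."""
    return sorted({" ".join(words) for words in realize(root, tree)})

if __name__ == "__main__":
    for sentence in paraphrases("gave", TREE):
        print(sentence)
```

Exhaustive enumeration like this overgenerates, producing many ungrammatical orders alongside acceptable ones such as "yesterday she gave the book". A practical realizer would presumably score or filter the candidates before adding them to the bilingual corpus; how the proposed method does so is not stated in the abstract.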

History
  • Online: May 30, 2013