Bilingual terminology mining - Using brain, not brawn comparable corpora

E. Morin, B. Daille, K. Takeuchi, K. Kageura

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

64 Citations (Scopus)

Abstract

Current research in text mining favours the quantity of texts over their quality. But for bilingual terminology mining, and for many language pairs, large comparable corpora are not available. More importantly, as terms are defined vis-à-vis a specific domain with a restricted register, it is expected that the quality rather than the quantity of the corpus matters more in terminology mining. Our hypothesis, therefore, is that the quality of the corpus is more important than the quantity and ensures the quality of the acquired terminological resources. We show how important the type of discourse is as a characteristic of the comparable corpus.

Original language: English
Title of host publication: ACL 2007 - Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics
Pages: 664-671
Number of pages: 8
Publication status: Published - Dec 1 2007
Event: 45th Annual Meeting of the Association for Computational Linguistics, ACL 2007 - Prague, Czech Republic
Duration: Jun 23 2007 to Jun 30 2007

Publication series

Name: ACL 2007 - Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics

Other

Other: 45th Annual Meeting of the Association for Computational Linguistics, ACL 2007
Country: Czech Republic
City: Prague
Period: 6/23/07 to 6/30/07

ASJC Scopus subject areas

  • Language and Linguistics
  • Linguistics and Language

Cite this

Morin, E., Daille, B., Takeuchi, K., & Kageura, K. (2007). Bilingual terminology mining - Using brain, not brawn comparable corpora. In ACL 2007 - Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics (pp. 664-671).