News and Events

TTC @ EACL 2012 - April 23 - 27, Avignon, France

The EACL conference is the triennial conference of the European chapter of the Association for Computational Linguistics (ACL). EACL 2012 was the thirteenth EACL conference and built on the success of the previous conferences in Athens (2009) and Trento (2006).

The TTC project was presented by Alexander Fraser and Marion Weller (IMS) with the presentation "Modeling Inflection and Word-Formation in SMT" (PDF).

Abstract:

The current state-of-the-art in statistical machine translation (SMT) suffers from issues of sparsity and inadequate modeling power when translating into morphologically rich languages. We model both inflection and word-formation for the task of translating into German. For inflection, we generalize over different inflected forms of a word, while also ensuring coherence of the inflected output. For word-formation, we address compounding, which is highly productive in German. For both inflection and word-formation, we address the problem of portmanteaus. We translate from English words to an underspecified German representation and then use linear-chain CRFs to predict the fully specified German representation. We show that improved modeling of inflection and word-formation leads to improvement in translation performance.

 

 

TTC @ Paris, Second Year Review Meeting, March 2012

TTC presents its outcomes from the first year of the project (2010). Significant progress has been made towards the main scientific and technological objectives of the project. TTC addressed important issues such as end-user needs for the TTC tools, the requirements these tools will meet, monolingual and interlingual comparability, the release of the first version of the crawler and of the comparable corpora, terminology identification, annotation, variation and morphological processing, bilingual terminology extraction, a tool to handle comparable corpora, an open terminology platform interlinked with EuroTermBank, and application-oriented evaluation. TTC paid much attention to its dissemination activities, in particular its web presence. The annual public report for 2010 was published and can be found here.

TTC presents its outcomes from the second year of the project (2011).

Significant progress has been made towards the main scientific and technological objectives of the project. Among the important issues TTC addressed were the releases of:

  • the final version of the Babouk crawler;
  • comparable corpora in the 2 domains of wind energy and mobile technologies for 7 project languages as well as Danish;
  • the TermSuite tool to handle comparable corpora, which performs monolingual terminology extraction and bilingual alignment;
  • rule sets for term variant recognition and alignment;
  • rule sets for inflectional and word formation analysis of morphologically complex term candidates;
  • domain-specific terminologies in the 2 domains of wind energy and mobile technologies for 7 project languages.

TTC paid much attention to its dissemination activities, in particular its web presence. The annual public report for 2011 was published and can be found here.

 

TTC @ DGFS-CL 2012 - March 8, Frankfurt, Germany

Marion Weller, Anita Gojun, Ulrich Heid, Béatrice Daille, Emmanuel Morin
Universität Stuttgart / Université de Nantes
Compiling terminological data using comparable corpora: from term extraction to dictionaries
For scientific domains, terminological resources like dictionaries are often not available or not up-to-date. Additionally, term variation (Daille 2005) is often not documented. As a result, translators working in technical domains usually spend much time building terminological resources.
The TTC project aims at compiling domain-specific lexical resources which are to be integrated into CAT tools and SMT systems. Since parallel data is often not available, comparable corpora are used: they are available for a large range of domains in many languages.
The TTC tool suite consists of the following steps:

  • corpus collection using a focused crawler (de Groc 2011);
  • pattern-based term extraction of terminologically relevant noun phrases from tagged and lemmatized text (Schmid 1994);
  • identification of term variants: (DE) Korrosionsschutz ↔ Schutz gegen Korrosion (corrosion protection ↔ protection against corrosion);
  • term alignment: for a given term of the source language, equivalents in the target language are searched and aligned. Term lists of both the source and target language, as well as a general language dictionary, are taken as input to this step.

In our poster presentation, we focus on term alignment, presenting two approaches: (1) lexical strategies and (2) the use of context vectors.
Terms do not necessarily have an equivalent of the same syntactic structure in other languages; this is particularly true of German compounds. By applying term variation patterns, compounds can be reformulated, resulting in term variants of different syntactic structures (Morin & Daille 2009). This makes it possible to look up the components of a compound individually in the dictionary and to identify matching target language terms: Stromspeicherung → Speicherung von Strom → storage of power / storage of electricity.
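The lexical strategy described above can be illustrated with a minimal Python sketch. It is not the TTC implementation: the dictionary entries, the target term list, the function name `translate_compound` and the single "head of modifier" target pattern are all assumptions made for illustration, and the compound is assumed to be already split into modifier and head.

```python
from itertools import product

# Toy bilingual dictionary (hypothetical entries, for illustration only).
DICTIONARY = {
    "Strom": ["power", "electricity"],
    "Speicherung": ["storage"],
}

# Hypothetical list of extracted target-language terms.
TARGET_TERMS = {"storage of power", "storage of electricity", "wind turbine"}


def translate_compound(modifier, head):
    """Translate a two-part compound via its 'head von modifier' variant.

    Each component is looked up separately in the dictionary; a combination
    is kept only if it occurs in the target-language term list.
    """
    matches = []
    head_translations = DICTIONARY.get(head, [])
    modifier_translations = DICTIONARY.get(modifier, [])
    for head_tr, mod_tr in product(head_translations, modifier_translations):
        candidate = f"{head_tr} of {mod_tr}"
        if candidate in TARGET_TERMS:
            matches.append(candidate)
    return matches


# Stromspeicherung -> Speicherung von Strom -> candidate English terms
print(translate_compound("Strom", "Speicherung"))
```

Filtering the generated candidates against the target term list is one simple way to discard combinations that are not attested in the target corpus; components missing from the dictionary simply yield no candidates.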
Terms and their translations tend to appear in comparable lexical contexts. For each source language term, a context vector is computed and translated into the target language. The translated vectors are then compared with target language context vectors (using the cosine measure): terms with similar context vectors are likely to be translations. Since both approaches depend on the coverage of the dictionary, we consider the lexical strategies as an input for the context vector method.
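The context vector approach can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the TTC tool: it uses raw co-occurrence counts within a fixed window (real systems typically apply an association measure), and the toy corpora, the dictionary and all function names are hypothetical.

```python
import math
from collections import Counter


def context_vector(term, sentences, window=2):
    """Count the words occurring within `window` tokens of `term`."""
    vec = Counter()
    for tokens in sentences:
        for i, tok in enumerate(tokens):
            if tok == term:
                left = tokens[max(0, i - window):i]
                right = tokens[i + 1:i + 1 + window]
                vec.update(left + right)
    return vec


def translate_vector(vec, dictionary):
    """Map each context word to its dictionary translations, keeping counts."""
    translated = Counter()
    for word, count in vec.items():
        for translation in dictionary.get(word, []):
            translated[translation] += count
    return translated


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Toy comparable corpora and dictionary (hypothetical data).
source_sentences = [["Windturbine", "erzeugt", "Strom"],
                    ["Windturbine", "braucht", "Wind"]]
target_sentences = [["turbine", "generates", "power"],
                    ["turbine", "needs", "wind"],
                    ["phone", "needs", "battery"],
                    ["phone", "sends", "data"]]
dictionary = {"erzeugt": ["generates"], "Strom": ["power"],
              "braucht": ["needs"], "Wind": ["wind"]}

# Translate the source term's context vector, then rank target candidates.
source_vec = translate_vector(
    context_vector("Windturbine", source_sentences), dictionary)
ranked = sorted(
    ((cand, cosine(source_vec, context_vector(cand, target_sentences)))
     for cand in ["turbine", "phone"]),
    key=lambda pair: pair[1], reverse=True)
print(ranked)
```

Context words that are missing from the dictionary are dropped during vector translation, which is why the quality of the ranking depends on dictionary coverage.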

The TTC project is presented by Marion Weller (IMS) at the poster session of the Computational Linguistics section at the annual meeting of the DGfS on March 8, 2012. The primary purpose of the section is to maintain the scientific exchange between theoretical linguistics and computational linguistics.


TTC Poster: Marion Weller, Anita Gojun, Ulrich Heid (IMS), Béatrice Daille, and Emmanuel Morin (UN). Compiling terminological data using comparable corpora: from term extraction to dictionaries (PDF).

 

 

CHAT 2012 - Call for Papers

The second workshop on the Creation, Harmonization and Application of Terminology resources (CHAT 2012) will be held in conjunction with the conference on Terminology and Knowledge Engineering (TKE 2012) on June 22 in Madrid.

The Organizing Committee invites academic and industrial researchers, as well as language workers and students interested in terminology, to participate in the workshop!

 

CHAT 2011 - the first workshop

 

TTC @ Open Day’s event for FP7 R&D in Latvia, November 21, 2011

The Open Day’s event for FP7 R&D in Latvia was held on November 21, 2011, at the Great Hall of the University of Latvia in Riga. The event was organised by the National Contact Point of the Seventh Framework Programme and the University of Latvia. TTC was presented with the midterm project poster by Tatiana Gornostay (TILDE).

 