Brief survey on machine translation use, input appreciated Thread poster: Jared Tabor
You should ask for the language pairs | Sep 21, 2011 |
Hi Jared, In the survey, if a person answers Yes to using MT, I think you should ask for the language pairs, as this is an important piece of info. The usefulness of currently available MT systems greatly depends on the language pair they are being used for. Without this info, I think the survey results may be hard to interpret correctly. Another piece of info you may want to ask for is whether the respondent is a freelancer or an agency. Katalin
[Edited at 2011-09-21 17:14 GMT]
Jared Tabor Local time: 11:09 SITE STAFF Thread poster Thanks Katalin | Sep 21, 2011 |
Hi Katalin, Thanks for the feedback. I agree that the language pair is an important factor in how useful MT will be for some tasks. Jared
Jeff Allen France Local time: 16:09 Multiple languages + ... language directions for MT | Sep 21, 2011 |
It is actually language direction, and not just language pair, that affects the level of MT quality. A few examples of factors that enter into the quality equation:

1) Market need
MT systems and software aimed at translation for publication (outward-bound translation) -- basically what most professional translators produce translations for -- will usually be better in language directions FROM English as the source language into other target languages. This is because a significant amount of content is written in English. English to FIGS (French, Italian, German, Spanish) + RU directions tend to be the strongest.
MT systems aimed at content gisting (known as inward-bound translation, and also known as requests for draft translations for understanding-only needs) for languages in general (including less widely used languages) will be better from other languages INTO English. These tend to be used for intelligence community content filtering and gisting. These language directions usually include FIGS, Asian languages, Russian, & Middle Eastern languages (+ any language which is on the radar for the intelligence community) INTO English.

2) MT system type
- Rule- and dictionary-based MT systems (RBMT) have traditionally been developed per language "direction" based on the needs mentioned above. These depend on the availability of grammar rules to enter into the system for both source and target language and the creation of a built-in bilingual glossary/dictionary.
It is usually the less commonly used languages that are developed the least for such MT systems, because it requires a bit of upfront development (usually about 2 years as a starter) to get these systems to maturity, so it requires a clearly defined business need and an accompanying budget.
- Statistical MT systems usually try to start out as language independent on the technical side (because they look at combinations of characters and other forms), yet they really depend mainly on the availability of significant "quantities" of source and target language "content". Such content is usually abundant for FIGS, Russian and Asian languages, and much less available for any language that does not fit in the vague group of "major world languages".
Hybrid MT systems are now trying to address these concerns.

3) Maturity of the language direction + maturity of the language resource content
- Rule-based MT language directions that have been around for 30+ years (such as EN-FR) will in general provide better quality output than brand new language direction projects (such as EN-Swedish) until the new projects stabilize over time.
- Statistical MT systems are dependent on both the quality of the texts that are used to train the system and the quantity of texts that are available.

4) Language typologies and language complexity
When a language direction is developed on languages with the same general language typology (language structure similarity), there is a general tendency to produce better MT quality. This can be seen in that EN-FR and EN-ES are better than EN-DE. Language directions involving significantly different typologies, such as EN and JP, are a next level of challenge. This is why "Knowledge-based MT systems (KBMT)", based on underlying semantic analysis, have produced better quality for such language directions than rule-based or even statistical systems. The downside is that KBMT systems are very time-consuming (and costly) to create.
5) MT language pivoting
Any language direction that falls "outside" of being paired with English (or possibly French in the case of Systran) -- that is, anything other than language X-English -- risks using English as a pivot language to provide the language direction (for example: Swedish > English > Norwegian, Korean > English > Japanese). This is not very different from Language Service Providers/translation agencies who run into the same problem on major projects with multiple language pairs when they cannot find professional translators who can do some or all language directions directly, so they sometimes pass through English as a pivot language. We all know that this requires more time, much more quality care, and more money. Using a pivot language for MT suffers from the same problem and is even more risky.

6) Comparative timeline
A number of years back we conducted a comparative timeline study of developing both MT and speech systems for several language pairs and presented it in: Rapid-Deployment-Speech-tech-ICSLP1998-Lenzo-Allen-Hogan.pdf https://www.box.net/shared/n2bxmmeubr
Many of the issues we described in that presentation/article are still valid today in the cases of points 1-4 above.

Thus language pairs (and I would even say language directions) are quite important when evaluating MT quality.
Jeff
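The difference between a language pair and a language direction, and the pivoting fallback described in the post above, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `SUPPORTED` inventory and the `plan_route` helper are hypothetical and do not describe any real MT product. Directions are stored as ordered (source, target) tuples, so supporting Swedish > English says nothing about English > Swedish.

```python
from typing import List, Optional, Tuple

Direction = Tuple[str, str]  # ordered (source, target) language codes

# Hypothetical inventory of directions an MT system offers (an assumption
# made for illustration only). Each entry is ordered, so ("sv", "en")
# does NOT imply ("en", "sv").
SUPPORTED = {
    ("sv", "en"), ("en", "no"),
    ("ko", "en"), ("en", "ja"),
    ("en", "fr"), ("fr", "en"),
}


def plan_route(source: str, target: str, pivot: str = "en") -> Optional[List[Direction]]:
    """Return the directions needed to translate from source to target.

    A direct direction is preferred. If none exists, fall back to
    source > pivot > target. Return None if neither route is available.
    """
    if (source, target) in SUPPORTED:
        return [(source, target)]
    if (source, pivot) in SUPPORTED and (pivot, target) in SUPPORTED:
        return [(source, pivot), (pivot, target)]
    return None


if __name__ == "__main__":
    for src, tgt in [("en", "fr"), ("sv", "no"), ("no", "sv")]:
        print(src, ">", tgt, ":", plan_route(src, tgt))
```

In this toy inventory, Swedish > Norwegian is only reachable through English, while Norwegian > Swedish is not reachable at all, because neither of the directions it would need is listed. Every extra hop in a route is a second MT pass whose errors stack on top of the first, which is the risk with pivoting that the post points out.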
Jared Tabor Local time: 11:09 SITE STAFF Thread poster Thanks, Jeff | Sep 21, 2011 |
Excellent stuff, thanks Jeff! Jared
Neil Coffey United Kingdom Local time: 15:09 French => English + ...
As well as the factors Jeff mentions, another factor is simply what parallel corpora are available for training in the language pair in question. (This applies to statistical systems, but in reality current systems, notably Google Translate, tend to be of the statistical type.)