Brief survey on machine translation use, input appreciated (Machine Translation (MT))

Topic starter: Jared Tabor

Jared Tabor
Local time: 11:09
SITE STAFF

Sep 20, 2011

Hello all,

I would like to invite you to provide your input through a brief survey on the use of machine translation:

http://www.proz.com/phpQ/fillsurvey.php?sid=563

The results of this and other surveys will be shared at the upcoming virtual events ser...

Katalin Horváth McClure

United States
Local time: 10:09
ProZ.com member since 2002
English => Hungarian
+ ...

You should ask for the language pairs

Sep 21, 2011

Hi Jared,
In the survey, if a person answers Yes to using MT, I think you should ask for the language pairs, as it is an important piece of info. The usefulness of currently available MT systems greatly depends on the language pair they are being used for.
Without this info, I think the survey results may be hard to interpret correctly.
Another piece of info you may want to ask for is whether the respondent is a freelancer or an agency.
Katalin

[Edited at 2011-09-21 17...

Jared Tabor
Local time: 11:09
SITE STAFF

Topic starter

Thanks Katalin

Sep 21, 2011

Hi Katalin,

Thanks for the feedback. I agree that the language pair is an important factor in how useful MT will be for some tasks.

Jared

Jeff Allen

France
Local time: 16:09
Multiple languages
+ ...

language directions for MT

Sep 21, 2011

It is actually the language direction, and not just the language pair, that affects the level of MT quality. Here are a few factors that enter into the quality equation.

1) Market need:

MT systems and software aimed at translation for publication (outward-bound translation), which is basically what most professional translators produce translations for, will usually be better in language directions FROM English as the source language into other target languages. This is because a significant amount of content is written in English. English to FIGS (French, Italian, German, Spanish) + RU directions tend to be the strongest.

MT systems aimed at content gisting (known as inward-bound translation, and also as requests for draft translations for understanding-only needs) for languages in general (including less widely used languages) will be better from other languages INTO English. These tend to be used for intelligence community content filtering and gisting. These language directions usually include FIGS, Asian languages, Russian, & Middle Eastern languages (+ any language which is on the radar for the intelligence community) INTO English.

2) MT system type:

- Rule- and dictionary-based MT systems (RBMT) have traditionally been developed per language "direction" based on the needs mentioned above. These depend on the availability of grammar rules to enter into the system for both the source and target language, and on the creation of a built-in bilingual glossary/dictionary.

It is usually the less commonly used languages that are developed the least for such MT systems, because getting these systems to maturity requires a fair amount of upfront development (usually about 2 years as a starter), and therefore a clearly defined business need and an accompanying budget.

- Statistical MT systems usually try to start out as language independent on the technical system side (because they look at combinations of characters and other forms), and yet they really depend mainly on the availability of significant "quantities" of source and target language "content". This content is usually abundant for FIGES, Russian and Asian languages. Much less is available for any other language that does not fit in the vague group of "major world languages".

Hybrid MT systems are now trying to address these concerns.

3) Maturity of the language direction + maturity of the language resource content:

- Rule-based MT language directions that have been around for 30+ years (such as EN-FR and EN-FR) will in general provide better quality output than brand new language direction projects (such as EN-Swedish) until the new projects stabilize over time.

- Statistical MT systems are dependent on both the quality of the texts that are used to train the system and the quantity of texts that are available.
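
The dependence on quantity can be illustrated with a small Python sketch. It is not an MT system: it only measures what share of the words in a new source sentence were ever seen on the source side of a training corpus, which is a rough proxy for how much material a statistical system has to work with. The toy sentences and the function name `source_coverage` are assumptions made up for this illustration.

```python
def source_coverage(training_pairs, sentence):
    """Return the fraction of tokens in `sentence` that occur on the
    source side of `training_pairs` (a list of (source, target) strings)."""
    seen = set()
    for source, _target in training_pairs:
        seen.update(source.lower().split())
    tokens = sentence.lower().split()
    if not tokens:
        return 0.0
    return sum(token in seen for token in tokens) / len(tokens)


# Hypothetical EN-FR sentence pairs, invented for this example.
small_corpus = [("the house is red", "la maison est rouge")]
larger_corpus = small_corpus + [
    ("the car is blue", "la voiture est bleue"),
    ("my car is small", "ma voiture est petite"),
]

new_sentence = "my house is blue"
print(source_coverage(small_corpus, new_sentence))   # 0.5
print(source_coverage(larger_corpus, new_sentence))  # 1.0
```

With one training sentence only half of the new sentence has been seen before; with three, every word has. Real systems need far more than word coverage, but the direction of the effect is the same.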

4) Language typologies and language complexity

When a language direction involves languages with the same general language typology (similar language structure), MT quality tends to be better. This can be seen in that EN FR and EN ES are better than EN DE.
Language directions introducing significantly different typologies such as EN and JP are a next level of challenge.
This is why "Knowledge-based MT systems (KBMT)", based on underlying semantic analysis, have produced better quality for such language directions than rule-based or even statistical systems. The downside is that KBMT systems are very time-consuming (and costly) to create.

5) MT language pivoting

Any language that falls "outside" of being paired with English (or possibly French in the case of Systran) -- as in language X English -- risks using English as a pivot language to provide the language direction (for example: Swedish > English > Norwegian, Korean > English > Japanese).
This is not very different from Language Service Providers/translation agencies who run into the same problem on major projects with multiple language pairs when they cannot find professional translators who can do some or all language directions directly, so they sometimes pass through English as a pivot language. We all know that this requires more time, much more quality care, and more money. Using a pivot language for MT suffers from the same problem and is even more risky.
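
One way to see the pivoting risk is a toy word-level sketch in Python. The three lexicons below are hypothetical and hold only a couple of entries each; they are not taken from any real MT system. The sketch shows a distinction that Swedish and Norwegian both make (paternal vs. maternal grandmother) being lost once the text passes through English.

```python
# Hypothetical single-word lexicons, for illustration only.
SV_TO_EN = {"farmor": "grandmother", "mormor": "grandmother"}
EN_TO_NO = {"grandmother": "bestemor"}
SV_TO_NO = {"farmor": "farmor", "mormor": "mormor"}


def translate(word, lexicon):
    """Look the word up; fall back to the word itself if it is unknown."""
    return lexicon.get(word, word)


def pivot_translate(word, first_leg, second_leg):
    """Translate through an intermediate (pivot) language."""
    return translate(translate(word, first_leg), second_leg)


for word in ("farmor", "mormor"):
    direct = translate(word, SV_TO_NO)
    pivoted = pivot_translate(word, SV_TO_EN, EN_TO_NO)
    print(f"{word}: direct={direct}, via English={pivoted}")
```

Both Swedish words come out as the same Norwegian word via English, while the direct lexicon keeps them apart. Any error or ambiguity in the first leg is passed on to the second leg in the same way.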

6) Comparative timeline
A number of years back we conducted a comparative timeline study of developing both MT and speech systems for several language pairs and presented it in:
Rapid-Deployment-Speech-tech-ICSLP1998-Lenzo-Allen-Hogan.pdf
https://www.box.net/shared/n2bxmmeubr

Many of the issues we described in that presentation/article are still valid today in the cases of points 1-4 above.

Thus language pairs (and I would even say language directions) are quite important when evaluating MT quality.
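
For anyone tabulating survey answers or evaluation scores, the pair-versus-direction distinction comes down to how the records are keyed. The Python sketch below is only an illustration: the ratings are invented, not survey results, and the helper `mean_by` is a hypothetical name. A direction is an ordered (source, target) tuple, while a pair is an unordered set.

```python
from collections import defaultdict

# Hypothetical 1-5 usefulness ratings, invented for this example.
ratings = [
    ("EN", "FR", 4),
    ("EN", "FR", 4),
    ("FR", "EN", 2),
    ("FR", "EN", 3),
]


def mean_by(records, key):
    """Average the scores after grouping records with key(source, target)."""
    groups = defaultdict(list)
    for source, target, score in records:
        groups[key(source, target)].append(score)
    return {k: sum(v) / len(v) for k, v in groups.items()}


# Unordered pair: EN>FR and FR>EN fall into the same bucket.
by_pair = mean_by(ratings, lambda s, t: frozenset((s, t)))
# Ordered direction: the two directions are kept apart.
by_direction = mean_by(ratings, lambda s, t: (s, t))

print(by_pair)
print(by_direction)
```

Grouping by pair yields a single average that hides whatever difference exists between the two directions; grouping by direction keeps it visible.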

Jeff

Jared Tabor
Local time: 11:09
SITE STAFF

Topic starter

Thanks, Jeff

Sep 21, 2011

Excellent stuff, thanks Jeff!

Jared

Neil Coffey

United Kingdom
Local time: 15:09
French => English
+ ...

Also..

Sep 22, 2011

As well as the factors Jeff mentions, another factor is simply what parallel corpora are available for training in the language pair in question. This applies to statistical systems, but in reality current systems (notably Google Translate) tend to be of the statistical type.
