Cross-compatible terminology database format
Thread poster: TranslateMedia Translation Company
TranslateMedia Translation Company
United Kingdom
English
Jul 23, 2009

Hi Everyone,

Our agency wants to store and provide terminology databases in a format that can be imported into any CAT tool - so that people can work in the tool of their choice.

It is proving difficult to work out what the most universal format is - I thought it would be CSV, but one of our translation team suggested that TXT might be better. We are hoping that TXT or CSV will work, but understand that we may be wrong!

Does anyone know if there is a universal format that can be imported into all terminology tools in all CAT tools?

Thanks!

Matt


 
Laurent KRAULAND (X)
France
French to German
How about TBX or open-source formats? Jul 23, 2009

Hello Matt,
you could have a look at this:
http://www.lisa.org/Term-Base-eXchange.32.0.html and at this: http://en.wikipedia.org/wiki/Tbx#TBX

HTH
Laurent K.

[Edited at 2009-07-23 11:50 GMT]


 
TranslateMedia Translation Company
United Kingdom
English
TOPIC STARTER
Does export to TBX exist in all CAT tools? Jul 23, 2009

Hi Laurent,

Thanks for your message, really helpful.

When I look in our tool here, MemoQ (v.3.5.22), I cannot see an option to export to TBX format. I only have the option to export to CSV or MultiTerm XML format. MemoQ also only appears to allow import of term bases in CSV or TMX format. Just wondering what the most common tools will import/export?

Thanks for your help!

Matt


 
Laurent KRAULAND (X)
France
French to German
Most commonly imported/exported format Jul 23, 2009

Matt Train wrote:
just wondering what the most common tools will import/export?

Thanks for your help!

Matt


Hi again, Matt, glad I could help in some way. AFAIK the most commonly imported/exported format is TMX (Translation Memory eXchange), but it is not a terminology database format. I hope that other colleagues can add their input to mine.

Laurent K.


 
FarkasAndras
English to Hungarian
CSV and TXT are basically the same thing Jul 23, 2009

Matt Train wrote:

Hi Everyone,

Our agency wants to store and provide terminology databases in a format that can be imported into any CAT tool - so that people can work in the tool of their choice.

It is proving difficult to work out what the most universal format is - I thought it would be CSV, but one of our translation team suggested that TXT might be better. We are hoping that TXT or CSV will work, but understand that we may be wrong!

Does anyone know if there is a universal format that can be imported into all terminology tools in all CAT tools?

Thanks!

Matt


A CSV is a comma-separated txt file. Now, I have no idea how CSV could possibly work, as terminology data often contains commas of its own, which would screw it all up horribly. I'm sure there is a solution for that issue, but why bother when you can use tab-separated? Comma-separated and tab-separated TXTs are almost the same thing, but tab-separated is a bit more user-friendly, I think. For starters, you can copy-paste between a tab-separated txt and a spreadsheet with zero adjustment or trickery; they morph into each other by default.

So, the only really good solution that I can see is tab separated txt and/or Excel tables. (Txt wins in compatibility but xls is more familiar and more easily manageable to most users.)
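
To illustrate the tab-separated layout suggested above, here is a minimal Python sketch. The column names, the sample entry, and the helper names `to_tab_delimited` and `from_tab_delimited` are all hypothetical; nothing here reflects what any particular CAT tool expects on import.

```python
# Hypothetical glossary rows; the column names are illustrative only.
rows = [
    ["source", "target", "note"],
    ["bank, river", "Ufer", "commas need no special handling here"],
]

def to_tab_delimited(rows):
    """Join the fields of each row with tabs, one entry per line."""
    lines = []
    for row in rows:
        for field in row:
            # A raw tab or line break inside a field would corrupt the layout,
            # because this simple format has no quoting mechanism.
            if "\t" in field or "\n" in field or "\r" in field:
                raise ValueError(f"field not representable: {field!r}")
        lines.append("\t".join(row))
    return "\n".join(lines) + "\n"

def from_tab_delimited(text):
    """Split each non-empty line back into its fields."""
    return [line.split("\t") for line in text.splitlines() if line]

tsv_text = to_tab_delimited(rows)
tsv_round_trip = from_tab_delimited(tsv_text)
```

The trade-off in this sketch is that fields may contain commas and quotes freely, but never a tab or a line break.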


 
Samuel Murray
Netherlands
Member (2006)
English to Afrikaans
Any flat file is best Jul 23, 2009

Matt Train wrote:
It is proving difficult to work out what the most universal format is - I thought it would be CSV, but one of our translation team suggested that TXT might be better.


Some CAT tools simply don't have the facility to import a simple format. There is no format that every tool can import. But your best bet is probably something tab delimited. If the translation team member meant "Trados TXT", then he's got it wrong -- the only tool that can read Trados TXT is, well, Trados. But if he meant a tab delimited file with a TXT file extension, then it's spot on.

TBX was designed (by some guy and his mates, over a cup of coffee perhaps) as a universal format, but so far very few tools can read and/or write it.

TMX may work but the problem with TMX is that there are only two fields, whereas with a tab delimited TXT file or a CSV file you can have as many fields as you can dream of.

CSV is not a good choice because different tools generate different dialects of CSV that are not all mutually intelligible.
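
The dialect problem can be sketched with Python's standard `csv` module. The two "tools" below are hypothetical: one writes commas with minimal quoting, the other writes semicolons and quotes every field. The sample entry and the helper names `write_row` and `read_rows` are assumptions made for illustration.

```python
import csv
import io

# One hypothetical glossary entry whose source term contains a comma.
entry = ["bank, river", "Ufer"]

def write_row(row, **fmt):
    """Serialize one row using the given CSV formatting parameters."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n", **fmt).writerow(row)
    return buffer.getvalue()

def read_rows(text, **fmt):
    """Parse CSV text using the given formatting parameters."""
    return list(csv.reader(io.StringIO(text), **fmt))

# "Tool A": comma delimiter, quotes only where needed.
tool_a = write_row(entry, delimiter=",", quoting=csv.QUOTE_MINIMAL)
# "Tool B": semicolon delimiter, quotes around every field.
tool_b = write_row(entry, delimiter=";", quoting=csv.QUOTE_ALL)

# Reading tool B's output works only if the reader assumes the same dialect.
same_dialect = read_rows(tool_b, delimiter=";")
wrong_dialect = read_rows(tool_b, delimiter=",")
```

Both outputs could be saved with a .csv extension, yet a reader that assumes commas does not recover the original two fields from the semicolon file.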

[Edited at 2009-07-23 12:20 GMT]


 
Samuel Murray
Netherlands
Member (2006)
English to Afrikaans
How to handle commas in CSV Jul 23, 2009

FarkasAndras wrote:
A CSV is a comma separated txt file. Now, I have no idea how CSV could possibly work as terminology data often contains commas of its own, which would screw it all up horribly.


Ah, but CSV is not simply a comma separated file -- it is a comma separated file with some extras to make it comma compatible. If a field contains a comma, simply put quotes on either side of the field. If a field contains a quote, simply double it. With CSV, your fields can also contain tabs and even line breaks. Some CSV programs do not accept a CSV file if there are superfluous quotes, however, and some other programs generate quotes whether they are strictly necessary or not, so you have a recipe for disaster.
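
The quoting rules described above can be seen in a minimal sketch using Python's standard `csv` module. The sample fields are invented for illustration, and other programs may quote more or less aggressively than this writer does.

```python
import csv
import io

# Hypothetical fields: one with a comma, one with quotes, one with a line break.
term_row = ["bank, river", 'say "Ufer"', "line one\nline two"]

# Write the row; by default the writer quotes only the fields that need it
# and doubles any quote character found inside a field.
out = io.StringIO()
csv.writer(out, lineterminator="\n").writerow(term_row)
csv_line = out.getvalue()

# Read it back; the reader undoes the quoting and the doubled quotes.
parsed_row = next(csv.reader(io.StringIO(csv_line)))
```

Printing `csv_line` shows each field wrapped in quotes and the inner quotes doubled, while `parsed_row` holds the original three fields again.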

That said, I don't think the CSV format is sufficiently simple that people should attempt to edit it by hand. Tab delimited is simpler and more human editable.

FarkasAndras wrote:
For starters, you can copy-paste between a tab separated txt and a spreadsheet with zero adjustment or trickery...


Agreed. You can do slightly more with Microsoft Office than with OpenOffice.org, but basically a tab delimited file shows the most promise.


[Edited at 2009-07-23 12:29 GMT]


 
TranslateMedia Translation Company
United Kingdom
English
TOPIC STARTER
Thanks! Jul 23, 2009

Thanks Samuel and Andras.

Interesting that a standard simple format does not exist in the practical world (although TBX may solve that in future hopefully!).

Now we know that in this case there is not a one-size-fits-all solution we can act accordingly.

Thanks for your input!


 

