Standard Microsoft Terminology... TBX

Where developers, translators, and users meet...

Post by Borut »

Since I just had to try Don's new .lng thing immediately, yesterday I stumbled upon this Microsoft page:

http://www.microsoft.com/Language/en-US ... ology.aspx

As far as I understand, one can download a kind of dictionary of the standard MS terminology for a chosen target language. The format is .TBX, which I had never heard of before. I believe this is the relevant Wikipedia page:

https://en.wikipedia.org/wiki/Translation_memory#TBX

Does anyone have any experience with this? Does anyone know any appropriate tool for using it (portable, free, ...)?

Regards,
Borut

Post by admin »

Interesting, never heard about this. Yes, needs a tool...

Post by Marco »

It's just an XML file, nothing that can't be reparsed into an easier format...

Post by RalphM »

admin wrote: Interesting, never heard about this. Yes, needs a tool...
Yep, but in case you want to have a closer look at this, check out the following website: http://www.omegat.org/en/who_we_are.html
Though the main advantage of a TMM - reducing the number of translations of equal or similar text sequences - might not help you much, since I assume you have already identified the numerous occurrences of your sequences in the code.

It might be a different story for help files, manuals, web pages, and so on.

(I was involved in translating user manuals for machines in a former job and had a closer look at TMMs back then, namely Trados(R) and others I don't recall right now.
It was just not affordable back then for a smaller company to go that way. Or better put, the time wasted without an appropriate tool never showed up in the figures the way the investment in a tool would have...)
Ralph :)

Post by FluxTorpedoe »

Hi'

Just a few quick notes.

My conclusion before the rest:
I fear 'regular' translators aren't used to working with double entries (e.g. "Yes--- Yes"); these cause more problems than they solve, so it might be a good question to ask around...
(Well, don't take my word for it; unfortunately I won't be able to dedicate myself to a translation, but I thought it important to voice this before it's too late).

The rest:
• After giving it a quick look, the TBX from Microsoft seems to be very valuable as it is,
- Provided you're used to working with a Computer Assisted Translation (CAT) tool.
- As a Translation Memory, it stores pairs of terms (or segments, propositions...) between en-US and another language. And lots of XY (or any software) references are included.
:arrow: the CAT approach with this TBX (and in general) is definitely worth it! (read: enormous time-saver)
• I second RalphM: OmegaT is one of the few free CAT tools, and it can use the MS TBX (OmegaT+ can't).
- If you want to give it a try, I've jotted down a few guidelines below.
• However, depending on your needs, OmegaT's interface... might not be your cup of tea. So don't base your judgment of CATs on it.
• Though... even if other CAT tools might 'look' much better, most of them (especially the best-known ones) have outdated, counterintuitive designs matched only by their insane price tags. Ever heard of XYplorer? :twisted: Well, make a 180deg turn, they're somewhere on the horizon... (another of those 'we existed before so we rulz' niche markets :evil: )
• Anyway, if 'regular' translators come to work on XY, they'll probably use a CAT (the one they're forced to deal with in their work) because however slow and cumbersome these tools are, they're much better and incredibly faster than nothing...

---
For those who want to try OmegaT with XY's Reference.lng, here are a few quick'n'dirty tips:
- "Reference.lng" isn't parsed correctly as UTF-16, so it must first be saved in another encoding, e.g. UTF-8.
- After creating a project and choosing the destination language, the TBX must be placed inside the project's "Glossary" subfolder, and "Reference.lng" inside the "source" subfolder.
- If you manually make changes to the Reference file, you don't have to close OmT; just click Menu "Project | Reload".
- A double-click on a segment gives its translation(s) and details in the Glossary pane; then
- a right-click on a glossary translation replaces the selected entry.
- Subsequently, any word previously translated is 'ready' to use (CTRL+R) in the upper 'Fuzzy Matches' pane.
---
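For anyone who prefers to script the re-encoding step from the first tip, here is a minimal Python sketch. It assumes the file really is UTF-16 with a byte order mark; the function name `convert_to_utf8` and the file names are made up for illustration, and the demo runs on a throwaway file with invented content, not on the real Reference.lng.

```python
import tempfile
from pathlib import Path


def convert_to_utf8(src, dst, src_encoding="utf-16"):
    """Re-encode a text file as UTF-8 (no BOM), keeping line endings as-is."""
    # newline="" switches off newline translation on both read and write,
    # so CRLF line endings survive the round trip unchanged.
    with open(src, "r", encoding=src_encoding, newline="") as fin:
        text = fin.read()
    with open(dst, "w", encoding="utf-8", newline="") as fout:
        fout.write(text)


# Demo on a temporary file with invented content (assumption: UTF-16 + BOM).
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "Reference.lng"
    dst = Path(tmp) / "Reference-utf8.lng"
    src.write_bytes("Yes\r\nNo\r\n".encode("utf-16"))  # BOM + UTF-16 data
    convert_to_utf8(src, dst)
    result = dst.read_bytes()

print(result)
```

The converted copy, not the original, is what would go into the OmegaT "source" subfolder.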

Well, I hope this makes sense and it might be of some help...
And I'm willing to be corrected about the 'doubling issue', or the despairing state of CATs (please tell me there's a new worthy contender).

Best of luck with this task, 8)
Flux

Post by admin »

FluxTorpedoe wrote: I fear 'regular' translators aren't used to working with double entries (e.g. "Yes--- Yes"), they cause more problems than they solve, so it might be a good question to ask around...
Why? I see only advantages:
- The translator always sees the original, and can easily search the text for earlier occurrences of similar terms.
- The user can check the translation (if he can).
- The developer can check the translation (if he can).
- It helps semi-automatic translation.
- It's a safety belt: You know that XY is developing fast. Very quickly a translation will get out of sync. The double entry system provides the means to check an entry before translating it. I see no other way to do this.
- On upgrading XYplorer can automatically create an annotated *.LNG file showing a translator which items have been added, changed, or removed.

Post by FluxTorpedoe »

Hi'

Well, I totally agree with everything you mention! :)

I was only pointing out, as an FYI, that this approach is valid for 'manual' translators, but that AFAIK all 'regular' translators (who all use CAT tools) always work with single entries in their source documents, simply because that's how documents are in general. So this is how all CATs work: they do the 'doubling' themselves (automatically displaying Source and Destination lines in pairs). This is partly how they can make a translation 10++ times faster, and it is why this kind of Source doubling would more than double the time to translate (again, I stand to be corrected).

Anyway, all things considered your approach is probably the smartest because:
1. there may not be many pro CAT translators working on XY lng,
2. if/when needed, a simple regex would do the trick: remove the doubling before translating, then add it back after (if missing). Heck, even better: if this issue ever arises, we could make a nice'n safe dedicated snippet! ;)
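To make point 2 concrete, here is a rough Python sketch of the "remove the doubling, then add it back" idea. It assumes, purely for illustration, that a doubled line is the source text, then `--- `, then the translation, as in the "Yes--- Yes" example; the real .lng layout may differ. The function names and the sample translations are invented.

```python
import re

SEP = "--- "  # assumed separator, taken from the "Yes--- Yes" example
DOUBLED = re.compile(r"^(?P<source>.*?)--- (?P<target>.*)$")


def remove_doubling(lines):
    """Keep only the source part of each doubled line; other lines pass through."""
    return [DOUBLED.sub(r"\g<source>", line) for line in lines]


def restore_doubling(sources, translations):
    """Prefix each translation with its source, unless it is already there."""
    restored = []
    for src, tgt in zip(sources, translations):
        already_doubled = tgt.startswith(src + SEP)
        restored.append(tgt if already_doubled else src + SEP + tgt)
    return restored


doubled = ["Yes--- Yes", "No--- No"]       # invented sample lines
singles = remove_doubling(doubled)          # what a CAT tool would be given
translated = ["Da", "Ne"]                   # invented sample translations
restored = restore_doubling(singles, translated)

print(singles)
print(restored)
```

A source text that itself contains `--- ` would confuse this pattern, so a real snippet would need a safer delimiter or an escape rule.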

Edit: I see that you're on your way to making your own "XY-CAT"! When you can't join them, beat them! :biggrin:

Have a nice day! 8)
Flux

Post by Borut »

FluxTorpedoe wrote: For those who want to try OmegaT with XY's Reference.lng, here are a few quick'n'dirty tips:
- "Reference.lng" isn't parsed correctly as UTF-16, so it must first be saved in another encoding, e.g. UTF-8.
- After creating a project and choosing the destination language, the TBX must be placed inside the project's "Glossary" subfolder, and "Reference.lng" inside the "source" subfolder.
- If you manually make changes to the Reference file, you don't have to close OmT; just click Menu "Project | Reload".
- A double-click on a segment gives its translation(s) and details in the Glossary pane; then
- a right-click on a glossary translation replaces the selected entry.
- Subsequently, any word previously translated is 'ready' to use (CTRL+R) in the upper 'Fuzzy Matches' pane.
---

Well, I hope this makes sense and it might be of some help...
@Flux: Thanks, this was of some help. I tried OmegaT and then abandoned it, mostly because I was not able to redefine segmentation to a single line in a reasonable amount of time - shame on me.

@Don and Flux:
However, based on some looking into the Croatian example, I find Microsoft's .TBX a good thing in itself. I also find its Style Guides (available from the same page, I think, also on a per-target-language basis) most valuable.

I have noticed a discussion in the German thread about where to draw the line between keeping the original English terms and using national translations. This is, I think, a ubiquitous problem, popping up for every target language. As I am still playing with a translation, I have started keeping my own partial list of the most important translation term pairs.

Out of all that I now have TWO WISHES FOR ITT ENHANCEMENT. Warning: These are *big* wishes!

1. Ability to parse a .TBX (=XML) into a list of source-target pairs. Relevant parts of this list would be presented in the same ingenious automatic way in which suggestions based on already translated strings are presented now. That way, two suggestion lists would be active at the same time.

2. A "Notes area" in .lng file format, which would be supported through a "notes pad" in the ITT - so one could easily keep his/her notes while translating and have them always handy (in my case some translation term pairs). Actually, this could also be a third - translator defined - term pairs list.

Yup, I am ashamed, because these are really big wishes :oops:
Last edited by Borut on 17 Nov 2012 13:10, edited 1 time in total.

Post by admin »

Borut wrote: 1. Ability to parse a .TBX (=XML) into a list of source-target pairs...
Forget it. :)
Borut wrote: 2. A "Notes area" in .lng file format, which would be supported through a "notes pad" in the ITT - so one could easily keep his/her notes while translating and have them always handy (in my case some translation term pairs). Actually, this could also be a third - translator defined - term pairs list.
Good idea. But must it be in the LNG file? Why not simply an ITT.txt file that is read and written by ITT?

Post by Borut »

admin wrote:
1. Ability to parse a .TBX (=XML) into a list of source-target pairs...
Forget it. :)
:) I expected it. Well, das Leben ist hart und grausam (life is hard and cruel). :(
admin wrote:
2. A "Notes area" in .lng file format, which would be supported through a "notes pad" in the ITT - so one could easily keep his/her notes while translating and have them always handy (in my case some translation term pairs). Actually, this could also be a third - translator defined - term pairs list.
Good idea. But must it be in the LNG file? Why not simply an ITT.txt file that is read and written by ITT?
Yes, that would also be an option. However, my reasoning behind proposing it as a part of the .lng was: A translation will be - let us face it - a never-ending story. It is to be expected that several translators will keep a translation up to date. Also, the decision about whether some terms are translated or not is sometimes even a political one, not only a linguistic one. If such translation pairs - as notes - along with, for instance, links to supporting literature or the like were a part of the .lng file, they would be readily available to whoever continues to update an already existing translation.
Last edited by Borut on 17 Nov 2012 13:29, edited 1 time in total.

Post by admin »

I need to think about this...

Post by FluxTorpedoe »

####
EDIT: Latest version and ready-made TBXY files available: here.
####


Hi'
You'll tell me whether I was inspired to go online today... but this:
Borut wrote:1. Ability to parse a .TBX (=XML) into a list of source-target pairs.
...sounded like a (potentially useful) challenge! 8)

So here goes:
- a script that outputs a clean list of translating pairs from a (MS) TBX.
# OBSOLETE #

Code:

//
// TBX-XY Cleaner v1.1
//
// - Removes all non-translation data.
// - Reformats with one entry per line,
//    with Source and Target languages separated by the following delimiter.

  $Delimiter = "|--|"; // Change to suit your needs

  // Only run when the current item is a TBX file
  if (<curext> == "tbx") {
    status "~~~   Processing   ~~~", , "progress";
    $TBX_file = <curbase>;
    $TBX_content = readfile(<curitem>);
    // Capture the source term and the first target term of each entry, joined by a temporary marker
    $TBXY_content = regexreplace($TBX_content, '((?:.|\n)(?!=<term id=))+?<langset(?:.|\n)*?<term id="\d+">([^<]+)(?:.|\n)*?<langset(?:.|\n)*?<term id="\d+">([^<]+)', "$2§§§$3<crlf>");
    $TBXY_content = regexreplace($TBXY_content, "\n<[^\n]+", ""); // Remove last line (leftover tags)
    $TBXY_content = regexreplace($TBXY_content, "§§§", $Delimiter); // Note: regex is used because regular replace hangs
    writefile ($TBX_file."-XY.tbx", $TBXY_content); // Output file: <base name>-XY.tbx
    wait 400;
    status "TBX converted successfully";
  } else {
    msg "Select a TBX file first";
  }
Good Luck!

Hope this helps, :D
Flux
Last edited by FluxTorpedoe on 18 Nov 2012 16:09, edited 3 times in total.

TBX Cleaner script + MicrosoftTermCollection-hr file as a re

Post by Borut »

FluxTorpedoe wrote:- a script that outputs a clean list of translating pairs from a (MS) TBX.
Hope this helps, :D
Flux
Ehmmm, Flux, what shall I say? Megacool! 8) Thank you very much. Now it is even easier for me to search through the resulting file in an editor. I hope that other translators will spot this script and make their lives easier. But, let me use the opportunity...

Hey Don, hope it is Monday and you have enjoyed the rest of your weekend to the fullest.

Now, here is an example of a UTF-8 w/o BOM file containing MS-endorsed official translation pairs. No XML parsing needed any more!

What about making some provision for loading a file of this or a similar format into an additional intelligent suggestions list in the ITT window?
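To illustrate what loading such a file could look like, here is a small Python sketch (not ITT code, and not how ITT builds its suggestions). It assumes one pair per line with `|--|` as the delimiter, as in the attached file; the function names `load_pairs` and `suggest` and the sample pairs are invented for the example.

```python
import re
from collections import defaultdict

DELIMITER = "|--|"  # same delimiter as in the attached file


def load_pairs(text, delimiter=DELIMITER):
    """Map each lowercased source term to the list of its target terms."""
    pairs = defaultdict(list)
    for line in text.splitlines():
        source, sep, target = line.partition(delimiter)
        if sep and source.strip() and target.strip():
            pairs[source.strip().lower()].append(target.strip())
    return dict(pairs)


def suggest(pairs, phrase):
    """Return (term, targets) for every glossary term found in the phrase."""
    hits = []
    for term, targets in pairs.items():
        # Whole-word, case-insensitive match so "file" does not hit "Profile".
        if re.search(r"\b" + re.escape(term) + r"\b", phrase, re.IGNORECASE):
            hits.append((term, targets))
    hits.sort(key=lambda hit: len(hit[0]), reverse=True)  # longest term first
    return hits


# Invented sample data in the file's format; not copied from the real file.
SAMPLE = "folder|--|mapa\nfile|--|datoteka\nfolder|--|direktorij\n"
pairs = load_pairs(SAMPLE)
hits = suggest(pairs, "New Folder")

print(hits)
```

Keeping all targets per source term matters here, because the same source term can appear on more than one line.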

Thanks for considering,
Borut

Edit: There is apparently still a small problem here, since the pairs seem to lose synchronization at certain intervals, but this presents no problem while searching in an editor.

Edit 2, on 2012-11-18: Thanks to FluxTorpedoe's script version 1.1, I could create an error-free version. I also replaced the file here.
Attachments
MicrosoftTermCollection-hr_XYstyle_v20121118.zip
MS-endorsed translation pairs for Croatian - UTF-8 w/o BOM format with |--| as the string delimiter. Last modified on 2012-11-18.
(106.88 KiB) Downloaded 255 times
Last edited by Borut on 18 Nov 2012 10:55, edited 1 time in total.

Post by FluxTorpedoe »

Hi'
Glad if I can make my small contribution! :wink:

I checked the problem you mentioned and identified the two causes.
FYI, in these MS' TBX files:
- the TBX can (in a few cases) have 2 translations for 1 entry
- the TBX hardly uses any line breaks, and where it does, the logic is unknown to me
e.g. I saw one line with 9 million (yep!) characters, but another one with only ~6k chars...

So things should be fixed now. I quickly browsed through the ~21 000 pairs/entries and they seemed synchronized!
For clarity's sake, I updated the code in my original post, so grab it there. :D
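As an alternative to regex, both quirks (several translations per entry, irregular line breaks) go away if the file is read with a real XML parser. Below is a hedged Python sketch, not the script from my original post. It assumes the standard TBX element names `termEntry`, `langSet` and `term`, and an `xml:lang` attribute on each `langSet`; the function name `tbx_pairs` and the sample XML are invented, so check them against a real file before relying on this.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def tbx_pairs(root, source_lang="en-US"):
    """Yield (source_term, target_term), one tuple per target translation."""
    for entry in root.iter("termEntry"):
        sources, targets = [], []
        for lang_set in entry.iter("langSet"):
            terms = [t.text.strip() for t in lang_set.iter("term") if t.text]
            if lang_set.get(XML_LANG, "").lower() == source_lang.lower():
                sources.extend(terms)
            else:
                targets.extend(terms)
        for source in sources:
            for target in targets:
                yield source, target


# Invented sample: one entry with two target terms, all on one long line.
SAMPLE = (
    '<martif type="TBX" xml:lang="en-US"><text><body>'
    '<termEntry id="1">'
    '<langSet xml:lang="en-US"><ntig><termGrp><term id="1">folder</term>'
    '</termGrp></ntig></langSet>'
    '<langSet xml:lang="hr-HR"><ntig><termGrp><term id="2">mapa</term>'
    '</termGrp></ntig><ntig><termGrp><term id="3">direktorij</term>'
    '</termGrp></ntig></langSet>'
    '</termEntry></body></text></martif>'
)

root = ET.fromstring(SAMPLE)
pairs = list(tbx_pairs(root))
for source, target in pairs:
    print(f"{source}|--|{target}")
```

Each extra translation simply becomes its own line, so the source and target columns cannot drift apart.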

Best of luck, 8)
Flux

Post by Borut »

Thank you again, Flux! Now it seems perfect. Maybe Don will be able to use it too.
FluxTorpedoe wrote: I checked the problem you mentioned and identified the two causes.
FYI, in these MS' TBX files:
- TBX (in a few cases) can have 2 translations for 1 entry
Yes, I was not aware of that before either. I saw it after BeyondComparing my previous result with the current result (after using your script v1.1).
FluxTorpedoe wrote: So things should be fixed now. I quickly browsed through the ~21 000 pairs/entries and they seemed synchronized! For clarity's sake, I updated the code in my original post, so grab it there. :D
Grabbed it; used it; was happy; replaced the resulting file in my previous post; am happy; am translating further (although I actually do not know why, since - as I have already written somewhere else - I am convinced that no one in the Croatian-speaking area will use the translated version).