Standard Microsoft Terminology... TBX
Since I just had to try Don's new .lng thing immediately, I stumbled upon this Microsoft page yesterday:
http://www.microsoft.com/Language/en-US ... ology.aspx
As far as I understand, one can download a kind of dictionary of the standard MS terminology for a chosen target language. The format is .TBX, which I had never heard of before; I believe this is the appropriate Wikipedia page about it:
https://en.wikipedia.org/wiki/Translation_memory#TBX
Does anyone have any experience with this? Does anyone know of an appropriate tool for using it (portable, free, ...)?
Regards,
Borut
Win 10 Pro 64bit
- Site Admin
- Posts: 60357
- Joined: 22 May 2004 16:48
- Location: Win8.1 @100%, Win10 @100%
Re: Standard Microsoft Terminology... TBX
Interesting, never heard about this. Yes, needs a tool...
FAQ | XY News RSS | XY Twitter
Re: Standard Microsoft Terminology... TBX
It's just an XML file; nothing prevents reparsing it into an easier format...
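Since TBX is XML, a real XML parser can do this reparsing instead of regular expressions. Below is a minimal Python sketch (Python is only a stand-in here; the scripts in this thread are XYplorer scripts). It assumes the usual TBX nesting termEntry > langSet (carrying xml:lang) > ... > term and treats en-US as the source language. The function names are illustrative, and other layouts would need adjusting.

```python
import xml.etree.ElementTree as ET
from typing import Iterator, Tuple

# ElementTree reports the reserved xml:lang attribute under this expanded name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def local_name(tag: str) -> str:
    """Drop any '{namespace}' prefix and lower-case the tag for comparison."""
    return tag.rsplit("}", 1)[-1].lower()


def tbx_pairs(root: ET.Element, source_lang: str = "en-US") -> Iterator[Tuple[str, str]]:
    """Yield (source_term, target_term) pairs from a parsed TBX tree.

    Assumed layout: termEntry > langSet[xml:lang] > ... > term.
    An entry with several target terms yields one pair per target term.
    """
    for entry in root.iter():
        if local_name(entry.tag) != "termentry":
            continue
        sources, targets = [], []
        for lang_set in entry:
            if local_name(lang_set.tag) != "langset":
                continue
            # Collect every non-empty <term> below this langSet.
            terms = [t.text.strip() for t in lang_set.iter()
                     if local_name(t.tag) == "term" and t.text and t.text.strip()]
            if lang_set.get(XML_LANG, "").lower() == source_lang.lower():
                sources.extend(terms)
            else:
                targets.extend(terms)
        for src in sources:
            for tgt in targets:
                yield src, tgt
```

For a file on disk, `ET.parse(path).getroot()` can replace `ET.fromstring`, and writing one `f"{src}|--|{tgt}"` line per pair gives the same kind of |--|-delimited list that Flux's script produces. An entry with two target terms simply yields two pairs instead of shifting the rest of the list.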
Tag Backup - SimpleUpdater - XYplorer Messenger - The Unofficial XYplorer Archive - Everything in XYplorer
Don sees all [cit. from viewtopic.php?p=124094#p124094]
Re: Standard Microsoft Terminology... TBX
admin wrote: Interesting, never heard about this. Yes, needs a tool...
Yep, but in case you want to have a closer look at this, check out the following website: http://www.omegat.org/en/who_we_are.html
Though the main advantage of a TMM (reducing the number of translations of equal or similar text sequences) might not help you much, since I assume you have already identified the numerous occurrences of your sequences in the code.
It might be a different story for help files, manuals, web pages, etc.
(I was involved in translating user manuals for machines in a former job and had a closer look at TMMs back then, namely Trados(R) and others I don't recall right now.
It was just not affordable for a smaller company to go that way back then. Or, better put: the time wasted without an appropriate tool never showed up in the figures the way the investment in a tool would have...)
Ralph
(OS: W11 22H2 Home x64 - XY: Current beta - Office 2019 32-bit - Display: 1920x1080 @ 125%)
- Posts: 855
- Joined: 05 Oct 2011 13:15
Re: Standard Microsoft Terminology... TBX
Hi'
Just a few quick notes.
My conclusion before the rest:
I fear 'regular' translators aren't used to working with double entries (e.g. "Yes--- Yes"); they cause more problems than they solve, so it might be worth asking around...
(Well, don't take my word for it; unfortunately I won't be able to dedicate myself to a translation, but I thought it was important to voice this before it's too late.)
The rest:
• After giving it a quick look, the TBX from Microsoft seems to be very valuable as it is,
- Provided you're used to working with a Computer Assisted Translation (CAT) tool.
- Like a Translation Memory, it stores pairs of terms (or segments, propositions...) between en-US and another language, and lots of XY (or any software) references are included.
- The CAT approach with this TBX (and in general) is definitely worth it! (read: enormous time-saver)
• I second RalphM: OmegaT is one of the few free CATs, and it can use the MS TBX (OmegaT+ can't).
- If you want to give it a try, I've jotted down a few guidelines below.
• However, depending on your needs, OmegaT's interface... might not be your cup of tea, so don't base your judgment of CATs on it.
• Then again... even if other CAT tools 'look' much better, most of them (especially the best-known ones) have outdated, counterintuitive designs matched only by their insane price tags. Ever heard of XYplorer? Well, turn 180 degrees: they're somewhere on the horizon... (another of those 'we existed first, so we rulz' niche markets)
• Anyway, if 'regular' translators come to work on XY, they'll probably use a CAT (the one they're forced to use in their work), because however slow and cumbersome CATs are, they're still much better and incredibly faster than nothing...
---
For those who want to try OmegaT with XY's Reference.lng here are a few quick'n'dirty tips:
- OmegaT doesn't parse the UTF-16 "Reference.lng" correctly, so it must first be re-saved, e.g. as UTF-8.
- After creating a project and choosing the destination language, place the TBX inside the project's "Glossary" subfolder and the "Reference.lng" inside the "source" subfolder.
- If you make manual changes to the Reference file, you don't have to close OmegaT; just click "Project | Reload" in the menu.
- Double-clicking a segment shows its translation(s) and details in the Glossary pane; then
- right-clicking a glossary translation replaces the selected entry;
- subsequently, any previously translated word is 'ready' to use (Ctrl+R) in the upper 'Fuzzy Matches' pane.
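The first tip (re-saving the UTF-16 Reference.lng as UTF-8) can also be scripted. A minimal Python sketch, assuming the file starts with a byte-order mark, as Windows UTF-16 text files normally do; the function names and paths are hypothetical:

```python
from pathlib import Path


def utf16_to_utf8(data: bytes) -> bytes:
    """Re-encode UTF-16 bytes as UTF-8 without a BOM; line endings are left as they are."""
    # The "utf-16" codec picks little/big endian from the leading byte-order
    # mark and drops the mark from the decoded text.
    text = data.decode("utf-16")
    # Plain "utf-8" (unlike "utf-8-sig") does not write a BOM.
    return text.encode("utf-8")


def resave_as_utf8(src: Path, dst: Path) -> None:
    """Write a UTF-8 copy of the UTF-16 file `src` to `dst`."""
    # Working on bytes (not read_text/write_text) avoids newline translation.
    dst.write_bytes(utf16_to_utf8(src.read_bytes()))
```

For example, `resave_as_utf8(Path("Reference.lng"), Path("MyProject/source/Reference.lng"))` would put the converted copy straight into a (hypothetical) project's "source" subfolder. The conversion works on raw bytes, so CRLF line endings survive unchanged.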
---
Well, I hope this makes sense and it might be of some help...
And I'm willing to be corrected about the 'doubling issue', or about the sorry state of CATs (please tell me there's a new worthy contender).
Best of luck with this task,
Flux
• Scripts: Session Manager | SlideShow | Collection Manager | Power Launcher | Akelpad syntax highlighting | ...
Re: Standard Microsoft Terminology... TBX
FluxTorpedoe wrote: I fear 'regular' translators aren't used to working with double entries (e.g. "Yes--- Yes"); they cause more problems than they solve, so it might be worth asking around...
Why? I see only advantages:
- The translator always sees the original, and can easily search the text for earlier occurrences of similar terms.
- The user can check the translation (if he can).
- The developer can check the translation (if he can).
- It helps semi-automatic translation.
- It's a safety belt: you know that XY is developing fast. Very quickly a translation will get out of sync. The double entry system provides the means to check an entry before translating it. I see no other way to do this.
- On upgrading, XYplorer can automatically create an annotated *.LNG file showing a translator which items have been added, changed, or removed.
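The last point, an annotated file listing added, changed, and removed items, comes down to comparing two versions of the reference. A Python sketch under a purely hypothetical model: each version is a dict from an entry id to its English original. The actual .LNG layout is not shown in this thread, and the sketch is not meant to describe XYplorer's own implementation.

```python
from typing import Dict, List


def diff_reference(old: Dict[str, str], new: Dict[str, str]) -> Dict[str, List[str]]:
    """Classify entry ids as added, changed or removed between two reference versions."""
    return {
        "added": sorted(new.keys() - old.keys()),
        "changed": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
        "removed": sorted(old.keys() - new.keys()),
    }


def annotate(old: Dict[str, str], new: Dict[str, str]) -> List[str]:
    """Render the diff as human-readable lines a translator could work through."""
    d = diff_reference(old, new)
    lines = [f"[added] {k}: {new[k]}" for k in d["added"]]
    lines += [f"[changed] {k}: {old[k]} -> {new[k]}" for k in d["changed"]]
    lines += [f"[removed] {k}: {old[k]}" for k in d["removed"]]
    return lines
```

Ids are compared as strings, so the sorting is lexicographic ("10" before "2"); numeric ids would need `key=int`.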
FAQ | XY News RSS | XY Twitter
Re: Standard Microsoft Terminology... TBX
Hi'
Well, I totally agree with everything you mention!
I was only pointing out, as an FYI, that this approach is fine for 'manual' translators, but that AFAIK all 'regular' translators (who all use CAT tools) work with single entries in their source documents, simply because that's how documents are in general. That's how all CATs work: they do the 'doubling' themselves (automatically displaying Source and Destination lines in pairs). This is partly how they can make a translation 10+ times faster, and it's why this kind of Source doubling would more than double the translation time (again, I stand to be corrected).
Anyway, all things considered your approach is probably the smartest because:
1. There may not be many pro CAT translators working on XY lng files.
2. If/when needed, a simple regex would do the trick: remove the doubling before translating, then add it back afterwards (if missing). Heck, even better: if this issue ever arises, we could make a nice'n'safe dedicated snippet!
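Point 2's round trip (strip the doubling, translate, add it back) could look like this Python sketch. It assumes a hypothetical one-entry-per-line layout `<original><SEP><translation>`, with SEP modelled loosely on the "Yes--- Yes" example. The real .lng syntax may differ, so the separator is a parameter and all names are illustrative.

```python
import re
from typing import List, Optional, Tuple

SEP = "--- "  # hypothetical separator, modelled on the "Yes--- Yes" example


def undouble(lines: List[str], sep: str = SEP) -> Tuple[List[Optional[str]], List[str]]:
    """Split each doubled line into its original and its translatable text.

    Lines without the separator get None as their original and pass through unchanged.
    """
    # Lazy (.*?) stops at the first separator; the rest of the line is the text.
    pattern = re.compile(r"^(.*?)" + re.escape(sep) + r"(.*)$")
    originals: List[Optional[str]] = []
    texts: List[str] = []
    for line in lines:
        m = pattern.match(line)
        if m:
            originals.append(m.group(1))
            texts.append(m.group(2))
        else:
            originals.append(None)
            texts.append(line)
    return originals, texts


def redouble(originals: List[Optional[str]], translated: List[str], sep: str = SEP) -> List[str]:
    """Put each original back in front of its translated line."""
    if len(originals) != len(translated):
        raise ValueError("line counts differ; the two lists are out of sync")
    return [t if o is None else f"{o}{sep}{t}" for o, t in zip(originals, translated)]
```

`redouble()` refuses lists of different lengths, since a missing or extra line would silently pair every following translation with the wrong original.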
Edit: I see that you're on your way to making your own "XY-CAT"! If you can't join them, beat them!
Have a nice day!
Flux
• Scripts: Session Manager | SlideShow | Collection Manager | Power Launcher | Akelpad syntax highlighting | ...
Re: Standard Microsoft Terminology... TBX
FluxTorpedoe wrote: For those who want to try OmegaT with XY's Reference.lng here are a few quick'n'dirty tips: [...]
@Flux: Thanks, this was of some help. I tried OmegaT and then abandoned it, mostly because I wasn't able to redefine the segmentation to a single line in a reasonable amount of time; shame on me.
@Don and Flux:
However, having looked a bit into the Croatian example, I find Microsoft's .TBX a good thing in itself. I also find its Style Guides (available from the same page, I think, also per target language) very valuable.
I have noticed a discussion in the German thread about where to draw the line between keeping the original English terms and using national translations. This is, I think, a ubiquitous problem that pops up for every target language. As I am still playing with a translation, I have started keeping my own partial list of the most important translation term pairs.
Out of all that I now have TWO WISHES FOR ITT ENHANCEMENT. Warning: These are *big* wishes!
1. Ability to parse a .TBX (=XML) into a list of source-target pairs. Parts of this list would be presented in the same ingenious automatic way as the current suggestions based on already translated strings, so two suggestion lists would be active at the same time.
2. A "Notes area" in the .lng file format, supported through a "notes pad" in the ITT, so one could easily keep notes while translating and always have them handy (in my case, some translation term pairs). Actually, this could also be a third, translator-defined, term pairs list.
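The presentation side of wish 1 (offering glossary pairs next to the string being translated) could work roughly as in this Python sketch. It assumes the pairs are already loaded, e.g. by splitting each line of a |--|-delimited list with `line.split("|--|", 1)`, and uses a simple whole-word, case-insensitive match. How ITT ranks its existing suggestions is not described here, so the ordering rule (longest term first) is an assumption.

```python
import re
from typing import List, Tuple


def suggest(source: str, glossary: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Return glossary pairs whose source term occurs as a whole word in `source`.

    Matching ignores case; longer terms are listed first so that a multi-word
    term such as "Recycle Bin" comes before a shorter one it contains.
    """
    hits = []
    for term, translation in glossary:
        # (?<!\w) and (?!\w) act as word boundaries that also work when the
        # term itself starts or ends with punctuation.
        pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
        if re.search(pattern, source, re.IGNORECASE):
            hits.append((term, translation))
    # sorted() is stable, so terms of equal length keep their glossary order.
    return sorted(hits, key=lambda pair: len(pair[0]), reverse=True)
```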
Yup, I am ashamed, because these are really big wishes
Last edited by Borut on 17 Nov 2012 13:10, edited 1 time in total.
Win 10 Pro 64bit
Re: Standard Microsoft Terminology... TBX
Borut wrote: 1. Ability to parse a .TBX (=XML) into a list of source-target pairs...
Forget it.
Borut wrote: 2. A "Notes area" in .lng file format, which would be supported through a "notes pad" in the ITT...
Good idea. But must it be in the LNG file? Why not simply an ITT.txt file read and written by ITT?
FAQ | XY News RSS | XY Twitter
Re: Standard Microsoft Terminology... TBX
admin wrote: [Borut wrote: 1. Ability to parse a .TBX (=XML) into a list of source-target pairs...] Forget it.
I expected it. Well, life is hard and cruel (das Leben ist hart und grausam).
admin wrote: [Borut wrote: 2. A "Notes area" in .lng file format, which would be supported through a "notes pad" in the ITT...] Good idea. But must it be in the LNG file? Why not simply an ITT.txt file read and written by ITT?
Yes, that would also be an option. However, my reasoning behind proposing it as part of the .lng was this: a translation will be, let us face it, a never-ending story. It is to be expected that several translators will keep a translation up to date. Also, deciding whether some terms get translated is sometimes a political decision, not only a linguistic one. If such translation pairs, as notes, along with, for instance, links to supporting literature or the like, were part of the .lng file, they would be readily available to whoever continues to update an existing translation.
Last edited by Borut on 17 Nov 2012 13:29, edited 1 time in total.
Win 10 Pro 64bit
Re: Standard Microsoft Terminology... TBX
I need to think about this...
FAQ | XY News RSS | XY Twitter
Re: Standard Microsoft Terminology... TBX
####
EDIT: Latest version and ready-made TBXY files available: here.
####
Hi'
You'll tell me if I was inspired to go online today... but that:
Borut wrote: 1. Ability to parse a .TBX (=XML) into a list of source-target pairs.
...sounded like a (potentially useful) challenge!
So here goes:
- a script that outputs a clean list of translating pairs from a (MS) TBX.
# OBSOLETE #
//
// TBX-XY Cleaner v1.1
//
// - Removes all non-translation data.
// - Reformats with one entry per line,
// with Source and Target languages separated by the following delimiter.
$Delimiter = "|--|"; // Change to suit your needs
if (<curext> == "tbx") {
status "~~~ Processing ~~~", , "progress";
$TBX_file = <curbase>;
$TBX_content = readfile(<curitem>);
$TBXY_content = regexreplace($TBX_content, '((?:.|\n)(?!=<term id=))+?<langset(?:.|\n)*?<term id="\d+">([^<]+)(?:.|\n)*?<langset(?:.|\n)*?<term id="\d+">([^<]+)', "$2§§§$3<crlf>");
$TBXY_content = regexreplace($TBXY_content, "\n<[^\n]+", ""); // Remove last line
$TBXY_content = regexreplace($TBXY_content, "§§§", $Delimiter); // Note: regex is used because regular replace hangs
writefile ($TBX_file."-XY.tbx", $TBXY_content);
wait 400;
status "TBX converted successfully";
} else {
msg "Select a TBX file first"
}
Hope this helps,
Flux
Last edited by FluxTorpedoe on 18 Nov 2012 16:09, edited 3 times in total.
• Scripts: Session Manager | SlideShow | Collection Manager | Power Launcher | Akelpad syntax highlighting | ...
TBX Cleaner script + MicrosoftTermCollection-hr file as a re
FluxTorpedoe wrote: - a script that outputs a clean list of translating pairs from a (MS) TBX.
Hope this helps,
Flux
Ehmmm, Flux, what shall I say? Megacool! Thank you very much. Now it is even easier for me to search through the resulting file in an editor. I hope that other translators will spot this script and make their lives easier. But let me use the opportunity...
Hey Don, hope it is Monday and you have enjoyed the rest of your weekend to the fullest.
Now, here is an example of a UTF-8 (without BOM) file containing MS-endorsed official translation pairs. No XML parsing needed anymore!
What about making some provision for loading a file of this or a similar format into an additional intelligent suggestions list in the ITT window?
Thanks for considering,
Borut
Edit: There is apparently still a small problem, since the pairs appear to lose synchronization at some intervals, but this is no problem when searching in an editor.
Edit 2, on 2012-11-18: Thanks to FluxTorpedoe's script version 1.1, I could create an error-free version. I also replaced the file here.
- Attachments
- MicrosoftTermCollection-hr_XYstyle_v20121118.zip
- MS-endorsed translation pairs for Croatian, UTF-8 without BOM, with |--| as the string delimiter. Last modified on 2012-11-18.
- (106.88 KiB)
Last edited by Borut on 18 Nov 2012 10:55, edited 1 time in total.
Win 10 Pro 64bit
Re: Standard Microsoft Terminology... TBX
Hi'
Glad if I can make my small contribution!
I checked the problem you mentioned and identified the two causes.
FYI, in these MS TBX files:
- in a few cases, one entry has 2 translations;
- line breaks are hardly used, and where they are, the logic is unknown to me
(e.g. I saw one line with 9 million (yep!) characters, but another with only ~6k chars...).
So things should be fixed now. I quickly browsed through the ~21,000 pairs/entries and they seemed synchronized!
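Instead of browsing ~21,000 lines by eye, part of this check can be automated. A minimal Python sketch, assuming one pair per line with the |--| delimiter from Flux's script; it flags every line that does not split into exactly one non-empty source and target:

```python
from typing import List


def check_pairs(lines: List[str], delimiter: str = "|--|") -> List[int]:
    """Return the 1-based numbers of lines that are not exactly one source|--|target pair."""
    bad = []
    for number, line in enumerate(lines, start=1):
        parts = line.split(delimiter)
        # Exactly two non-empty halves: anything else means a missing or an
        # extra term, i.e. the list may have drifted out of sync here.
        if len(parts) != 2 or not parts[0].strip() or not parts[1].strip():
            bad.append(number)
    return bad
```

For instance, `check_pairs(text.splitlines())` on the decoded output file. It only catches malformed lines; a well-formed line whose target belongs to a neighbouring entry would still need a comparison against an independent extraction.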
For clarity's sake, I updated the code in my original post, so grab it there.
Best of luck,
Flux
• Scripts: Session Manager | SlideShow | Collection Manager | Power Launcher | Akelpad syntax highlighting | ...
Re: Standard Microsoft Terminology... TBX
Thank you again, Flux! Now it seems perfect. Maybe Don will be able to use it too.
FluxTorpedoe wrote: I checked the problem you mentioned and identified the two causes.
FYI, in these MS TBX files:
- in a few cases, one entry has 2 translations;
Yes, I wasn't aware of that before either. I noticed it after BeyondComparing my previous result with the current one (produced with your script v1.1).
FluxTorpedoe wrote: So things should be fixed now. I quickly browsed through the ~21,000 pairs/entries and they seemed synchronized! For clarity's sake, I updated the code in my original post, so grab it there.
Grabbed it; used it; was happy; replaced the resulting file in my previous post; am happy; am translating further (although I actually do not know why, since, as I have already written somewhere else, I am convinced that no one in the Croatian-speaking area will use the translated version).
Win 10 Pro 64bit