LF Aligner is intended for translators who wish to create translation memories from translations made without a CAT tool or from any other text that is available in two or more languages. I wrote it to make what is probably the best open source automatic sentence aligning algorithm, Hunalign, more convenient to use. LF Aligner also has a couple of features designed for larger-scale corpus building, such as handling huge data sets, built-in data filtering, batch mode, automatic segmentation evaluation and unattended operation. The aligner also has other features like creating TMX files and downloading EU legislation or any other bilingual HTML webpage for alignment (see details on the web features further down).

The reason why you may want to use this simple tool instead of the flashy and complicated aligners from the big players is Hunalign. It uses a smart algorithm to determine which sentence goes with which, relying on sentence length, a dictionary and, as near as I can tell, black magic, and it does a really good job. The upshot is that you don't have to manually pair up the segments, only review the pairings and do any necessary corrections - or not even that. Most of the time you will get a very usable TM without human input. The accuracy of Hunalign's automatic pairings depends entirely on the quality of the source material (whether you have removed page headers and footers etc.) and whether it has a good dictionary to work with, but percentages in the high nineties are common. (Reasonably good dictionary data is bundled with LF Aligner for more than 800 combinations of 32 languages. You can check the log to see if this dictionary data was used for your alignment.)

The primary output is TMX, but if you don't use TMX-compatible software, the aligner can generate xls files for you. Tab-delimited txt files are always generated as well, suitable for use with Apsic Xbench or processing with other tools. LF Aligner also gives you complete control over the whole process: in the TMX, you can set the date and time, language codes, creator ID, add notes to each segment etc., and you have extensive customisation options regarding a bunch of other features, too. Just open aligner_setup.txt to see the main setup options.

I kept adding information and this readme ended up being pretty long. If you want to get started quickly without reading the whole thing, you can do so by following the steps described in sample/howto.txt, but you should probably come back to this readme later, especially if you get stuck with something.

Contents:
Basic instructions
Input files, notes on doc, docx, rtf and pdf
Downloading EU and other documents from the web, language codes
Batch alignment using command line arguments
Advanced tips: the built-in sentence splitter (segmenter), using your CAT for segmentation, Hunalign, GUI
Input files and how they are handled, tagged formats, running in Perl

*** TECHNICAL INFO ***

LF Aligner is distributed under the GNU General Public License version 3 or newer. It is free for personal use (use by freelance translators for their work is considered personal use). You are free to distribute and modify the code, as long as any significant modifications or derived works are also made available under the GPL terms. If you are a developer working on improving LF Aligner or adapting it for some specific purpose, please drop me a line.

LF Aligner is developed on Windows 7 and tested on Win7, XP, Linux (Ubuntu) and occasionally OSX. This readme was written mostly with Windows in mind, but things should work the same on other platforms. For notes on running on Linux, see Minor miscellaneous technical details. For the time being, the GUI version is only available for Windows.

The aligner takes two or more doc, docx, rtf, odt, txt (in UTF-8!), tmx, pdf or HTML files as input and produces autoaligned tab-delimited UTF-8 txt files, xls spreadsheets and TMX files from them. You can review and correct the autoalignment in the xls before the TMX is generated. On the limitations of input files other than txt and tmx, see the Use chapter.
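The tab-delimited txt output mentioned in this readme is easy to process with other tools. As a rough illustration only, here is a minimal Python sketch that reads aligned pairs from tab-delimited lines and wraps them in a bare-bones TMX 1.4 structure. It is not LF Aligner's own code (the aligner generates TMX for you). The function names, the language codes "en" and "hu", the header values, and the assumption that the first two columns hold the source and target segments are all made up for this example.

```python
import csv
import xml.etree.ElementTree as ET

# ElementTree serialises this namespaced attribute as xml:lang
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def read_pairs(lines):
    """Yield (source, target) tuples from tab-delimited lines.

    Assumption: column 1 is the source segment and column 2 the target;
    any further columns are ignored. Rows with an empty side are skipped.
    """
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 2:
            continue
        src, tgt = row[0].strip(), row[1].strip()
        if src and tgt:
            yield src, tgt


def build_tmx(pairs, src_lang="en", tgt_lang="hu"):
    """Build a minimal TMX 1.4 element tree from (source, target) pairs."""
    tmx = ET.Element("tmx", version="1.4")
    # Header attributes required by TMX 1.4; the values are placeholders
    ET.SubElement(tmx, "header", {
        "creationtool": "example-script",
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "tab-delimited",
        "adminlang": "en",
        "srclang": src_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for src, tgt in pairs:
        tu = ET.SubElement(body, "tu")
        for lang, text in ((src_lang, src), (tgt_lang, tgt)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return tmx


if __name__ == "__main__":
    sample = [
        "Hello world.\tHelló világ.\tsample note\n",
        "Good morning.\tJó reggelt.\n",
    ]
    root = build_tmx(read_pairs(sample))
    print(ET.tostring(root, encoding="unicode"))
```

To run this on a real file, open it with encoding="utf-8" and newline="" and pass the file object to read_pairs. Check the column layout of your own txt output first, since the two-column assumption above may not match your setup options.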