Machine Translation

File translation

What file formats are supported?

File formats
Office: .doc, .docx, .docm, .xlsx, .pptx, .odt, .odp, .ods, .txt, .rtf, .pages, .sxw, .pdf
Scanned or image files: .pdf, .jpg, .jpeg, .png, .bmp
Interchange formats: .sdlxliff, .sdlxlf, .ttx, .tmx, .xlf, .xliff
Web: .html, .htm, .json, .xhtml, .xht
Other: .tex, .srt
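
If you want to check a batch of files before uploading, the list above can be turned into a small lookup. The following is only a sketch: the function name `format_groups` and the dictionary layout are illustrative assumptions, not part of the service, and the service itself remains the authority on what it accepts.

```python
from pathlib import Path

# Extensions copied from the list of supported formats above.
# The grouping mirrors that list; ".pdf" appears in two groups.
SUPPORTED = {
    "Office": {".doc", ".docx", ".docm", ".xlsx", ".pptx", ".odt", ".odp",
               ".ods", ".txt", ".rtf", ".pages", ".sxw", ".pdf"},
    "Scanned or image files": {".pdf", ".jpg", ".jpeg", ".png", ".bmp"},
    "Interchange formats": {".sdlxliff", ".sdlxlf", ".ttx", ".tmx", ".xlf", ".xliff"},
    "Web": {".html", ".htm", ".json", ".xhtml", ".xht"},
    "Other": {".tex", ".srt"},
}


def format_groups(filename: str) -> list:
    """Return the groups whose extension list contains this file's extension.

    Hypothetical helper: an empty list means the extension is not listed.
    """
    ext = Path(filename).suffix.lower()  # compare case-insensitively
    return [group for group, exts in SUPPORTED.items() if ext in exts]
```

For example, `format_groups("manual.DOCX")` returns the Office group, while a file with an unlisted extension returns an empty list.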

What is the maximum file size that file translation supports?

The maximum file size depends on your subscription type. See subscriptions.

How can I translate larger files?

Split large files into smaller sections and translate each section separately. Smaller files are translated faster and with better results.
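
For plain-text files, the splitting can be done with a short script. This is a minimal sketch under stated assumptions: `split_into_sections` and the `max_chars` limit are hypothetical, the limit is counted in characters rather than in the file size your subscription allows, and paragraphs are assumed to be separated by blank lines. Splitting at paragraph boundaries keeps sentences intact.

```python
def split_into_sections(text: str, max_chars: int) -> list:
    """Group paragraphs into sections of at most max_chars characters.

    Paragraphs are never cut; a single paragraph longer than max_chars
    becomes a section of its own.
    """
    paragraphs = [p for p in text.split("\n\n") if p.strip()]
    sections = []
    current = ""
    for paragraph in paragraphs:
        candidate = paragraph if not current else current + "\n\n" + paragraph
        if not current or len(candidate) <= max_chars:
            current = candidate       # paragraph still fits in this section
        else:
            sections.append(current)  # close the section, start a new one
            current = paragraph
    if current:
        sections.append(current)
    return sections
```

Each returned section can then be saved to its own file and uploaded separately.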

Will my unused file translations be carried over to the next month?

No, monthly translation quotas can't be carried forward.

How can I prepare my files for translation?

Make sure that the source file is clean and error-free. For Microsoft Office documents, turn off "Track Changes".

What if my file is in multiple languages?

If your file contains multiple languages, divide it and translate each language section separately. Otherwise, the parts that don't match the selected language direction will be translated incorrectly or corrupted.

How are PDFs translated?

When a .pdf file is uploaded for translation, it is first converted into a translatable format. Both scanned and native .pdf files are supported. Translations can be downloaded in .docx or .pdf format and retain the basic formatting. The formatting of the translated .docx and .pdf files may differ slightly.

How can I speed up PDF translation?

Conversion takes time. To speed it up, split your .pdf files and delete pages that contain no translatable content. Processing time depends on factors such as the number of pages, file quality and text density.

How are scanned PDFs converted?

The file is analysed and its content is split into text, table and image or background-image areas. These areas are then analysed further to determine which content should be recognized as text.

Text is extracted by means of optical character recognition (OCR). Each character is recognized by its visual characteristics, and the characters are then assembled into words and sentences.

Recognition is affected by many factors, including:

  • Quality of the scanned file (blurred or deformed text, image noise, image resolution, contrast, etc.). For the best results, scanned .pdf files should be at least 300 dpi, unskewed and noise-free.
  • Font size. Text smaller than 9 points might be poorly recognized.
  • Rare or unusual fonts. Less common fonts may be recognized poorly or not at all.
  • Complexity of the file structure and formatting.
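
To judge whether a scan meets the 300 dpi guideline above, you can estimate its resolution from the pixel dimensions of the image and the physical size of the page. The sketch below is illustrative only: the function names are hypothetical, and it assumes you already know the pixel and page dimensions. It also converts a font size to pixels using the standard definition of 1 point = 1/72 inch, which shows how little detail small text has at low resolutions.

```python
def effective_dpi(width_px: int, height_px: int,
                  width_in: float, height_in: float) -> float:
    """Estimate scan resolution as the lower of the two axis resolutions."""
    return min(width_px / width_in, height_px / height_in)


def meets_scan_guideline(dpi: float, min_dpi: float = 300.0) -> bool:
    """Check a resolution against the 300 dpi recommendation."""
    return dpi >= min_dpi


def point_size_in_pixels(points: float, dpi: float) -> float:
    """Height in pixels of text of a given point size (1 pt = 1/72 inch)."""
    return points / 72.0 * dpi


# A US Letter page (8.5 x 11 inches) scanned to 2550 x 3300 pixels.
dpi = effective_dpi(2550, 3300, 8.5, 11.0)
print(dpi, meets_scan_guideline(dpi))   # 300.0 True
print(point_size_in_pixels(9, dpi))     # 37.5
```

At 150 dpi the same 9-point text is less than 19 pixels tall, which helps explain why low-resolution scans of small text are recognized poorly.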

How can I improve PDF translation quality?

To improve translation quality, we recommend downloading the converted file in the original language and correcting it manually before re-uploading it for translation.

The conversion process is complex, and problems in the input file can cause various conversion errors.

Issues to look for:

  • Incorrectly recognized words.
  • Incorrect word boundaries.
  • Sentences spanning multiple columns, lines or pages split into different textual objects.
  • Misplaced page footers and headers.
  • Incorrect numbering.
  • Tables recognized as background images with textual content placed into different textual objects.
  • Some diagrams recognized as images without any textual content.
  • Separators, borders or other horizontal lines recognized as strikethrough or underlined text.
  • Formatting of the table of contents or similar lists collapsed or recognized as strikethrough or underlined text.