
MinT

From mediawiki.org

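MinT (Machine in Translation) is a machine translation service based on open-source neural machine translation models. The service is hosted on Wikimedia Foundation infrastructure and runs translation models that other organizations have released under open-source licenses. An open machine translation service is a key piece of the essential infrastructure for an ecosystem of free knowledge. This page describes the initiatives to expand the reach of the service.

You can try MinT on individual projects such as translatewiki.net and on projects where Content Translation is installed, or use it directly in the test instance.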

Overview of MinT initiatives

Machine translation can be useful in different contexts. As more products use MinT for different purposes, it helps to distinguish those contexts. That way, when users report a bug, it is clearer where it needs to be fixed.

  • MinT Service. The backend service running open-source neural machine translation models.
    • MinT test instance. A basic interface to try the different translation models.
  • MinT for Translators. Initiative to integrate the MinT Service with tools that support other machine translation services, such as Content Translation and the Translate extension.
    • MinT Client for Content Translation. Client exposing the MinT Service as one of the machine translation services available in Content Translation.
    • MinT Client for Translate extension. Client exposing the MinT Service as one of the machine translation services available in the Translate extension.
  • MinT for Wiki Readers. Product to enable readers to use machine translation to read contents from other languages on a wiki.

You can read more below about each of the MinT initiatives.

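Get involved

Feel free to share feedback on the talk page at any time. Phabricator collects the planned improvements (more information); there you can report bugs, propose improvements, track progress, and share your views. You can also check the status updates below.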

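MinT Service

The MinT Service uses multiple machine translation models to translate between languages. The current version uses the following models:

  • NLLB-200. The latest model provided by [1] from Meta's research team. It supports translation across 200 languages, including many that other comparable models do not cover.
  • OpusMT. The [2] at the University of Helsinki compiles freely licensed content in many languages to train the OpusMT translation models. Anyone can help improve translation quality by contributing to the projects that provide data to OPUS. For example, when Content Translation is used to translate Wikipedia articles, the data from published translations becomes a new resource for improving translation quality in the next version of the model. Contributing example translations to [3] also helps improve translation quality.
  • IndicTrans2. The translation models provided by the IndicTrans2 project support more than 20 Indic languages. These models were developed at the AI4Bharat lab of the Indian Institute of Technology Madras.
  • Softcatalà. Softcatalà is a non-profit organization that works to improve the use of Catalan in digital products. The translation models behind the organization's translator service support translation between Catalan and 10 other languages. They are part of the Softcatalà translation project and have been publicly released.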
  • MADLAD-400. MADLAD-400 is a multilingual machine translation model by Google Research that supports 419 languages.

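MinT supports more than 200 languages, including more than 70 that other comparable services do not support (27 of them languages for which there is no Wikipedia yet). You can read more about the initial release of MinT and check some frequently asked questions on the service's overview page.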

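Technical details

The translation models have been optimized for performance with the OpenNMT CTranslate2 library, which reduces the need for GPU acceleration. This makes it easier for organizations and individuals to set up and run their own instances.

MinT provides a platform for running multiple translation models. To support different scenarios, it handles aspects that go beyond the plain-text translation models themselves, such as sentence segmentation, language detection, pre- and post-processing of content, and rich text support.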
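As a rough illustration of why these steps sit around the model, the following minimal sketch wraps a plain-text, sentence-level translator with naive pre-processing, sentence segmentation, and post-processing. The function names (`split_sentences`, `translate_text`, `fake_model`) and the regular-expression segmenter are assumptions made for this example only; they are not MinT's actual implementation, and real segmentation is language-dependent.

```python
import re
from typing import Callable, List


def split_sentences(text: str) -> List[str]:
    """Naive segmentation: split after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def translate_text(text: str, translate_sentence: Callable[[str], str]) -> str:
    """Pre-process, translate sentence by sentence, then post-process."""
    # Pre-processing: collapse runs of whitespace (including newlines).
    normalized = re.sub(r"\s+", " ", text)
    sentences = split_sentences(normalized)
    # The model only ever sees one plain-text sentence at a time.
    translated = [translate_sentence(s) for s in sentences]
    # Post-processing: trim each result and rejoin with single spaces.
    return " ".join(t.strip() for t in translated)


def fake_model(sentence: str) -> str:
    """Hypothetical stand-in for a translation model: upper-cases the input."""
    return sentence.upper()


result = translate_text("Hello  world. How are\nyou?  Fine!", fake_model)
print(result)
```

Passing the model in as a callable keeps the surrounding steps independent of any particular translation model, which is one plausible way a platform could host several models behind the same pipeline.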

Test instance

The MinT test instance is a basic interface to try the different translation models. It lets you translate content across the selected language pairs and choose the preferred translation model when more than one is available. This allows different communities to check how well the models support their language. This instance is intended for testing, so performance and availability may be reduced compared to other MinT-based products. You can check the availability status of the MinT test instance.
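The idea of choosing among several models for one language pair can be sketched as a small lookup. Everything here is hypothetical: the `AVAILABLE_MODELS` registry, the language pairs listed in it, their ordering, and the `choose_model` function are illustrative assumptions, not the configuration or API of the MinT Service.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical registry: (source, target) -> models offered, default first.
# The pairs and their ordering are made up for illustration.
AVAILABLE_MODELS: Dict[Tuple[str, str], List[str]] = {
    ("en", "ca"): ["softcatala", "nllb-200", "opusmt"],
    ("en", "hi"): ["indictrans2", "nllb-200"],
    ("en", "ig"): ["nllb-200"],
}


def choose_model(source: str, target: str, preferred: Optional[str] = None) -> str:
    """Return the preferred model if it is offered, else the default one."""
    models = AVAILABLE_MODELS.get((source, target))
    if not models:
        raise ValueError(f"No model available for {source} -> {target}")
    if preferred is None:
        return models[0]
    if preferred not in models:
        raise ValueError(f"{preferred} does not support {source} -> {target}")
    return preferred


print(choose_model("en", "ca"))              # default model for the pair
print(choose_model("en", "hi", "nllb-200"))  # explicit user preference
```

Raising an error for an unsupported preference, rather than silently falling back, makes it obvious to a tester which models actually cover their language pair.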


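MinT for Translators

Translating with MinT on a mobile device

In the Wikimedia ecosystem, multilingual users often contribute content by translating. Machine translation can give them a useful initial translation that is ready for real use once it has been reviewed and improved. The translation tools developed by the Language team are part of the editing workflow and integrate output from several machine translation services to make translating more efficient. With MinT available, integrating it into these tools is a natural step to strengthen that support. MinT is supported in Content Translation and in the Translate extension.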

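MinT for Wiki Readers

The number of topics and the amount of information that readers can learn about from Wikipedia and other wikis depend on the languages they know. Machine translation can help people learn about topics of interest in languages they do not know.

This initiative explores how to surface MinT's machine translation support in Wikipedia articles in a way that:

  • Lets readers learn more about topics of interest from other languages.
  • Clearly distinguishes community-created content from automatically generated content.
  • Encourages users to access and create more content for their community whenever they are able to.

The Language team is currently working on the design and research for this project, looking for the best way to surface MinT on Wikipedia and exploring the technical feasibility of using the service in this context.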

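Expanding the reach of MinT

Learning from the previous initiatives helps improve and strengthen the system. Currently, the MinT API supports only Wikimedia products. Once the system is ready, we will consider expanding that support. Providing communities with a service they can use in innovative ways is also a powerful way to empower them. In the future, new initiatives to expand the reach of MinT will be described here. Until then, feel free to set up and experiment with your own MinT instance.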

Disclaimer

  1. Accuracy of MinT’s Translations: The accuracy of translations generated by MinT may vary. Translations may not be entirely accurate or may not always convey the intended meaning or context of the original content. Wikimedia makes no representations or warranties regarding the accuracy or adequacy of the automatically translated content.
  2. Limitation of Liability: Wikimedia, its affiliates, and employees are not liable for any direct, indirect, incidental, punitive, or consequential damages, including but not limited to damages for goodwill, use, data, or any other intangible losses arising out of or in connection with the use of MinT or translations generated with MinT.
  3. Creative Commons Compliance: Translations generated with MinT are considered derivative works under the applicable Creative Commons license governing the original content. Users shall comply with the terms of the applicable Creative Commons license when using translated content.
  4. Terms of Use and Privacy Policy: Use of MinT is subject to Wikimedia's Terms of Use and Privacy Policy.

Status updates

February 2024

January 2024

December 2023

November 2023

October 2023

  • Launched the Language Identification service to automatically detect the language in which a given text is written. The service supports the detection of 201 languages, and anyone can access the API to use the service or read the model card for more details. The Machine Learning team completed the last checks after deploying to LiftWing and determining that the service can "easily withstand a high amount of traffic".
  • Added basic support for rich text translation by transferring markup, such as bold styling on words, from the source text to the equivalent words in the machine translation (which lacks formatting, since translation models operate on plain text).
  • Completed the process to enable MinT for languages with no Wikipedia yet. Translation models in MinT support 25 languages for which there is no Wikipedia. Speakers of those languages can test them in MinT's test instance to assess quality, which ensures that translation tools are well-equipped once wikis are created for those languages (as was the case with the recent graduation of Fon Wikipedia out of the Incubator).
  • Completed the process to enable MinT for closely related languages based on community input. For some languages where machine translation is not available, Wikipedia editors have asked for access to machine translation in Content Translation using a related language instead of having no support at all. With this enablement, translators of Gan (gan) Wikipedia will have machine translation based on the traditional script variant of Chinese as a starting point.
  • Analysis of translation activity in 55 languages for which MinT provides machine translation for the first time shows that (a) translations have doubled since MinT became available, and (b) deletion rates have not increased. Activity levels for these 55 wikis changed from ~500 translations/month to 1K+ translations/month after MinT was enabled. For example, a recent peak of 2.15K translations was published in August 2023, when MinT was available for those languages, a significant increase from 225 translations in August 2022, when MinT was not available for them.
  • Better visibility of translation quality by including a tag in translations where unedited machine translation is close to the limits. This will facilitate analysis of translation quality and limits.
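The markup transfer mentioned in the list above (carrying bold styling from the source text into a plain-text machine translation) can be illustrated with a deliberately naive sketch. It assumes one simple strategy: translate the stripped text, translate each bold span on its own, and wrap the first free occurrence of that span in the full translation. The names `transfer_bold`, `strip_markup`, `toy_translate` and the `TOY` dictionary are hypothetical, and this is not necessarily how MinT does it.

```python
import re
from typing import Callable, List, Tuple

BOLD = re.compile(r"<b>(.*?)</b>")


def strip_markup(html: str) -> str:
    """Remove <b> tags, keeping their text, so the model sees plain text."""
    return BOLD.sub(r"\1", html)


def transfer_bold(source_html: str, translate: Callable[[str], str]) -> str:
    """Re-apply bold spans from the source onto the plain-text translation."""
    translation = translate(strip_markup(source_html))
    ranges: List[Tuple[int, int]] = []  # spans to make bold in the translation
    for match in BOLD.finditer(source_html):
        span = translate(match.group(1))
        if not span:
            continue
        # Find the first occurrence that does not overlap an earlier span.
        start = translation.find(span)
        while start != -1 and any(
            start < e and s < start + len(span) for s, e in ranges
        ):
            start = translation.find(span, start + 1)
        if start != -1:  # if the span cannot be located, leave it unformatted
            ranges.append((start, start + len(span)))
    # Insert tags from right to left so earlier offsets stay valid.
    for start, end in sorted(ranges, reverse=True):
        translation = (
            translation[:start] + "<b>" + translation[start:end] + "</b>"
            + translation[end:]
        )
    return translation


# Hypothetical stand-in for a translation model (English -> Spanish lookups).
TOY = {"The cat sleeps": "El gato duerme", "cat": "gato"}


def toy_translate(text: str) -> str:
    return TOY.get(text, text)


print(transfer_bold("The <b>cat</b> sleeps", toy_translate))
```

A span translated in isolation may not match the wording the model picks in context; in that case this sketch simply drops the formatting rather than guessing, which is why such support is best described as basic.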

September 2023

August 2023

July 2023