Split Mindong (cdo) into cdo-hans, cdo-hant/cdo-hani, cdo-latn
Closed, Resolved, Public

Description

We need to have Mindong translations split into

  • Solution 1: using Hani and Hans
    • cdo-hani : Han script
    • cdo-hans : Simplified Han script
    • cdo-latn : Latin script
  • Solution 2: using Hant and Hans
    • cdo-hans : Simplified Han script
    • cdo-hant : Traditional Han script
    • cdo-latn : Latin script

After completing the split, we need to disable cdo on translatewiki with the message "This language code should remain unused. Localise in cdo-hani/cdo-hant, cdo-hans or cdo-latn please."


  • Allow localization on translatewiki.net
    • cdo-hans
    • cdo-hant
    • cdo-latn
  • First translation export:
    • cdo-hant
    • cdo-latn
  • Add to Names.php:
    • cdo-hant
    • cdo-latn
  • Add to CLDR extension
    • cdo-hani
    • cdo-hant
    • cdo-latn
  • Add to wikimedia/language-data
    • cdo-hani
    • cdo-hans
    • cdo-hant
    • cdo-latn
  • Update language-data in the jquery.uls (pull request)
  • Update jquery.uls in the ULS extension
  • Add to jquery.i18n:
    • cdo-hant
    • cdo-latn
  • Add to abstract-wiki/wikifunctions/function-schemata
    • cdo-hant
    • cdo-latn
  • Add to WikiLambda
    • cdo-hant
    • cdo-latn
  • Add to Wikifunctions
    • cdo-hant
    • cdo-latn

Consensus:

Details

Title | Reference | Author | Source Branch | Dest Branch
Update function-schemata sub-module to HEAD (5d98a1f) | repos/abstract-wiki/wikifunctions/function-orchestrator!256 | jforrester | sync-function-schemata | main
Update function-schemata sub-module to HEAD (5d98a1f) | repos/abstract-wiki/wikifunctions/function-evaluator!276 | jforrester | sync-function-schemata | main
Update function-schemata sub-module to HEAD (5d98a1f) | repos/abstract-wiki/wikifunctions/wikilambda-cli!56 | jforrester | sync-function-schemata | main
definitions: Add Z1952/bax-bamu, Z1953/xon, Z1954/cdo-hant and Z1955/cdo-latn ZNaturalLanguages | repos/abstract-wiki/wikifunctions/function-schemata!177 | jforrester | languages-bax-bamu-xon | main

Event Timeline

Change #672646 merged by jenkins-bot:

[mediawiki/extensions/cldr@master] Split Mindong (cdo) translations

https://gerrit.wikimedia.org/r/672646

The patch https://gerrit.wikimedia.org/r/c/translatewiki/+/923636 may get merged soon, but after that is done, is there a plan to move the existing translations in translatewiki?

I went over all of them and found this:
Blockly: hani
Intuition: latn
jquery.uls: latn
MathJax: hani
MediaWiki: mix
Pageviews: latn
pywikibot: latn
translatewiki: mix
portals: mix (we'll need to figure out how to represent it correctly with the developers of the portals project, so that it will work correctly on the https://wikipedia.org page)
XTools: latn

Most of them have few translations and will be relatively easy to move manually.

The most difficult group is probably MediaWiki (translated, outdated). It includes a lot of translations, and they are in both latn and hani.

What makes sense to me for cleaning up MediaWiki is this:

  1. Update all the outdated ("fuzzy") messages in the same writing system in which they were written.
  2. Count how many translations there are in each writing system. Manually move the ones in the writing system that has fewer translations.
  3. Use the renaming script to rename the rest: https://translatewiki.net/wiki/Renaming_language_codes . This will probably require coördination with translatewiki server managers (like @abi_ and @Raymond).
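
The per-project survey and the counting in step 2 could be partly automated. Below is a minimal sketch in Python, not an existing translatewiki tool: it labels a message as hani, latn or mix by looking at its code points. The function names, the coarse code-point ranges and the sample strings are all assumptions made for illustration, and messages containing placeholders or markup would still need a manual check because Latin letters in them push a Han message into "mix".

```python
import unicodedata
from collections import Counter

# Assumption: a coarse approximation of the Han script by code-point range.
# The last range spans the supplementary ideograph blocks and includes
# unassigned gaps, so it is deliberately generous.
HAN_RANGES = (
    (0x3400, 0x4DBF),    # CJK Unified Ideographs Extension A
    (0x4E00, 0x9FFF),    # CJK Unified Ideographs
    (0xF900, 0xFAFF),    # CJK Compatibility Ideographs
    (0x20000, 0x323AF),  # supplementary-plane ideograph extensions (coarse)
)


def is_han(ch):
    """True if the character falls inside one of the assumed Han ranges."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in HAN_RANGES)


def is_latin(ch):
    """True if the Unicode character name starts with LATIN."""
    return unicodedata.name(ch, "").startswith("LATIN")


def classify(text):
    """Label one message as 'hani', 'latn', 'mix' or 'other'."""
    has_han = any(is_han(ch) for ch in text)
    has_latin = any(is_latin(ch) for ch in text)
    if has_han and has_latin:
        return "mix"
    if has_han:
        return "hani"
    if has_latin:
        return "latn"
    return "other"  # digits, punctuation or empty text only


def tally(messages):
    """Count labels over a {message_key: translation_text} mapping."""
    return Counter(classify(text) for text in messages.values())


if __name__ == "__main__":
    # Hypothetical message keys and illustrative strings, not real exports.
    sample = {
        "example-key-1": "閩東語",
        "example-key-2": "Mìng-dĕ̤ng-ngṳ̄",
        "example-key-3": "閩東語 (Mindong)",
    }
    print(tally(sample))
```

The writing system with the smaller count is the candidate for manual moving, and everything labelled "mix" is a candidate for review before the rename.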

First, we can automatically copy all the "cdo" translation text to the "cdo-latn" translation text and the "cdo-hani" translation text.

Then we can manually modify the "cdo-latn" translation text and the "cdo-hani" translation text.

After looking into the discussions from 10 years ago, I believe there are some misunderstandings about the distinction between Simplified and Traditional Han.

There are some incorrect/inaccurate points in the discussion:

"The scope of Simplified Han characters are defined by Modern Standard Mandarin"

This is inaccurate. wuu-Hant and nan-Hant also include "个", a character that was also used traditionally but is mapped as Simplified Han because of the standardization and simplification of the Traditional Han script for Mandarin.

The "simplification" for those languages is treated differently from that of Mandarin, and they mistakenly assumed it works the same way as in Mandarin.

nan-Hani

Afaik, most Hokkien Southern Min websites written in the Traditional form of the Han script use the "nan-Hant" language code instead of "nan-Hani".

Also, nan-Hans is still possible. (As said, the "simplification" for those languages is treated differently from that of Mandarin, and they mistakenly assumed it works the same way as in Mandarin.)

Conclusion: "Hant" means "The traditional way of writing Han characters for Sinitic languages."

@Winston_Sung
The reason why we chose Hani instead of Hant is that we need extension characters, which cannot easily be included in either the simplified or the traditional orthography. Yes, we do mainly use traditional characters, but there are still too many extension characters needed.

We need extension characters, which cannot easily be included in either the simplified or the traditional orthography.
There are still too many extension characters needed.

Cantonese (yue) does use a lot of extension characters, but this hasn't affected the use of yue-hans + yue-hant.

Hani characters can be used under both *-Hans and *-Hant without conflict. *-Hans/*-Hant already include Hani (Hani is already part of *-Hans/*-Hant).

To be clear, any Hani characters (including extension characters) can be used in zh-Hans and zh-Hant without any problem (as an example).

Hani is not a part of Hans or Hant.
Hans and Hant are part of Hani.
Hani means Hans and Hant and other Han characters.

You're replying about Hans and Hant at the script scope, while I'm talking about *-Hans and *-Hant at the language scope.

*-Hans does not mean the same as Hans does here, and *-Hant does not mean the same as Hant does here.

It doesn't matter. We can use cdo-Hani, which just means the Min Dong Chinese language written with any kind of Han script (e.g. simplified characters, traditional characters, extension characters, etc.), even though we mainly use traditional characters and extension characters.

We can use cdo-Hani, which just means the Mindong language written with any kind of Han script, even though we mainly use traditional characters and extension characters.

Using cdo-Hani is really confusing.

  • Mindong is not even written in mixed Han like "新音楽時報" ("楽" + "時", "報").
  • The "not simplifiable" statement is false. I've found some examples of Mindong written in Simplified Han, for example "侬" ("儂").

cdo-Hani does not mean that it's a mixture of simplified and traditional forms. It only means that it's written using Han characters (rather than another script like Latin). All Han characters in Unicode have the script "Han", which corresponds to the script code Hani, so all cdo-Hant and cdo-Hans text is by definition also cdo-Hani.
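
The relationship described in this comment can be written down as a small check. This is only a sketch of that one reading, in Python: the table and both function names are made up for illustration, and it treats Hans and Hant as variants that fall under Hani, as their ISO 15924 names suggest ("Han (Simplified variant)", "Han (Traditional variant)"). It says nothing about which code a project should pick.

```python
# Assumption: Hans and Hant are modelled as narrower variants of Hani.
BROADER_SCRIPT = {"Hans": "Hani", "Hant": "Hani"}


def script_of(tag):
    """Return the script subtag of a tag such as 'cdo-hant' as 'Hant', or None."""
    for part in tag.split("-")[1:]:
        # In a language tag, a four-letter alphabetic subtag is the script.
        if len(part) == 4 and part.isalpha():
            return part.title()
    return None


def covers(general_tag, specific_tag):
    """True if text tagged specific_tag would also be valid under general_tag."""
    if general_tag.split("-")[0].lower() != specific_tag.split("-")[0].lower():
        return False  # different languages
    general = script_of(general_tag)
    specific = script_of(specific_tag)
    if general is None:
        return True  # no script restriction on the general tag
    return general == specific or BROADER_SCRIPT.get(specific) == general


if __name__ == "__main__":
    print(covers("cdo-Hani", "cdo-hant"))  # Hant falls under Hani
    print(covers("cdo-hant", "cdo-Hani"))  # the reverse does not hold
    print(covers("cdo-Hani", "cdo-latn"))  # Latin is a different script
```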

The reason why we chose Hani instead of Hant is that we need extension characters, which cannot easily be included in either the simplified or the traditional orthography. Yes, we do mainly use traditional characters, but there are still too many extension characters needed.

In Unicode, "simplified" and "traditional" are not limited to characters that appear in official lists or in older encodings like Big5 and GB2312.

There are characters which are only encoded in Unicode which are considered simplified/traditional pairs, e.g. 𱊜 (U+3129C) and 𪈼 (U+2D892) which link to each other using the properties kTraditionalVariant and kSimplifiedVariant. https://www.unicode.org/reports/tr38/index.html#SCTC has more information about those properties.
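
For anyone who wants to check such pairs, here is a rough Python sketch that reads the two properties from lines in the tab-separated layout of the Unihan variants data file (code point, property name, value). The function names are hypothetical, and the two sample rows are illustrative stand-ins in that layout rather than a copy of the real file, so any real lookup should be run against the published Unihan data.

```python
FIELDS = ("kSimplifiedVariant", "kTraditionalVariant")


def to_char(token):
    """Turn 'U+4E2A' (optionally followed by '<source') into the character."""
    return chr(int(token.split("<")[0][2:], 16))


def parse_variants(lines):
    """Build {char: {field: [variant chars]}} from tab-separated Unihan lines."""
    table = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        code_point, field, value = line.split("\t", 2)
        if field not in FIELDS:
            continue
        variants = [to_char(token) for token in value.split()]
        table.setdefault(to_char(code_point), {})[field] = variants
    return table


def links_both_ways(table, simplified, traditional):
    """True if the two characters point at each other via the two properties."""
    forward = table.get(simplified, {}).get("kTraditionalVariant", [])
    backward = table.get(traditional, {}).get("kSimplifiedVariant", [])
    return traditional in forward and simplified in backward


# Illustrative rows only (assumed for the example, not quoted from the file).
SAMPLE = [
    "# sample rows",
    "U+4E2A\tkTraditionalVariant\tU+500B",
    "U+500B\tkSimplifiedVariant\tU+4E2A",
]

if __name__ == "__main__":
    table = parse_variants(SAMPLE)
    print(links_both_ways(table, "个", "個"))
```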

The thing is that Hans and Hant refer to specific systems of orthography.

For example,

Hans: 个
Hant: 個
Hani: 个個箇

If you use the character 箇, it is neither Hans nor Hant.

No. Wu has "个", "個", "箇" in both Hans and Hant. It is language-dependent.

If you use the character 箇 in Wu, it is both Hans and Hant.

https://wuu.wikipedia.org/wiki/MediaWiki:Conversiontable/wuu-hans

圖片.png (502×1 px, 68 KB)

Please refer to the language-specific part (the part that differs by language).

Also, if you use the character 箇 in Hakka, it is both Hans and Hant.

Winston_Sung renamed this task from Split Mindong (cdo) translations to Split Mindong (cdo) into cdo-hans, cdo-hant, cdo-latn. (Aug 9 2024, 2:38 PM)
Winston_Sung updated the task description.
Winston_Sung renamed this task from Split Mindong (cdo) into cdo-hans, cdo-hant, cdo-latn to Split Mindong (cdo) into cdo-hans, cdo-hant/cdo-hani, cdo-latn. (Aug 9 2024, 2:46 PM)

We now have examples of cdo-Hans : https://www.ydict.net/w/CgNENDc

I guess it might be problematic for us to use the Hans–Hani relationship if we want to have a language tag to represent cdo-Hans + cdo-Hant ?

@Yejianfei

We should use -Hans and -Hant, consistent with other Chinese topolects.

Change #923636 merged by jenkins-bot:

[translatewiki@master] Languages: Split Mindong (cdo)

https://gerrit.wikimedia.org/r/923636

Change #876406 merged by jenkins-bot:

[mediawiki/core@master] Languages: Add cdo-hant, cdo-latn (Mindong - Traditional Han script, Latin script) to Names.php

https://gerrit.wikimedia.org/r/876406

jforrester updated https://gitlab.wikimedia.org/repos/abstract-wiki/wikifunctions/function-schemata/-/merge_requests/177

definitions: Add Z1952/bax-bamu, Z1953/xon, Z1954/cdo-hant and Z1955/cdo-latn ZNaturalLanguages

dmartin merged https://gitlab.wikimedia.org/repos/abstract-wiki/wikifunctions/function-schemata/-/merge_requests/177

definitions: Add Z1952/bax-bamu, Z1953/xon, Z1954/cdo-hant and Z1955/cdo-latn ZNaturalLanguages

Change #1094540 had a related patch set uploaded (by Jforrester; author: Jforrester):

[mediawiki/extensions/WikiLambda@master] Update function-schemata sub-module to HEAD (5d98a1f)

https://gerrit.wikimedia.org/r/1094540

Change #1094540 merged by jenkins-bot:

[mediawiki/extensions/WikiLambda@master] Update function-schemata sub-module to HEAD (5d98a1f)

https://gerrit.wikimedia.org/r/1094540

Change #1098518 had a related patch set uploaded (by Jforrester; author: Jforrester):

[operations/deployment-charts@master] wikifunctions: Upgrade orchestrator from 2024-11-19-140330 to 2024-11-27-074306

https://gerrit.wikimedia.org/r/1098518

Change #1098519 had a related patch set uploaded (by Jforrester; author: Jforrester):

[operations/deployment-charts@master] wikifunctions: Upgrade evaluators from 2024-11-19-132736 to 2024-11-26-193226

https://gerrit.wikimedia.org/r/1098519

Change #1098518 merged by jenkins-bot:

[operations/deployment-charts@master] wikifunctions: Upgrade orchestrator from 2024-11-19-140330 to 2024-11-27-074306

https://gerrit.wikimedia.org/r/1098518

Winston_Sung changed the task status from Open to In Progress. (Nov 27 2024, 3:13 PM)
Winston_Sung changed the status of subtask T380046: Add Mindong (Traditional Han script) (cdo-hant) to Names.php from Open to In Progress.

Change #1098519 merged by jenkins-bot:

[operations/deployment-charts@master] wikifunctions: Upgrade evaluators from 2024-11-19-132736 to 2024-11-26-193226

https://gerrit.wikimedia.org/r/1098519

Change #1099007 had a related patch set uploaded (by Arlolra; author: Winston Sung):

[mediawiki/core@REL1_43] Languages: Add cdo-hant, cdo-latn (Mindong - Traditional Han script, Latin script) to Names.php

https://gerrit.wikimedia.org/r/1099007

Change #1099007 abandoned by Arlolra:

[mediawiki/core@REL1_43] Languages: Add cdo-hant, cdo-latn (Mindong - Traditional Han script, Latin script) to Names.php

https://gerrit.wikimedia.org/r/1099007

Is there anything left to be done here?

Change #1115032 had a related patch set uploaded (by Cory Massaro; author: Cory Massaro):

[operations/deployment-charts@master] wikifunctions: Upgrade orchestrator from version: 2025-01-22-203140 to 2025-01-28-144249

https://gerrit.wikimedia.org/r/1115032

Change #1115032 abandoned by Cory Massaro:

[operations/deployment-charts@master] wikifunctions: Upgrade orchestrator from version: 2025-01-22-203140 to 2025-01-28-144249

Reason:

already done

https://gerrit.wikimedia.org/r/1115032

Change #1117844 had a related patch set uploaded (by Winston Sung; author: Winston Sung):

[translatewiki@master] Remove cdo-hant, cdo-latn (Mindong - Traditional Han script, Latin script), added to Names.php

https://gerrit.wikimedia.org/r/1117844

Change #1117844 merged by Srishakatux:

[translatewiki@master] Remove cdo-hant, cdo-latn (Mindong - Traditional Han script, Latin script), added to Names.php

https://gerrit.wikimedia.org/r/1117844

Change #1119150 had a related patch set uploaded (by Winston Sung; author: NMW03):

[mediawiki/extensions/UniversalLanguageSelector@master] Update jquery.ime and jquery.uls from upstream

https://gerrit.wikimedia.org/r/1119150

Change #1119150 merged by jenkins-bot:

[mediawiki/extensions/UniversalLanguageSelector@master] Update jquery.ime and jquery.uls from upstream

https://gerrit.wikimedia.org/r/1119150

Winston_Sung claimed this task.
Winston_Sung removed a project: Patch-For-Review.
Winston_Sung updated the task description.

Change #1119364 had a related patch set uploaded (by Winston Sung; author: Winston Sung):

[translatewiki@master] Disable cdo, use cdo-hans, cdo-hant, cdo-latn instead

https://gerrit.wikimedia.org/r/1119364

Change #1119364 merged by jenkins-bot:

[translatewiki@master] Disable cdo, use cdo-hans, cdo-hant, cdo-latn instead

https://gerrit.wikimedia.org/r/1119364