Data Platform Engineering Bug Report or Data Problem Form.
Please fill out the following
Please ensure you set priority
What kind of problem are you reporting?
- Access related problem
- Service related problem
- Data related problem
As a follow-up to the Central Asian WikiCon 2025, which took place in Uzbekistan, and also as part of T364147, I am trying to understand the usage of the language variants for the Uzbek (uz) and Tajik (tg) languages.
I found that Turnilo includes language variants information, which could be useful, but several things about the results look very unusual.
First, if I filter out the "default" variant, which, as expected, has the largest number of views by far, the most prominent variants are all from Chinese (zh) and Serbian (sr). Also, if I filter by including only zh-* variants and split by project, then zh.wikipedia and other zh.* projects are the most popular ones, and the same is true for sr-* variants.
All of the above is sensible, because Chinese and Serbian are two prominent languages with very active editing communities and very high demand for all their variants (Traditional/Simplified, Cyrillic/Latin).
So far, so good. But if I try to look at variants for other languages, everything becomes weird.
If I go back to the first search, where I filter out "default" and split by project, then most, if not all, of the "variants" that come after zh and sr are not actually variants, but regular language codes, with "en" being the most prominent. Some of them are quite obscure, such as "gaz" or "pro", which I've never seen anywhere in the Wikimedia world before.
What's even more interesting is that if I filter by excluding "default" and all the "zh-*" and "sr-*" variants, and then split by project, then the most popular project by far is Wikifunctions. And if I filter by including only tg, tg-cyrl, and tg-latn, then Wikifunctions is the only project that has any hits. The same happens if I include only uz, uz-cyrl, and uz-latn. Perhaps this happens because of Wikifunctions' URL structure, which is different from most other wikis, but I might be wrong.
Something about this seems quite wrong. For example, I'd expect at least some usage of uz-* variants on uz.wikipedia.org, given that both the Cyrillic and Latin alphabets are frequently used in Uzbekistan (I visited the country a few days ago and saw both a lot). I wouldn't be surprised if variants of Uzbek had lower usage than variants of Serbian, but currently there's nothing at all; according to Turnilo, uz-* variants are used only in Wikifunctions.
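
For reference, the filter steps described above can be sketched roughly as follows in Python. This is only an illustration of the query logic, not of how Turnilo or the underlying pipeline works: the row layout, the field names (`project`, `variant`, `views`), the helper names, and the sample numbers are all made up for the example and are not real data.

```python
from collections import defaultdict

# Hypothetical rows: field names and view counts are illustrative only,
# not real Turnilo output or schema.
SAMPLE_ROWS = [
    {"project": "en.wikipedia", "variant": "default", "views": 900},
    {"project": "zh.wikipedia", "variant": "zh-hans", "views": 50},
    {"project": "sr.wikipedia", "variant": "sr-ec", "views": 30},
    {"project": "wikifunctions", "variant": "en", "views": 12},
    {"project": "wikifunctions", "variant": "uz", "views": 3},
]


def is_excluded(variant, exact=("default",), prefixes=("zh-", "sr-")):
    """Return True for values filtered out: "default" and all zh-*/sr-* variants."""
    return variant in exact or variant.startswith(prefixes)


def views_by_project(rows):
    """Sum views per project over the rows that survive the exclusion filter."""
    totals = defaultdict(int)
    for row in rows:
        if not is_excluded(row["variant"]):
            totals[row["project"]] += row["views"]
    return dict(totals)


def looks_like_variant(code):
    """Rough heuristic (an assumption): a variant has a subtag, e.g. "uz-latn";
    a bare code such as "en" or "uz" is a plain language code."""
    return "-" in code
```

With the made-up rows above, `views_by_project(SAMPLE_ROWS)` leaves only the Wikifunctions rows, and `looks_like_variant` flags their values ("en", "uz") as plain language codes rather than variants, which is the pattern I am seeing in the real results.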