https://wikimediafoundation.org/wiki/User:Sthottingal-WMF
Maintainer and engineer for ContentTranslation, UniversalLanguageSelector, and general MediaWiki-Internationalization
We do have a machine translation system for kaa (Karakalpak), powered by the MADLAD-400 model.
The benefits of web components compared to HTML tags are well documented. Web components are an HTML standard and are supported by all browsers. Of course, you can achieve the same features with HTML tags and JavaScript, with trade-offs. I would like to see them supported in MediaWiki as they are part of the web standard.
For the code search, I used this documentation: https://www.mediawiki.org/wiki/VisualEditor/Hooks
The 45% thresholds are an extreme case. It means you cannot publish a translation that has more than 45% machine translation. This demands heavy editing from the translator. We also don't tell the translator about these percentages (T251887).
In my opinion, a simpler, consistent, and predictable threshold level is better. If T is the threshold for errors, make T-10 the warning threshold, irrespective of the value of T.
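The T-10 rule can be sketched as follows. This is a minimal illustration, not the Content Translation implementation: the function names, the percentage-point scale, and the strict "greater than" comparison are assumptions made for the example.

```python
# Hypothetical sketch of the "warning threshold = T - 10" rule.
# Names and the strict ">" comparison are assumptions, not actual CX code.

WARNING_MARGIN = 10  # percentage points below the error threshold T


def warning_threshold(error_threshold: float) -> float:
    """Return the warning threshold for a given error threshold T."""
    return error_threshold - WARNING_MARGIN


def classify(unmodified_mt_percent: float, error_threshold: float) -> str:
    """Classify a translation by its share of unmodified machine translation."""
    if unmodified_mt_percent > error_threshold:
        return "error"
    if unmodified_mt_percent > warning_threshold(error_threshold):
        return "warning"
    return "ok"


if __name__ == "__main__":
    for threshold in (45, 85, 95):
        print(threshold, warning_threshold(threshold), classify(40, threshold))
```

Whatever value T takes, the translator always gets the same 10-point margin between the first warning and the publishing block, which is what makes the behaviour predictable.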
Please use the same campaign parameter. The [[ https://gerrit.wikimedia.org/r/c/mediawiki/extensions/ArticlePlaceholder/+/1138453/1/modules/ext.articleplaceholder.createArticle/ext.articleplaceholder.createArticleTranslation.js#59 | ArticlePlaceholder patch ]] passes articleplaceholder as the campaign param. The CX config and the event source config should use the same value. article_placeholder does not match.
Following discussions with the team, we've concluded that dynamic configuration is not a use case currently supported by Wikimedia. Our configurations are static, managed through updates and deployments. The conf.yaml file facilitates this by allowing for varied configurations across different deployments, as it's separate from the codebase.
It let me start translating, and only when I went to publish did it warn me that I would be overwriting an existing article. This seems like a step back from the behaviour of the old dashboard, where at least it would warn me before I started translating.
@jhsoby I was testing this again, and I don't see the issue now.
More importantly, a new version of the CX dashboard is coming up. It will replace the current dashboard.
You can access the new dashboard at https://no.wikipedia.org/w/index.php?title=Special:ContentTranslation&unified-dashboard=1&filter-type=automatic&filter-id=previous-edits&active-list=draft&from=en&to=nb#/
It seems to be working now. Blocking retries for 30 minutes for all consumers of the API does not seem to be an optimal approach. Also see T382376: PageViewInfo caches errors due to the request limit which can lead to denial of service
Thanks for that context. For the reasons I explained above, my advice would be to have the DeepL client in your customized version and not upstream. Currently, our team or organization does not have a collaboration with DeepL. If this situation changes, we will definitely consider your client code. Thanks!
Hi, DeepL is a proprietary machine translation service and their API key has a price associated with it, at least for use at the scale of cxserver's current traffic. For such external machine translation services, we (WMF) usually have a partnership. For example, to use the Google API, WMF and Google have a partnership agreement. For DeepL too, we would need such an agreement (legal, branding, pricing).
In T387820#10621300, @Jdlrobson-WMF wrote: Hello is it possible this ticket relates to the train blocker at T388467: TypeError: mtSuggestions.map is not a function ?
@Jdlrobson Apologies, I missed this notification.
Supporting pages with categories will limit the tool's capabilities, IMO. The markup will not be present in the pages if we use a category as a marker. If our custom <page-collection> marker is not present, we cannot render anything into that page, if we plan at all to render collection stats, activity logs, shareable links, and CTAs using that <page-collection> marker.
The External Guidance extension does not record the pages where it is shown. The only instrumentation we have is when Special:ExternalGuidance is accessed by clicking the 'Edit' link from it. There is also a technical difficulty in knowing on which pages the UI adaptation of External Guidance is shown: they are on external domains and cannot post data to our event logging systems.
The MT model that MinT uses is the IndicTrans2 model. It was released two years ago and is not getting updates. But Google is actively updating their MT system and it is getting better. While the actual details of its technology are not open, given the LLM advancements, I expect their system is based on LLMs (or distillations).
Would it make sense to warn the user early that the article exists? This warning will be shown anyway when the user reaches the editor, but showing it early can save their time in case overwriting is not the intention.
Hi, could you please share the source and target languages of your translation? Also, which MT engine (Google, MinT, etc.) were you using? Thanks!
It is better now:
The service has been down since Jan 15. See T383750: MinT: Fails to download models/files from peopleweb.discovery.wmnet
From the following command, using our Google key:
curl --request GET \
  --url 'https://translation.googleapis.com/language/translate/v2/languages?key=xxxxxxxxxxxxxxxxxxxxx' \
  --header 'content-type: application/json'
This project is a candidate for sunsetting as per our last team discussions. So, I am not expecting any rewrite.
The example you mentioned, "Supporting digitization of old books that require special fonts for better reproduction," is directly relevant to ws-export in Bengali Wikisource (T344532).
The webfonts feature in UniversalLanguageSelector was originally meant to support scripts for which operating systems did not ship default fonts. People used to see rectangles in such cases. We included free and open source fonts for several languages for this (2011-2014). Thanks to Noto and many other projects, this situation has improved a lot. Except for a few lesser-known scripts, all scripts have good enough fonts in all operating systems. So UniversalLanguageSelector removed many of these fonts. Note that shipping a webfont to the user's browser has a significant performance cost. There is also the issue that users often have font preferences and we can only ship one font. So the scope of UniversalLanguageSelector's webfont feature is very narrow at present.
We do track it in the Content Translation Grafana dashboard (not the same as the cxserver dashboard). We are not actively monitoring it since there is no related feature development.
In the Quantiles graph, all values were there, but the count value 1 caused it to hide all the other graphs. I disabled that and it looks OK now.
Thanks for creating a ticket for further investigation on the approach.
If there is a 65% overlap, we still have 35% of the essential, vital articles left out.
v2.4.0 available at https://www.npmjs.com/package/banana-i18n
Regions where these languages are spoken also need to be considered to get the full picture. Nigeria, for example, has primary education in English and has a literacy rate of 68%. How much of this literate, internet-accessing population, which has English as its primary language, would like to use the ff, ig, or ki languages for reading encyclopedic content is very much the question here. Very low traffic to these wikis underlines this issue.
Please help me understand the measurement plans here. Using a real example will help us get clarity.
You can see the performance improvement by visiting https://recommendation-api.thottingal.in/api/v1/translation?source=en&target=ig&count=12&seed=Vital%20articles&collections=true&include_pageviews=false&search_algorithm=morelike&rank_method=default - loads instantly.
Compare it with the production instance:
https://api.wikimedia.org/service/lw/recommendation/api/v1/translation?source=en&target=ig&count=12&seed=Vital%20articles&collections=true&include_pageviews=false&search_algorithm=morelike&rank_method=default - takes up to 10 seconds
Yes, it seems to be the same issue. Marking as duplicate.
Time taken: About 6 mins
@GMikesell-WMF Yes, you are right. The banner should be there in the language selector immediately after you open the selector.
I think the current logic that restricts the cache update to one thread, the PID % number of workers check, is fragile.
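To illustrate why a check of this kind can be fragile, here is a minimal sketch. The exact condition used in the service is not shown here, so `pid % worker_count == 0` and the sample PIDs are assumptions made for the example. The point is that operating systems do not guarantee consecutive PIDs for worker processes, so such a check may select no worker, or more than one.

```python
# Hypothetical sketch: how many workers would pass a "PID % number of workers"
# check. The condition and the sample PIDs are assumptions for illustration.
from typing import List


def elected_workers(worker_pids: List[int], worker_count: int) -> List[int]:
    """Return the PIDs that would be allowed to run the cache update."""
    return [pid for pid in worker_pids if pid % worker_count == 0]


if __name__ == "__main__":
    scenarios = {
        "consecutive PIDs": [1001, 1002, 1003, 1004],
        "one worker restarted": [1001, 1002, 1003, 1005],
        "scattered PIDs": [1000, 1004, 1007, 1009],
    }
    for name, pids in scenarios.items():
        chosen = elected_workers(pids, worker_count=len(pids))
        print(f"{name}: {len(chosen)} worker(s) would update the cache")
```

Only the first scenario yields exactly one updater. After a worker restart, or when PIDs are scattered, the cache may never be updated or may be updated by several workers at once.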
In T380699#10351546, @gerritbot wrote: Change #1097181 had a related patch set uploaded (by Santhosh; author: Santhosh):
[research/recommendation-api@master] Make sure application is started with initialized cache
Problematic code:
@santhosh Can you expand on what concretely needs to be done? Maybe an example patch? Is it just updating the JavaScript files to newer syntax, or do we need to update any build configuration?
All items in the checklist are completed. This can be closed once the latest patches are deployed.
Thanks @isarantopoulos. For our immediate needs this seems sufficient.