Hi!
User Details
- User Since: Oct 25 2014, 1:53 AM (552 w, 4 d)
- Roles: Administrator
- Availability: Available
- IRC Nick: Bawolff
- LDAP User: Brian Wolff
- MediaWiki User: Bawolff [ Global Accounts ]
Yesterday
It seems like the reset title cache patch may have broken the Scribunto tests, at least on the REL1_43 branch.
Tue, May 27
OSM is using a JavaScript gadget to swap the 24-hour dates to 12-hour ones.
Sat, May 24
Sun, May 18
We should probably stop linking to sizes we no longer make.
See T393851
Sat, May 17
I kind of wonder if this is the sort of thing better handled by simply blocking users who do things like this. Even if we banned external links in sigs, nothing is stopping the user from manually writing out their preferred "signature" instead of using ~~~~. Restrictions on signatures serve more to prevent people who don't know better from doing annoying things. It's not really a good security measure against actually malicious users.
I guess so. In the longer term we probably want a better solution though.
Sun, May 11
Anyways, closing this bug since the file is uploaded.
Looking at the logs, it looks like there were two publish jobs. The second one failed (because the first one held a lock), which caused an error to be reported to the client. However, the first one was still running and eventually succeeded. Indeed, https://commons.wikimedia.org/wiki/File:Cat_Valentine's_TOP_62_Moments_in_Victorious!_-_NickRewind.webm exists.
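A minimal Python sketch of that sequence, assuming a non-blocking lock around the publish step. The job names, the publish helper, and the error string are hypothetical and are not taken from the actual job code.

```python
import threading


def run_demo():
    """Simulate two publish jobs competing for one lock (illustrative only)."""
    lock = threading.Lock()
    first_has_lock = threading.Event()
    finish_first = threading.Event()
    results = {}

    def publish(job, slow=False):
        # Fail fast instead of waiting if another job already holds the lock.
        if not lock.acquire(blocking=False):
            results[job] = "error: could not acquire lock"
            return
        try:
            if slow:
                first_has_lock.set()   # tell the demo the lock is now held
                finish_first.wait()    # pretend the publish takes a long time
            results[job] = "published"
        finally:
            lock.release()

    first = threading.Thread(target=publish, args=("job-1", True))
    first.start()
    first_has_lock.wait()

    publish("job-2")                   # runs while job-1 still holds the lock
    client_saw = results["job-2"]      # the error reported to the client

    finish_first.set()                 # let job-1 finish
    first.join()
    return client_saw, results
```

In this sketch the client sees the error from job-2 even though job-1 goes on to publish, which matches the pattern described in the logs.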
Mon, May 5
Tue, Apr 29
This is kind of a weird situation, because the thing the test is supposed to be testing is that this doesn't happen.
Apr 22 2025
So it looks like the DB use is as follows:
Apr 14 2025
Thanks. I'll investigate further. My gut feeling is that opening the transaction is more due to the MediaWiki framework than anything else, and the code isn't really relying on implicit transactions being there, so I suspect that is something fixable without too much trouble (and in any case, given we are reconnecting, we don't get the benefits of implicit transactions anyway).
Apr 13 2025
If I was trying to pivot, one thing I'd try would be to write something to the DB or cache that might get executed, e.g. anything still using PHP unserialize() or Mustache templates. So one thing that might make sense here is to set a different $wgSecretKey between auth and normal (for Mustache) [or make a new var just for that], and to make sure every instance of unserialize() uses the second argument to limit class types.
I'm pretty sure that master should work with 1.43, but I'm also happy to backport if that makes things easier.
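To illustrate why separate keys would help, here is a rough Python sketch. It assumes cached compiled templates are only accepted when they carry a valid keyed hash (HMAC) made with the secret key; that is an assumption about the cache design, and the key values and helper names are made up. It does not cover the unserialize() half of the suggestion, which is PHP-specific.

```python
import hashlib
import hmac


def sign(secret_key: bytes, payload: bytes) -> str:
    """Return a keyed hash that ties the payload to one secret key."""
    return hmac.new(secret_key, payload, hashlib.sha256).hexdigest()


def verify(secret_key: bytes, payload: bytes, signature: str) -> bool:
    """Check the signature in constant time."""
    return hmac.compare_digest(sign(secret_key, payload), signature)


# Hypothetical keys for the two domains.
AUTH_KEY = b"hypothetical-auth-domain-key"
MAIN_KEY = b"hypothetical-main-domain-key"

# An entry written by someone who only knows the auth-domain key.
compiled_template = b"<compiled template code>"
forged_signature = sign(AUTH_KEY, compiled_template)

accepted_by_auth = verify(AUTH_KEY, compiled_template, forged_signature)
accepted_by_main = verify(MAIN_KEY, compiled_template, forged_signature)
```

With distinct keys, an entry signed on one side is rejected on the other; with a shared key, both checks would pass.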
Apr 12 2025
So my theory for what is happening here so far:
Filed T391755 for increasing the upload by url time limit.
Looking further in the logs, it appears that the assemble job also loses the DB connection, but there is a Wikimedia\Rdbms\Database::handleErroredQuery: lost connection to db1227 with error 2006; reconnected log, so I guess no explicit transaction is open, so there is no issue. Kind of odd that Publish has an open transaction but Assemble does not.
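The difference between the two jobs can be sketched in Python as follows. This is a toy model of the guess above, not the actual Rdbms code: the class, its methods, and the exception are hypothetical, and the only rule it encodes is "reconnect and retry silently only when no explicit transaction is open".

```python
class LostConnection(Exception):
    """Stand-in for a 'server has gone away' (error 2006) failure."""


class SketchConnection:
    """Toy connection that tracks whether an explicit transaction is open."""

    def __init__(self):
        self.alive = True
        self.in_transaction = False
        self.reconnects = 0

    def begin(self):
        self.in_transaction = True

    def drop(self):
        # Simulate the server closing an idle connection.
        self.alive = False

    def query(self, sql):
        if not self.alive:
            if self.in_transaction:
                # Earlier work in the transaction is gone, so a silent retry
                # could leave the data half-written.
                raise LostConnection("lost connection inside a transaction")
            # No transaction state to lose: reconnect and carry on.
            self.alive = True
            self.reconnects += 1
        return f"ok: {sql}"


# Assemble-like job: no explicit transaction, so the query survives the drop.
assemble = SketchConnection()
assemble.drop()
assemble_result = assemble.query("SELECT 1")

# Publish-like job: a transaction is open when the connection drops.
publish = SketchConnection()
publish.begin()
publish.drop()
try:
    publish.query("SELECT 1")
    publish_failed = False
except LostConnection:
    publish_failed = True
```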
Even if the speed of the connection to IA had been unlimited (or there had been no timeout for that portion), wouldn't the operation still fail at the publication stage?
Just as a note, there are reports of 180 seconds being too small at T391158
Apr 11 2025
For The Cocoanuts.webm, we have the following errors:
So looking at the logstash, it appears the "async" flag was not set for this upload (and hence it was not done via the jobqueue). It is somewhat expected that non-async chunked uploads will fail for uploads of this size.
Apr 7 2025
Mar 31 2025
Mar 27 2025
Mar 24 2025
Mar 21 2025
The PHP bindings do not support tailorings. If they added that, it would make a lot of things easier.
Mar 19 2025
Mar 18 2025
However, I disagree strongly with the "inline SVG is secure in browsers by default" approach, which is the current PoC patch.
Thanks for your input on vowels. Since we've decided to give them their own headers, let's move ێ into its own section.
Mar 17 2025
It might just have been linked somewhere prominent with some sort of embedded Google Translate.
Mar 16 2025
Mar 15 2025
Huh, weirdly, they didn't just forget those characters; ICU intentionally put them in the wrong place at the end. https://github.com/unicode-org/cldr/blob/main/common/collation/smn.xml#L27
This sounds like you were running the command as the wrong unix user.
Mar 13 2025
OK. Based on that description, I think it's best we use a custom collation instead of the UCA-based one (the custom one gives us more flexibility. The UCA one is more complex and might give better results for characters from other languages, but it doesn't allow us to customize it as much. The big difference is that the UCA one allows more options for breaking ties, but I don't think that is super important for ckb. UCA might sort some obscure letters and foreign letters better).
Hmm, I also wonder if projects using the identity (case-sensitive) collation (like Wiktionary) would be interested in this type of sorting, or if they really want every letter.
Mar 11 2025
Just as a reminder (because I didn't see it in the SAL), after changing the setting you must run updateCollation.php (otherwise pre-existing categories will be sorted wrongly and behave weirdly).
Mar 8 2025
Mar 7 2025
Mar 6 2025
We don't really have the ability to break up groups (or at least, not very easily). We mostly just have the ability to choose which letter represents the group. (Or potentially a string like "ٲ - ئ" if that is better, as long as it starts with one of the letters in question.)
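As a rough illustration of "choosing which letter represents the group", here is a small Python sketch. The groups, letters, and helper name are invented for the example and do not reflect the real collation data or the MediaWiki implementation.

```python
# Hypothetical groups: (header shown to readers, letters that share it).
# Letters inside one group cannot be split apart; only the label changes.
GROUPS = [
    ("A", "aàâ"),
    ("C", "cç"),
    ("E - É", "eéè"),  # a range-style label that starts with a group letter
]

# Map every letter to the header of the group it belongs to.
HEADER_FOR = {
    letter: header
    for header, letters in GROUPS
    for letter in letters
}


def first_letter_header(sort_text: str) -> str:
    """Return the category header under which this sort text is listed."""
    first = sort_text[:1].lower()
    # Letters without a configured group simply represent themselves.
    return HEADER_FOR.get(first, first.upper())
```

Changing a header means editing the first element of one tuple; moving a letter into its own section would mean redefining the groups themselves, which is the harder part.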
This was likely an issue with UTF-8 normalization where the ø was being entered in ISO-8859-1 instead of UTF-8.
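A short Python sketch of what that mismatch looks like at the byte level. It only shows the two encodings of ø; it does not reproduce the original report, and the helper name is hypothetical.

```python
text = "ø"

# The same character has different byte sequences in the two encodings.
latin1_bytes = text.encode("iso-8859-1")  # one byte: 0xF8
utf8_bytes = text.encode("utf-8")         # two bytes: 0xC3 0xB8


def decodes_as_utf8(data: bytes) -> bool:
    """Report whether the bytes are well-formed UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# A lone 0xF8 byte is not valid UTF-8, so software expecting UTF-8
# has to reject or repair it.
latin1_is_valid_utf8 = decodes_as_utf8(latin1_bytes)
utf8_is_valid_utf8 = decodes_as_utf8(utf8_bytes)
```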
This sounds like it can be accomplished by a Lua module if desired, or just manually with DEFAULTSORT. I would suggest declining this bug.
The CKB collation is inheriting from the Farsi collation.
Maybe includes/collation/data/first-letters-root.php has to be regenerated for more recent CLDR, or perhaps the script just chooses the wrong representative in that case. ꭓ certainly seems like a better representative of the class than ᶍ.
It looks like the CLDR data is still wrong to this day: https://github.com/unicode-org/cldr/blob/main/common/collation/lt.xml . In addition to treating Y as only a secondary difference from I, it also treats Į as a secondary difference from I (the Wikipedia article makes it sound like I, Į, and Y should all be primary differences). The CLDR data cites Bronius Piesarkas: Lithuanian-English Dictionary ISBN 9986-465-56-7 as a source.
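To show what the primary versus secondary distinction changes in practice, here is a toy Python sketch. The alphabet order, the sample strings, and the two key functions are made up for illustration; this is not the CLDR tailoring or the ICU algorithm.

```python
# Toy alphabet in which į and y follow i as separate letters.
ORDER = "abcdefghiįyjklmnoprstuvz"

# Under a "secondary difference" rule, į and y are variants of i.
BASE = {"į": "i", "y": "i"}
VARIANT_RANK = {"i": 0, "į": 1, "y": 2}


def primary_difference_key(word: str):
    """Every letter has its own place in the alphabet."""
    return tuple(ORDER.index(ch) for ch in word)


def secondary_difference_key(word: str):
    """Compare base letters first; variants only break ties."""
    primary = tuple(ORDER.index(BASE.get(ch, ch)) for ch in word)
    secondary = tuple(VARIANT_RANK.get(ch, 0) for ch in word)
    return (primary, secondary)


words = ["ic", "yb", "ia"]  # made-up strings, not real words

as_separate_letters = sorted(words, key=primary_difference_key)
as_variants_of_i = sorted(words, key=secondary_difference_key)
```

When y is a separate letter, everything starting with y sorts after everything starting with i. When it is only a secondary difference, the y entry is interleaved among the i entries, which is the behaviour being questioned.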
I'm 99% sure that this is due to the 10MB max attribute size limitation. This means that embedded raster images are not allowed to exceed 10MB (after base64). The files in question appear to have large raster images embedded in them.
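As a back-of-the-envelope check, base64 turns every 3 raw bytes into 4 characters, so the raw image has to be noticeably smaller than the limit. The Python sketch below assumes the limit is 10,000,000 bytes and applies to the encoded attribute value; the exact figure and the helper names are assumptions.

```python
import base64
import math

# Assumption: "10MB" means 10,000,000 bytes of encoded attribute text.
LIMIT = 10_000_000


def base64_length(raw_bytes: int) -> int:
    """Length of the padded base64 encoding of raw_bytes bytes."""
    return 4 * math.ceil(raw_bytes / 3)


def max_raw_bytes(limit: int = LIMIT) -> int:
    """Largest raw size whose base64 encoding still fits in the limit."""
    return (limit // 4) * 3


# Cross-check the formula against the standard library on a small input.
sample = b"x" * 1000
formula_matches = len(base64.b64encode(sample)) == base64_length(1000)

largest_raw_image = max_raw_bytes()
```

Under these assumptions, an embedded raster image larger than about 7.5MB before encoding would already exceed the limit.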
Mar 5 2025
In addition to that patch, $wgCategoryCollation has to be changed for that wiki prior to running the maintenance script.
Mar 3 2025
Mar 2 2025
Mar 1 2025
Oh, maybe I misunderstood. https://opensource.contentauthenticity.org/docs/verify-known-cert-list/ says there is a hard-coded list.
So if I'm reading the linked pages correctly, this system is based on cameras, photo editing tools, and other tools involved in editing the image to cryptographically sign the file in some way. How would we decide which signatures are valid and which are not?
How does this ensure that this doesn't end up promoting proprietary tools as more "trustworthy" than similar free software tools?
As far as I understand, this is an open initiative that is run by a non-profit.
A major question here is: do we actually validate the sigs, or do we just display the value? If we do validate and it's invalid, do we still display it, or do we pretend it doesn't exist, or show a warning of some kind?
Feb 28 2025
Sorry, my bad. I got confused reading the backscroll and thought it was only the Parsoid patches that got deployed.
Can users control these things?
Feb 27 2025
Perhaps XSS should be split up into i18n-xss and other XSS, because the two have really different risk profiles, so it's a bit confusing to group them.
I'm having trouble finding details of the normalization algorithm, but experimentally we can gain confidence:
Yes, I agree that would be better. What I don't know is why we choose to normalize html (or wikitext) to NFC there. Is that a step we actually need?
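For context on what an NFC step does to a string, here is a small Python sketch using the standard unicodedata module. It only demonstrates the Unicode normalization form itself; it says nothing about where or why the code in question applies it.

```python
import unicodedata

# "e" followed by U+0301 COMBINING ACUTE ACCENT is composed into U+00E9.
decomposed = "e\u0301"
composed = unicodedata.normalize("NFC", decomposed)

# Some single code points are also replaced: U+212B ANGSTROM SIGN
# normalizes to U+00C5 LATIN CAPITAL LETTER A WITH RING ABOVE.
angstrom_sign = "\u212b"
normalized_angstrom = unicodedata.normalize("NFC", angstrom_sign)

# Plain ASCII text is left untouched.
ascii_text = "<b>hello</b>"
ascii_unchanged = unicodedata.normalize("NFC", ascii_text) == ascii_text
```

The point of the sketch is that the output of NFC can contain different code points than the input, so any check performed before normalization is not looking at exactly the same string as the code that runs after it.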
Thinking about this
Feb 26 2025
Feb 25 2025
For reference, the exception is:
Another PoC:
Feb 24 2025
This is a really cool vuln.
@sbassett I think everything is merged and deployed at this point. Is it ok to make the task public?
Feb 14 2025
I guess all those are using a different version than Wikipedia. I believe Wikipedia uses CLDR 37.
The CLDR looks correct to me (But of course I don't speak this language). I'd prefer we use the CLDR collation instead of our own as long as it is correct. You can test the CLDR collation at https://icu4c-demos.unicode.org/icu-bin/collation.html making sure the drop down menu on the top left is set to kk (type=standard): Kazakh (Standard Sort Order)
Feb 13 2025
(probably starting with non-current WMF staff, who may be harder to reach)
Feb 8 2025
I think a good solution might be:
Feb 7 2025
FYI, I do intend to turn this into a blog post once it is public.
Feb 4 2025
I'd agree that the main mitigations are now merged.
Feb 1 2025
Part of the reason this feels silly to me is that this dataset should be significantly easier to serve than Wikidata.
If you have any example edits where any of those are being used in practice, then that might be helpful to add to the entry? (or add to any related documentation on MediaWiki-wiki, which we could link to?) (Anytime before next Friday)
Jan 31 2025
The change is aimed at (advanced) template editors who use the TemplateStyles feature. It is primarily an accessibility improvement, allowing editors to adjust templates to display differently depending on the user's accessibility preferences.
Jan 30 2025
As an aside, we've had lots of problems in the past related to not storing collation version numbers (different versions of libicu have incompatible output). If we are in the process of rearranging this all, it would be cool if we added a solution to that.
Jan 29 2025
Yes, the issue is still present as far as I know.
No worries. I'm going to mark this task closed.