Page MenuHomePhabricator

RevisionRecord::getPage() should return an immutable ProperPageIdentity
Closed, ResolvedPublic2 Estimated Story Points

Description

RevisionRecord::getPage() currently returns a PageIdentity, which may be a Title or a WikiPage. In the context of an ongoing page deletion, the state of these objects may change and the page ID may get set to 0. To provide a reliable stable interface that can be used e.g. in a deferred update or job, we need to return an immutable value object instead.

WARNING: This is easy to achieve, but any code that downcasts the return value of RevisionRecord::getPage() to Title will crash hard, and code that assumes the page ID contain in the returned object to be updated dynamically during page creation or deletion may get confused.

See also: T278459: Replace Title parameters with PageIdentity (straight forward cases)

Event Timeline

Change #1094003 had a related patch set uploaded (by Daniel Kinzler; author: Daniel Kinzler):

[mediawiki/core@master] RevisionRecord: make getPage() return a value object

https://gerrit.wikimedia.org/r/1094003

Change #1097373 had a related patch set uploaded (by Daniel Kinzler; author: Daniel Kinzler):

[mediawiki/core@master] ParserOptions: don't rely on hooks in setupFakeRevision()

https://gerrit.wikimedia.org/r/1097373

Change #1097373 merged by jenkins-bot:

[mediawiki/core@master] ParserOptions: don't rely on Title in setupFakeRevision()

https://gerrit.wikimedia.org/r/1097373

HCoplin-WMF changed the task status from Open to In Progress.Dec 4 2024, 2:14 PM
HCoplin-WMF triaged this task as Medium priority.

Change #1094003 merged by jenkins-bot:

[mediawiki/core@master] Revision: make RevisionRecord::getPage() return a value object

https://gerrit.wikimedia.org/r/1094003

Turns out that forcing a ProperPageIdentity is at odds with the way we use "fake" revisions as parser input: T381982: Fatal exception of type "Wikimedia\Assert\PreconditionException" when marking a page for translation. The conceptual conflict is that revisions can only exist on editable pages, but page content (including multi-slot content) needs to be parsed in the context of a special page (or no page at all).

This could be resolved by:

  • continuing to allow a non-proper PageIdentity to be returned by RevisionRecord::getPage() - we can still narrow the return type on RevisionStoreRecord and RevisionArchiveRecord.
  • requiring a proper page for RevisionStoreRecord and RevisionArchiveRecord
  • introducing a ParserInput object to cover the need to provide rich context to the parser (T382341)
  • emitting a deprecation warning from MutableRevisionRecord when it is constructed with a non-proper page (a special page).

I suspect that having a sort of MutableRevisionRecord that doesn't require an editable page will be useful in the long run -- there are a lot of pre-MCR interfaces which have a Content and a ContentRenderer, and the easiest way to make them use a RevisionRenderer to yield a postprocessed ParserOutput is to have something "like" a one-content fake revision that we can use to wrap a Content. For example, taking one example at random from a bunch of similar code in core, in ApiQueryRevisionsBase.php we have:

			if ( $this->parseContent ) {
				$popts = ParserOptions::newFromContext( $this->getContext() );
				$po = $this->contentRenderer->getParserOutput(
					$content,
					$title,
					$revision, /* ignore this for the moment */
					$popts
				);
				// TODO T371004 move runOutputPipeline out of $parserOutput
				$text = $po->runOutputPipeline( $popts, [] )->getContentHolderText();
			}

and the easiest thing to do here would be something like:

$text = $this->revisionRenderer->getRenderedRevision(
   new FakeRevisionContentWrapper( $content ),
   $popts,
   null,
   [ 'flavor' => 'view' ],
 )->getContentHolderText();

(ignore the fact for the moment that we actually have a non-null $revision here, and FakeRevisionContentWrapper is of course a placeholder bikeshed name).

That said, I don't know that the "fake revision content wrapper" class *necessarily* has to be a MutableRevisionRecord. We could tighten up the MutableRevisionRecord type and still provide some other sort of "fake revision" interface.

For example, if you wanted to tighten up the type guarantees of RevisionRecord you could also add FakeRevisionRecord as a union type, so that places which really really want an editable revision can enforce that by only accepting RevisionRecord, and other places can accept a RevisionRecord|FakeRevisionRecord.

Another option would be to add ProperRevisionRecord as a subtype, and places which really need an editable revision can require ProperRevisionRecord.

You could introduce a method RevisionRecord::isEditableRevision() which returns true if this is a non-fake revision record (aka the page is a PageIdentity) but I think it would be better to use the type system for this if possible, and the method is unnecessary if the same thing can be done with an instanceof test.

Introducing a ParserInput object is maybe pleasant, but it doesn't really answer the question of how MCR interacts with special pages and previews and other such contexts. If RevisionRenderer strictly requires a real editable page, then we can't really use the MCR machinery to (say) construct the contents of Special pages or previews or API transform requests, and that seems unfortunate. We could retreat into the pre-MCR world and say that we use ContentRenderer for these things, and that you fundamentally can't combine multiple Contents without having a (real, editable) RevisionRecord for them, but that seems unfortunate. If we are going to use a "fake revision" (aka a "collection of contents") as part of the MCR machinery, then I think we don't really need a ParserInput class, and having another abstraction there which is almost-but-not-quite the same as a RevisionRecord would seem to be unnecessary complexity. So I think that's a bit of a distraction.

So I think I agree with your first two recommendations above, but not the second two. Instead I'd propose:

  • Create a FakeRevisionRecord similar to MutableRevisionRecord, but explicitly for non-editable pages, so that we can distinguish these cases in the type system.
  • Create a subclass ProperRevisionRecord which narrows the return type on RevisionRecord::getPage():
    • FakeRevisionRecord extends RevisionRecord
    • MutableRevisionRecord could eventually extend ProperRevisionRecord (once we deprecate its use with non-proper page identities)
    • RevisionStoreRecord and RevisionArchiveRecord extend ProperRevisionRecord
    • RevisionRenderer takes a RevisionRecord (explicitly not a ProperRevisionRecord)

Alternatively, you could add FakeRevisionRecord as a new standalone class which does not extend RevisionRecord, and broaden a bunch of APIs (including RevisionRenderer) to take RevisionRecord|FakeRevisionRecord. I don't like that quite as much, but maybe there's a bunch of RevisionRecord stuff which RevisionRenderer doesn't need, in which case RevisionRecord|FakeRevisionRecord might effectively be a narrower type for its use since it would prevent use of any method not in the intersection of the two types.

Finally, a bit of bikeshedding, instead of Proper* it seems like Editable* might be a better name prefix; EditableRevisionRecord and NonEditableRevisionRecord maybe a better way to express the difference between "proper" and "not"?

That said, I don't know that the "fake revision content wrapper" class *necessarily* has to be a MutableRevisionRecord. We could tighten up the MutableRevisionRecord type and still provide some other sort of "fake revision" interface.

That's exatcly what I'm proposing in T382341: Proposal: Introduce ParserInput object. Except that I prefer a name that more closely describes a prupose. It's not a RevisionRecord. Nothing is revised, and nothing is recorded.

Alternatively, you could add FakeRevisionRecord as a new standalone class which does not extend RevisionRecord, and broaden a bunch of APIs (including RevisionRenderer) to take RevisionRecord|FakeRevisionRecord.

Yes, exactly. That's why I put all the static pseudo-constructors on the class :)

I think part of the issue here is that MCR has somewhat mixed the idea of a "revision" with a "page". A "page" is composed of multiple contents and can be rendered as such, but RevisionRenderer expects a "revision" as the input.

A special page or a preview request or an API request might well involve multiple contents and require combining them, but (at present) we need a "revision" in order to do so.

We could also rename MCR to MCP (multi-content pages) or MCPI (multi-content "parser inputs") but without renaming a bunch of things having RevisionRenderer be responsible for rendering ParserInputs not Revisions seems a bit odd.

This is well into bikeshed territory, but I prefer keeping the idea that a Revision is just one particular version of a "page", aka a page can have multiple revisions but isn't required to, and so certain pages which are not versioned (or not yet versioned) have a single "FakeRevision" or "UnversionedRevision" or "WorkingRevision" or something like that. But the basic "page contains one or more revisions" abstraction holds true even for special pages or not-yet-created pages or preview pages. Then a "proper page" (bikeshed?) explicitly contains "one or more proper revisions" if you need to narrow types in a specific circumstance.

As discussed in T382341#10410127 (apologies for splitting the discussion!) transclusions can also include Special Pages and messages ("pages" in the MediaWiki namespace). So it would be convenient to have a concept of a page/revision that works consistently for these, and in fact there are many ways in which having a Content type for mediawiki messages would be very useful (for example, HTML/wikitext/plaintext messages could be distinguished) and a Content type for special pages would let us move some of the special purpose code in the legacy parser. (Note that messages *may* have an actual editable-page-revision in the MediaWiki namespace, or they may be "not overridden yet" in which case they have (default) contents but not a proper revision.)

@cscott I agree with all that, except for thinking of the thing being parsed as a revision. I did until recently, and I have now convinced myself that it was a mistake. Maybe you can convince me back ;)

We could also rename MCR to MCP (multi-content pages) or MCPI (multi-content "parser inputs") but without renaming a bunch of things having RevisionRenderer be responsible for rendering ParserInputs not Revisions seems a bit odd.

Yea... rendering a ParserInput should probably be the responsibility of ContentRenderer. RevisionRenderer would use it. Most code would then not use RevisionRenderer, but ContentRenderer.

But that would mean putting the transformation pipeline into ContentRenderer... maybe we need yet another service object? PageRenderer? It's a bit confusing...

Maybe RevisionRenderer and PageRenderer aer interfaces implemented by the same class?

Change #1105447 had a related patch set uploaded (by Daniel Kinzler; author: Daniel Kinzler):

[mediawiki/core@master] Revision: make RevisionRecord::getPage() return a value object

https://gerrit.wikimedia.org/r/1105447

daniel set the point value for this task to 2.Dec 19 2024, 4:32 PM

Straight forward coding, but some conceptual subtleties.

Change #1105447 merged by jenkins-bot:

[mediawiki/core@master] Revision: make RevisionRecord::getPage() return a value object

https://gerrit.wikimedia.org/r/1105447

This has been live for a while now.