dcausse (David Causse)
User

User Details

User Since
Jun 9 2015, 9:03 AM (520 w, 15 h)
Availability
Available
IRC Nick
dcausse
LDAP User
DCausse
MediaWiki User
DCausse (WMF) [ Global Accounts ]

Recent Activity

Yesterday

dcausse moved T390262: Add support for the unified highlighter and consider using it by default in CirrusSearch from Next Projects to elastic / cirrus on the Discovery-Search board.
Tue, May 27, 2:49 PM · Discovery-Search, MW-1.44-notes (1.44.0-wmf.28; 2025-05-06), CirrusSearch
dcausse edited projects for T390262: Add support for the unified highlighter and consider using it by default in CirrusSearch, added: Discovery-Search; removed Discovery-Search (2025.05.24 - 2025.06.13).

The PR above got merged. Moving this back to the backlog; we'll continue working on it once we get closer to an OpenSearch version that has this feature.

Tue, May 27, 2:49 PM · Discovery-Search, MW-1.44-notes (1.44.0-wmf.28; 2025-05-06), CirrusSearch
dcausse moved T391792: Align search platform DAGs to DPE best practices from In Progress to Needs Review on the Discovery-Search (2025.05.24 - 2025.06.13) board.
Tue, May 27, 2:47 PM · Patch-For-Review, Discovery-Search (2025.05.24 - 2025.06.13), CirrusSearch
dcausse added a comment to T347282: [Event Platform] eventutilites-python: improve consistency guarantees of async process functions.

In this case, would you suggest making the HTTP call synchronous? IIRC we tried this early on, but the interaction between Python, Beam and Flink led to very high latencies even for very low-throughput streams. I'd need to revisit.

Tue, May 27, 2:26 PM · Data-Engineering (Q4 2025 April 1st - June 30th), Event-Platform
dcausse added a comment to T394791: [SPIKE] Investigate CirrusSearch extension for Domain Event migrations.

It's unclear to me why this implements PageDeleteHook - that hook runs *before* deletion, the deletion may not even happen. Since the handler method does the same thing as the one for PageDeleteCompleteHook, it seems redundant. Perhaps it is the result of a misunderstanding.

From the code:

		// We use this to pick up redirects so we can update their targets.
		// Can't re-use PageDeleteComplete because the page info's
		// already gone
		// If we abort or fail deletion it's no big deal because this will
		// end up being a no-op when it executes.

There's the same hack in EventBus but I believe the new event system solves this issue by keeping track of the redirect target with \MediaWiki\Page\Event\PageDeletedEvent::wasRedirect() and \MediaWiki\Page\Event\PageDeletedEvent::getRedirectTargetBefore() and this will no longer be necessary.

Tue, May 27, 7:28 AM · MW-Interfaces-Team (MW-Sprint-10 (2025-05-20 to 2025-06-03)), OKR-Work, MediaWiki-DomainEvents
dcausse added a comment to T347282: [Event Platform] eventutilites-python: improve consistency guarantees of async process functions.

Here's my understanding of the possible solutions:

  • use the keyed state: will probably have a huge impact on throughput & latency; a new batch will be created per key, leading to most batches being rather small (1 event) and always fired by the timer
  • use the operator state: probably the most natural solution to keep the current logic; in-flight events will be replayed on restarts. The issue is that CheckpointedFunction does not appear to be available with pyflink
  • use the AsyncIO operator: should be the preferred approach, as this solution provides delivery guarantees with no extra duplicates; unfortunately it is not available with pyflink
  • use the Flink parallelism: instead of batching events we could achieve higher concurrency by simply setting the parallelism of the fetch operator (12 to match the default process_max_workers_default); not the best use of resources but probably acceptable for the expected throughput of the page_change stream
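
To illustrate the concern about the keyed-state option, here is a minimal plain-Python simulation (not Flink or pyflink code). The function name `batch_sizes`, the sample keys and the batch size are hypothetical, and the final flush only stands in for the timer that fires incomplete batches.

```python
from collections import defaultdict


def batch_sizes(keys, max_batch, keyed):
    """Return the sizes of the batches formed from a stream of event keys.

    With keyed=True each key gets its own batch (as keyed state would);
    with keyed=False all events share a single batch.
    """
    open_batches = defaultdict(int)
    sizes = []
    for key in keys:
        slot = key if keyed else None
        open_batches[slot] += 1
        if open_batches[slot] == max_batch:
            # Batch is full: emit it and start a new one for this slot.
            sizes.append(open_batches.pop(slot))
    # Whatever is still open would be fired by the timer.
    sizes.extend(open_batches.values())
    return sizes


# Hypothetical stream where almost every event has a distinct key.
keys = [1, 2, 3, 4, 5, 6, 7, 8, 9, 1]
print(batch_sizes(keys, max_batch=5, keyed=False))  # [5, 5]
print(batch_sizes(keys, max_batch=5, keyed=True))   # [2, 1, 1, 1, 1, 1, 1, 1, 1]
```

With mostly distinct keys, the keyed variant never fills a batch, so every batch is small and is only released by the timer.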
Tue, May 27, 7:05 AM · Data-Engineering (Q4 2025 April 1st - June 30th), Event-Platform

Mon, May 26

dcausse placed T363521: Completion suggester can promote a bad build up for grabs.
Mon, May 26, 3:21 PM · Discovery-Search (2025.05.24 - 2025.06.13), Sustainability (Incident Followup), CirrusSearch

Fri, May 23

dcausse added a comment to P76406 Search weighted tags from search index dumps.

Hi @dcausse ,

Thank you very much for sharing this. Works like a charm!
Looking into the data, we have article topic predictions with scores, and if the article has a link recommendation as a boolean.
This is awesome.
So, if we want to find the add-a-link recommendation scores, we should look in either maria db directly or another index.

Fri, May 23, 9:59 AM
dcausse added a comment to T388538: Migrate discovery-search jobs to mw-cron.

pinging @hoo & Wikidata for visibility on the work on mediawiki_job_wikidata-updateQueryServiceLag.timer

Should that job alert Wikidata rather than Discovery-Search?

Fri, May 23, 9:40 AM · Discovery-Search (2025.05.24 - 2025.06.13), Wikidata, Patch-For-Review, serviceops
dcausse added a project to T388538: Migrate discovery-search jobs to mw-cron: Wikidata.
Fri, May 23, 9:29 AM · Discovery-Search (2025.05.24 - 2025.06.13), Wikidata, Patch-For-Review, serviceops
dcausse created T395109: UpdateSuggesterIndex should fail early if the main indices do not exist.
Fri, May 23, 9:20 AM · Discovery-Search (2025.05.24 - 2025.06.13), CirrusSearch
dcausse added a comment to T388538: Migrate discovery-search jobs to mw-cron.

It's when trying to run on s8, so wikidata, yes. I could also just remove s8 from the shards the script is running on?

Sure no need to run on s8 indeed.

Fri, May 23, 9:04 AM · Discovery-Search (2025.05.24 - 2025.06.13), Wikidata, Patch-For-Review, serviceops
dcausse added a comment to T388538: Migrate discovery-search jobs to mw-cron.

@Clement_Goubert thanks!

Fri, May 23, 8:17 AM · Discovery-Search (2025.05.24 - 2025.06.13), Wikidata, Patch-For-Review, serviceops

Thu, May 22

dcausse updated subscribers of P76406 Search weighted tags from search index dumps.
Thu, May 22, 2:00 PM
dcausse created P76406 Search weighted tags from search index dumps.
Thu, May 22, 1:38 PM

Mon, May 19

dcausse closed T385841: Make Recipe namespace in Russian Wikibooks shown in search results in Wikipedia as Resolved.

Seems to be working: searching for Яблочный пирог shows Рецепт:Яблочный_пирог in the sidebar.

Mon, May 19, 3:33 PM · MW-1.45-notes (1.45.0-wmf.1; 2025-05-13), Discovery-Search (2025.05.02 - 2025.05.23), MediaWiki-Search, Wikimedia-Site-requests, Russian-Sites

Wed, May 14

dcausse moved T394274: InvalidArgumentException: Duplicate field labels for model wikibase-mediainfo from Incoming to Done on the Discovery-Search (2025.05.02 - 2025.05.23) board.
Wed, May 14, 10:21 AM · MW-1.45-notes (1.45.0-wmf.2; 2025-05-20), Wikidata-Omega (Completed Tasks), Wikidata, Discovery-Search (2025.05.02 - 2025.05.23), CirrusSearch, Wikimedia-production-error
dcausse reassigned T394274: InvalidArgumentException: Duplicate field labels for model wikibase-mediainfo from dcausse to Lucas_Werkmeister_WMDE.
Wed, May 14, 8:55 AM · MW-1.45-notes (1.45.0-wmf.2; 2025-05-20), Wikidata-Omega (Completed Tasks), Wikidata, Discovery-Search (2025.05.02 - 2025.05.23), CirrusSearch, Wikimedia-production-error
dcausse claimed T394274: InvalidArgumentException: Duplicate field labels for model wikibase-mediainfo.

might be related to T392058

Wed, May 14, 8:47 AM · MW-1.45-notes (1.45.0-wmf.2; 2025-05-20), Wikidata-Omega (Completed Tasks), Wikidata, Discovery-Search (2025.05.02 - 2025.05.23), CirrusSearch, Wikimedia-production-error
dcausse moved T391876: Deepcategory search does not work with MediaSearch on commons from Needs Review to To be Deployed on the Discovery-Search (2025.05.02 - 2025.05.23) board.
Wed, May 14, 8:08 AM · Discovery-Search (2025.05.24 - 2025.06.13), MW-1.45-notes (1.45.0-wmf.2; 2025-05-20), CirrusSearch, Commons
dcausse added a comment to F59944900: search proposed sankey shape.

Looks great!
I would find it a bit more natural to have the SERP positioned before the target page.

Wed, May 14, 7:46 AM

Tue, May 13

dcausse added a comment to T392409: 1.43 advance search extensions unable to search in title contain.

@Keewanlew some clarifications: intitle searches for words in the titles:

  • intitle:s does not find the page named Some title
  • intitle:some can find a page named Some title
  • intitle:s can find a page named The letter S
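
The three cases above can be sketched in Python as whole-word matching. This is a simplification for illustration only: the helper `intitle_matches` is hypothetical, and the real keyword goes through the search engine's text analysis rather than a regular expression.

```python
import re


def intitle_matches(term, title):
    """True if `term` equals one of the words of `title`, ignoring case."""
    words = re.findall(r"\w+", title.lower())
    return term.lower() in words


print(intitle_matches("s", "Some title"))      # False: "s" is only a prefix of "some"
print(intitle_matches("some", "Some title"))   # True
print(intitle_matches("s", "The letter S"))    # True: "S" is a word on its own
```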
Tue, May 13, 8:16 AM · Advanced-Search

Mon, May 12

dcausse assigned T393872: Make weighted tags no longer be WMF-specific to SD0001.
Mon, May 12, 3:43 PM · Discovery-Search (2025.05.24 - 2025.06.13), MW-1.45-notes (1.45.0-wmf.3; 2025-05-27), Patch-For-Review, CirrusSearch
dcausse added a comment to T390262: Add support for the unified highlighter and consider using it by default in CirrusSearch.

PR uploaded to add support for matched_fields: https://github.com/opensearch-project/OpenSearch/pull/18166

Mon, May 12, 3:31 PM · Discovery-Search, MW-1.44-notes (1.44.0-wmf.28; 2025-05-06), CirrusSearch
dcausse merged T356244: MediaSearch should display search warnings into T391876: Deepcategory search does not work with MediaSearch on commons.
Mon, May 12, 9:23 AM · Discovery-Search (2025.05.24 - 2025.06.13), MW-1.45-notes (1.45.0-wmf.2; 2025-05-20), CirrusSearch, Commons
dcausse merged task T356244: MediaSearch should display search warnings into T391876: Deepcategory search does not work with MediaSearch on commons.
Mon, May 12, 9:23 AM · Structured-Data-Backlog, MediaSearch

Fri, May 9

dcausse added a comment to T363521: Completion suggester can promote a bad build.

Batch id of the enwiki_titlewiki index in eqiad is 1746625724 (Wed May 07 2025 13:48:44) so this means the failure is possibly related to the incident or could just be a coincidence.
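
The date quoted above is consistent with reading the batch id as a Unix timestamp in seconds, interpreted in UTC. A quick Python check, assuming that interpretation:

```python
from datetime import datetime, timezone

batch_id = 1746625724  # value quoted in the comment above

# Interpret the batch id as seconds since the Unix epoch, in UTC.
built_at = datetime.fromtimestamp(batch_id, tz=timezone.utc)
print(built_at.isoformat())  # 2025-05-07T13:48:44+00:00
```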

Fri, May 9, 4:49 PM · Discovery-Search (2025.05.24 - 2025.06.13), Sustainability (Incident Followup), CirrusSearch

Thu, May 8

dcausse updated the task description for T386098: Run a full data-reload on wdqs-main, wdqs-scholarly and wdqs to capture new blank node labels.
Thu, May 8, 9:23 PM · Data-Platform-SRE (2025.05.24 - 2025.06.13), Wikidata, Wikidata-Query-Service
dcausse moved T393713: Regularly reconcile items with delete blank nodes from Incoming to Blocked / Waiting on the Discovery-Search (2025.05.02 - 2025.05.23) board.

I've added a quick cron job (stat1009:/home/dcausse/wdqs_reconcile/reconcile.sh) that runs daily at 10:00 UTC and reconciles all items edited the previous day that have a change in a SomeValue node.
Moving to waiting so that we don't forget to stop that job once the reload is done.

Thu, May 8, 5:09 PM · Discovery-Search (2025.05.24 - 2025.06.13), Wikidata
dcausse created T393713: Regularly reconcile items with delete blank nodes.
Thu, May 8, 2:05 PM · Discovery-Search (2025.05.24 - 2025.06.13), Wikidata
dcausse added a comment to T363521: Completion suggester can promote a bad build.

Ran the script from Erik and found:

hewikisource 3014
fiwiktionary 1151
trwiktionary 2914
zhwiktionary 1754
mgwiktionary 1859
enwiktionary 1556
enwiki 5335305
Thu, May 8, 10:20 AM · Discovery-Search (2025.05.24 - 2025.06.13), Sustainability (Incident Followup), CirrusSearch
dcausse reopened T363521: Completion suggester can promote a bad build, a subtask of T363694: Post incident tasks: Search missing results/unavailable for some eqiad users, as Open.
Thu, May 8, 10:15 AM · Data-Platform-SRE (2024.05.06 - 2024.05.26), Discovery-Search (Current work), Sustainability (Incident Followup), SRE-OnFire
dcausse reopened T363521: Completion suggester can promote a bad build as "Open".

Re-opening, we seem to have promoted a bad build recently causing T393663. Unfortunately we disabled completion index rebuilds as part of the opensearch migration and the bad index kept serving stale results for quite some time.
The reason for the bad promotion is quite unclear. The sole trace I could find is https://logstash.wikimedia.org/app/discover#/doc/logstash-*/logstash-mediawiki-1-7.0.0-1-2025.05.07?id=OJUeq5YBfOjk-Vo1yy77 but this error suggests that the build failed and should not have promoted the index. It's possible the bad index was promoted on the previous run on May 6, but I have not found anything about this yet.

Thu, May 8, 10:15 AM · Discovery-Search (2025.05.24 - 2025.06.13), Sustainability (Incident Followup), CirrusSearch
dcausse added a comment to T386098: Run a full data-reload on wdqs-main, wdqs-scholarly and wdqs to capture new blank node labels.

Quick heads up that wdqs users are starting to get impacted by this.

Thu, May 8, 9:35 AM · Data-Platform-SRE (2025.05.24 - 2025.06.13), Wikidata, Wikidata-Query-Service
dcausse closed T393635: Dated elections no longer top results in search preview on en.wikipedia as Resolved.

This should now be resolved, please see T393663#10803425

Thu, May 8, 8:37 AM · Discovery-Search (2025.05.02 - 2025.05.23), CirrusSearch
dcausse closed T393662: Wikipedia lacks some search results in search suggestions as Resolved.
Thu, May 8, 8:35 AM · Discovery-Search (2025.05.02 - 2025.05.23), CirrusSearch
dcausse added a comment to T393662: Wikipedia lacks some search results in search suggestions.

This should now be resolved, please see T393663#10803425

Thu, May 8, 8:35 AM · Discovery-Search (2025.05.02 - 2025.05.23), CirrusSearch
dcausse closed T393660: Adding a link to another article does not suggest articles whose names are similar to the text being linked but instead suggests unrelated topics as Resolved.

This should now be resolved, please see T393663#10803425

Thu, May 8, 8:34 AM · VisualEditor
dcausse edited projects for T393663: Many pages do not appear in typeahead search results (autocomplete / search suggestions), added: CirrusSearch; removed WMF-General-or-Unknown.
Thu, May 8, 8:26 AM · CirrusSearch, Discovery-Search (2025.05.02 - 2025.05.23)
dcausse closed T393663: Many pages do not appear in typeahead search results (autocomplete / search suggestions), a subtask of T393635: Dated elections no longer top results in search preview on en.wikipedia, as Resolved.
Thu, May 8, 8:26 AM · Discovery-Search (2025.05.02 - 2025.05.23), CirrusSearch
dcausse closed T393663: Many pages do not appear in typeahead search results (autocomplete / search suggestions), a subtask of T393660: Adding a link to another article does not suggest articles whose names are similar to the text being linked but instead suggests unrelated topics, as Resolved.
Thu, May 8, 8:26 AM · VisualEditor
dcausse closed T393663: Many pages do not appear in typeahead search results (autocomplete / search suggestions), a subtask of T393662: Wikipedia lacks some search results in search suggestions, as Resolved.
Thu, May 8, 8:26 AM · Discovery-Search (2025.05.02 - 2025.05.23), CirrusSearch
dcausse closed T393663: Many pages do not appear in typeahead search results (autocomplete / search suggestions) as Resolved.

Apologies for this. As part of T388610: Migrate production Elastic clusters to Opensearch (CirrusSearch backend infrastructure) we disabled some updates, and some of the transition took longer than we expected. I routed the search traffic to codfw, which should have fresh indices. The example query mentioned in the description now returns results.

Thu, May 8, 8:26 AM · CirrusSearch, Discovery-Search (2025.05.02 - 2025.05.23)
dcausse edited projects for T393663: Many pages do not appear in typeahead search results (autocomplete / search suggestions), added: Discovery-Search (2025.05.02 - 2025.05.23); removed Discovery-Search.
Thu, May 8, 8:21 AM · CirrusSearch, Discovery-Search (2025.05.02 - 2025.05.23)
dcausse claimed T393663: Many pages do not appear in typeahead search results (autocomplete / search suggestions).

Very likely due to the opensearch migration

Thu, May 8, 7:25 AM · CirrusSearch, Discovery-Search (2025.05.02 - 2025.05.23)

Wed, May 7

dcausse claimed T391792: Align search platform DAGs to DPE best practices.
Wed, May 7, 2:06 PM · Patch-For-Review, Discovery-Search (2025.05.24 - 2025.06.13), CirrusSearch
dcausse moved T385841: Make Recipe namespace in Russian Wikibooks shown in search results in Wikipedia from In Progress to Needs Review on the Discovery-Search (2025.05.02 - 2025.05.23) board.
Wed, May 7, 10:25 AM · MW-1.45-notes (1.45.0-wmf.1; 2025-05-13), Discovery-Search (2025.05.02 - 2025.05.23), MediaWiki-Search, Wikimedia-Site-requests, Russian-Sites
dcausse added a comment to T385841: Make Recipe namespace in Russian Wikibooks shown in search results in Wikipedia.

I think it's a reasonable expectation that when you search for the default namespaces you want the default namespaces to be searched on the sister wikis as well.

Wed, May 7, 10:24 AM · MW-1.45-notes (1.45.0-wmf.1; 2025-05-13), Discovery-Search (2025.05.02 - 2025.05.23), MediaWiki-Search, Wikimedia-Site-requests, Russian-Sites

Tue, May 6

dcausse claimed T385841: Make Recipe namespace in Russian Wikibooks shown in search results in Wikipedia.
Tue, May 6, 1:17 PM · MW-1.45-notes (1.45.0-wmf.1; 2025-05-13), Discovery-Search (2025.05.02 - 2025.05.23), MediaWiki-Search, Wikimedia-Site-requests, Russian-Sites
dcausse added a comment to T393392: Reindex Czech-language wikis to enable diacritic folding.

Just a quick heads-up: in case you planned to re-index commons/wikidata as part of this task, please skip these 2 indices until T392058 is fixed.

Tue, May 6, 12:16 PM · Discovery-Search (2025.05.24 - 2025.06.13), CirrusSearch

Apr 24 2025

dcausse claimed T390262: Add support for the unified highlighter and consider using it by default in CirrusSearch.
Apr 24 2025, 9:03 AM · Discovery-Search, MW-1.44-notes (1.44.0-wmf.28; 2025-05-06), CirrusSearch
dcausse edited projects for T390262: Add support for the unified highlighter and consider using it by default in CirrusSearch, added: Discovery-Search (2025.04.11 - 2025.05.02); removed Discovery-Search.
Apr 24 2025, 9:03 AM · Discovery-Search, MW-1.44-notes (1.44.0-wmf.28; 2025-05-06), CirrusSearch
dcausse closed T391090: TypeError: array_flip(): Argument #1 ($array) must be of type array, null given, a subtask of T374702: Cleanup: Remove deprecated weighted tag methods, as Resolved.
Apr 24 2025, 9:02 AM · Discovery-Search (2025.02.10 - 2025.02.28), MW-1.44-notes (1.44.0-wmf.15; 2025-02-04), Technical-Debt, CirrusSearch
dcausse closed T391090: TypeError: array_flip(): Argument #1 ($array) must be of type array, null given as Resolved.
Apr 24 2025, 9:02 AM · Discovery-Search (2025.05.02 - 2025.05.23), MW-1.44-notes (1.44.0-wmf.25; 2025-04-15), CirrusSearch

Apr 23 2025

dcausse added a comment to T271776: Allow limiting lexeme searches by language.

Why was it too ambiguous? The idea was to match the existing haslabel, hasdescription and hascaption keywords (https://www.mediawiki.org/wiki/Help:Extension:WikibaseCirrusSearch#haslabel/hascaption) - lemmas are effectively labels for lexemes, so it makes sense for the lemma keywords to be similar to the label keywords.

Apr 23 2025, 1:15 PM · Discovery-Search (2025.05.24 - 2025.06.13), MW-1.44-notes (1.44.0-wmf.27; 2025-04-29), Patch-For-Review, OKR-Work, CirrusSearch, Wikidata, Wikidata Lexicographical data
dcausse added a comment to T391383: Metrics for federated querying.

Does some kind of similar logging/tracking already exist in Query Service? What information does it contain?

Apr 23 2025, 8:09 AM · Wikidata, Wikidata-Query-Service

Apr 18 2025

dcausse added a comment to T382904: MediaWiki\Revision\BadRevisionException: The content of this revision is missing or corrupted (bad schema).

Also happens on action=parse

Apr 18 2025, 8:31 AM · MediaWiki-Engineering, MediaWiki-Core-Revision-backend, Wikimedia-production-error

Apr 17 2025

dcausse closed T388549: Vector Search PoC as Resolved.

@gmodena thanks for working on this!

Apr 17 2025, 4:29 PM · Discovery-Search (2025.04.11 - 2025.05.02)
dcausse added a comment to T271776: Allow limiting lexeme searches by language.

@Nikki (or anyone else interested in filtering on lemma spelling variants) while working on this we realized that some clarifications might be needed.
The new search keyword we will add is currently named lemmaspellingvariant. It's not ideal because it is quite long, but I found that haslemma was too ambiguous (please let us know if you have objections/suggestions).
This keyword will be used like other keywords and is quite independent from the rest of the search query, for instance: aluminium lemmaspellingvariant:en-us will find https://www.wikidata.org/wiki/Lexeme:L18179. From the ticket description I think this is what is expected, but if not please let us know. Allowing a particular lemma string to be matched against its specific language variant will require some thinking on our side and is not entirely trivial.

Apr 17 2025, 3:32 PM · Discovery-Search (2025.05.24 - 2025.06.13), MW-1.44-notes (1.44.0-wmf.27; 2025-04-29), Patch-For-Review, OKR-Work, CirrusSearch, Wikidata, Wikidata Lexicographical data
dcausse added a subtask for T372912: Migrate image recommendation to use page_weighted_tags_changed stream: T389643: [L] Adapt or transform image_suggestions_search_index_delta to allow creating one update per article.
Apr 17 2025, 2:57 PM · Data-Platform-SRE (2025.05.24 - 2025.06.13), Discovery-Search (2025.05.24 - 2025.06.13), Data-Engineering-Radar, Structured Data Engineering, Structured-Data-Backlog, Data-Engineering, CirrusSearch
dcausse added a parent task for T389643: [L] Adapt or transform image_suggestions_search_index_delta to allow creating one update per article: T372912: Migrate image recommendation to use page_weighted_tags_changed stream.
Apr 17 2025, 2:56 PM · Discovery-Search (2025.05.24 - 2025.06.13), Structured-Data-Backlog, CirrusSearch, Structured Data Engineering, Image-Suggestions
dcausse added a comment to T389643: [L] Adapt or transform image_suggestions_search_index_delta to allow creating one update per article.

I think this task should be done as part of T372912 which will involve some refactoring of the way the tags are shipped.
I suspect that the delta you generate could easily be grouped by page_id after they're computed.

Apr 17 2025, 2:56 PM · Discovery-Search (2025.05.24 - 2025.06.13), Structured-Data-Backlog, CirrusSearch, Structured Data Engineering, Image-Suggestions
dcausse changed Request URL from https://ca.wikipedia.org/w/api.php?action=query&format=*&cbbuilders=*&prop=*&formatversion=*&pageids=* to /w/index.php?action=edit&title=*&undo=*&undoafter=* on T382904: MediaWiki\Revision\BadRevisionException: The content of this revision is missing or corrupted (bad schema).
Apr 17 2025, 2:09 PM · MediaWiki-Engineering, MediaWiki-Core-Revision-backend, Wikimedia-production-error
dcausse placed T382904: MediaWiki\Revision\BadRevisionException: The content of this revision is missing or corrupted (bad schema) up for grabs.

Cirrus should now gracefully handle this exception. I took a quick look at EditPage but I'm not quite clear on how to fail gracefully there.

Apr 17 2025, 2:07 PM · MediaWiki-Engineering, MediaWiki-Core-Revision-backend, Wikimedia-production-error
dcausse updated the task description for T382904: MediaWiki\Revision\BadRevisionException: The content of this revision is missing or corrupted (bad schema).
Apr 17 2025, 2:05 PM · MediaWiki-Engineering, MediaWiki-Core-Revision-backend, Wikimedia-production-error
dcausse closed T390853: Consider using upgradeMode=savepoint for the cirrus-streaming-updater as Resolved.
Apr 17 2025, 12:51 PM · Discovery-Search (2025.04.11 - 2025.05.02), CirrusSearch

Apr 16 2025

dcausse edited projects for T258278: Advanced search not working as expected with subpages in namespaces, added: Advanced-Search; removed CirrusSearch, Discovery-Search.

Tagging Advanced-Search because we introduced subpageof in CirrusSearch to work around confusing behaviors of existing keywords like prefix:, see T159321 and T180495. There's possibly something to do on the UI to better guide the user when using this field?
Search keywords that can escape the initial namespace filter have been quite confusing as well, and we decided not to introduce new ones because there's no way to understand the actual intent of the user from the CirrusSearch perspective.

Apr 16 2025, 7:35 AM · Advanced-Search

Apr 15 2025

dcausse moved T390853: Consider using upgradeMode=savepoint for the cirrus-streaming-updater from In Progress to Needs Review on the Discovery-Search (2025.04.11 - 2025.05.02) board.
Apr 15 2025, 12:57 PM · Discovery-Search (2025.04.11 - 2025.05.02), CirrusSearch
dcausse claimed T390853: Consider using upgradeMode=savepoint for the cirrus-streaming-updater.
Apr 15 2025, 12:51 PM · Discovery-Search (2025.04.11 - 2025.05.02), CirrusSearch
dcausse moved T382904: MediaWiki\Revision\BadRevisionException: The content of this revision is missing or corrupted (bad schema) from In Progress to Needs Review on the Discovery-Search (2025.04.11 - 2025.05.02) board.
Apr 15 2025, 12:50 PM · MediaWiki-Engineering, MediaWiki-Core-Revision-backend, Wikimedia-production-error
dcausse closed T390665: wdqs2016 and 2017 not consuming updates as Resolved.
Apr 15 2025, 10:21 AM · Discovery-Search (2025.04.11 - 2025.05.02), Wikidata, Wikidata-Query-Service
dcausse closed T326311: Deletion of Lexemes appears to leak triples related to its forms and senses as Resolved.
Apr 15 2025, 10:01 AM · Discovery-Search (2025.04.11 - 2025.05.02), Wikidata
dcausse claimed T382904: MediaWiki\Revision\BadRevisionException: The content of this revision is missing or corrupted (bad schema).
Apr 15 2025, 9:51 AM · MediaWiki-Engineering, MediaWiki-Core-Revision-backend, Wikimedia-production-error
dcausse moved T221709: scap service restarts for WDQS are inconsistent from To be Deployed to Done on the Discovery-Search (2025.04.11 - 2025.05.02) board.
Apr 15 2025, 8:53 AM · Discovery-Search (2025.04.11 - 2025.05.02), Data-Platform-SRE (2025.04.12 - 2025.05.02), Wikidata, Scap, Wikidata-Query-Service
dcausse closed T221709: scap service restarts for WDQS are inconsistent as Resolved.

tested wdqs & wcqs deploys and all the expected services got restarted successfully.

Apr 15 2025, 8:52 AM · Discovery-Search (2025.04.11 - 2025.05.02), Data-Platform-SRE (2025.04.12 - 2025.05.02), Wikidata, Scap, Wikidata-Query-Service

Apr 14 2025

dcausse closed T270106: Port query clicks datasets generation to airflow as Invalid.

already done

Apr 14 2025, 1:35 PM · Discovery-Search (2025.05.02 - 2025.05.23)
dcausse closed T355156: Upgrade the Flink version used by the Search Update Pipeline to fix bulk request size estimation issue as Declined.

we now use a custom elasticsearch sink, I doubt we'll want to go back to the one provided by flink

Apr 14 2025, 1:31 PM · Discovery-Search (2025.05.02 - 2025.05.23)
dcausse closed T311183: ores_predictions_daily DAG fails to overwrite files as Declined.

fetch_articletopic_prediction_thresholds has been removed

Apr 14 2025, 1:26 PM · Discovery-Search (2025.05.02 - 2025.05.23)
dcausse moved T383074: The CirrusSearch Saneitizer should support weighted_tags from elastic / cirrus to needs triage on the Discovery-Search board.
Apr 14 2025, 12:23 PM · Discovery-Search, CirrusSearch
dcausse claimed T391090: TypeError: array_flip(): Argument #1 ($array) must be of type array, null given.
Apr 14 2025, 8:54 AM · Discovery-Search (2025.05.02 - 2025.05.23), MW-1.44-notes (1.44.0-wmf.25; 2025-04-15), CirrusSearch
dcausse created T391792: Align search platform DAGs to DPE best practices.
Apr 14 2025, 7:48 AM · Patch-For-Review, Discovery-Search (2025.05.24 - 2025.06.13), CirrusSearch

Apr 11 2025

dcausse added a comment to T388549: Vector Search PoC.

I've been playing with grouping top results by cluster, this is at http://localhost:12222/clustered. Could be interesting in the context of diversity search.

Apr 11 2025, 5:41 PM · Discovery-Search (2025.04.11 - 2025.05.02)
dcausse added a comment to T388549: Vector Search PoC.

Wrote a small demo available on stat1009, you need a tunnel there with ssh -L12222:localhost:12222 stat1009.eqiad.wmnet and then open http://localhost:12222/.

Apr 11 2025, 2:04 PM · Discovery-Search (2025.04.11 - 2025.05.02)
dcausse closed T389429: Investigate whether it’s intentional / correct that default CirrusSearch setups run cirrusSearchElasticaWrite as separate jobs as Resolved.
Apr 11 2025, 1:41 PM · Discovery-Search (2025.04.11 - 2025.05.02), MW-1.44-notes (1.44.0-wmf.23; 2025-04-01), CirrusSearch
dcausse closed T391232: drop_mjolnir_partitions is broken as Resolved.
Apr 11 2025, 1:39 PM · Discovery-Search (2025.04.11 - 2025.05.02), CirrusSearch

Apr 10 2025

dcausse added a project to T382904: MediaWiki\Revision\BadRevisionException: The content of this revision is missing or corrupted (bad schema): CirrusSearch.

Tagging CirrusSearch: we should probably handle this exception in CirrusSearch to minimize the log spam. We'll probably mark these pages as "broken" in the search index with an artificial template like CirrusSearchBadRevision.

Apr 10 2025, 5:11 PM · MediaWiki-Engineering, MediaWiki-Core-Revision-backend, Wikimedia-production-error

Apr 8 2025

dcausse added a comment to T388549: Vector Search PoC.

@gmodena thanks!

Apr 8 2025, 1:43 PM · Discovery-Search (2025.04.11 - 2025.05.02)
dcausse closed T378382: Update cirrus-reindex-orchestrator for mwscript-on-k8s as Resolved.
Apr 8 2025, 1:42 PM · CirrusSearch, Discovery-Search (2025.03.01 - 2025.03.21), MW-1.44-notes (1.44.0-wmf.8; 2024-12-17), Patch-For-Review
dcausse closed T378382: Update cirrus-reindex-orchestrator for mwscript-on-k8s, a subtask of T376427: Update cirrus for mwscript-on-k8s, as Resolved.
Apr 8 2025, 1:42 PM · Discovery-Search (2025.03.22 - 2025.04.11)
dcausse added a project to T378382: Update cirrus-reindex-orchestrator for mwscript-on-k8s: CirrusSearch.
Apr 8 2025, 1:42 PM · CirrusSearch, Discovery-Search (2025.03.01 - 2025.03.21), MW-1.44-notes (1.44.0-wmf.8; 2024-12-17), Patch-For-Review
dcausse added a comment to T384514: Reduce Commons' search index delta size.

Additionally if we're confident that we'll always have less than 500k tags/week we might consider unblocking T372912: Migrate image recommendation to use page_weighted_tags_changed stream

Apr 8 2025, 9:41 AM · Structured-Data-Backlog (Current Work), Image-Suggestions
dcausse added a comment to T384514: Reduce Commons' search index delta size.

@mfossati thanks! Do these two snapshots use the same dumps? If so, we might wait for a run that uses a different dump and see?

Apr 8 2025, 9:38 AM · Structured-Data-Backlog (Current Work), Image-Suggestions
dcausse placed T391097: PHP Warning: foreach() argument must be of type array|object, float given up for grabs.

I think the problem is the param statement:boost=, whose intent is, I believe, to disable statement boosting, but this sets a float where a map of properties -> weight is expected.
Removing CirrusSearch since it relates to param overrides in MediaSearch extensions/WikibaseMediaInfo/src/Search/MediaSearchProfiles.php:

	foreach ( RequestContext::getMain()->getRequest()->getQueryValues() as $key => $value ) {
		// convert [ 'one:two' => 'three' ] into ['one']['two'] = 'three'
		$flat = array_merge( explode( ':', $key ), [ floatval( $value ) ] );
		$result = array_reduce(
			array_reverse( $flat ),
			static function ( $previous, $key ) {
				return $previous !== null ? [ $key => $previous ] : $key;
			},
			null
		);
		$settings = array_replace_recursive( $settings, $result );
	}

I think the target setting must be checked appropriately here to make sure that a float is not put in place of an array.
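
One possible shape for such a check, sketched in Python rather than PHP and not taken from the extension: `nest` mirrors the key-splitting step of the snippet above, and `guarded_merge` is a hypothetical merge that refuses to overwrite a map with a scalar. The settings structure and the property weight are made up for the example.

```python
def nest(key, value):
    """Turn 'one:two' and a value into {'one': {'two': value}}."""
    result = value
    for part in reversed(key.split(":")):
        result = {part: result}
    return result


def guarded_merge(settings, override):
    """Recursively merge `override` into `settings`.

    A scalar override is ignored when the existing setting is a map,
    so a float can never take the place of an array of weights.
    """
    merged = dict(settings)
    for key, value in override.items():
        current = merged.get(key)
        if isinstance(current, dict) and isinstance(value, dict):
            merged[key] = guarded_merge(current, value)
        elif isinstance(current, dict):
            continue  # scalar would replace a map: skip it
        else:
            merged[key] = value
    return merged


# Hypothetical settings: a map of property -> weight under statement/boost.
settings = {"statement": {"boost": {"P180": 1.0}}}
override = nest("statement:boost", 0.0)
print(guarded_merge(settings, override))  # {'statement': {'boost': {'P180': 1.0}}}
```

Whether to skip the override silently, as here, or to report it as a warning is a design choice left open by the comment above.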

Apr 8 2025, 9:35 AM · MW-1.44-notes (1.44.0-wmf.27; 2025-04-29), Structured-Data-Backlog (Current Work), WikibaseMediaInfo, Wikimedia-production-error
dcausse claimed T391097: PHP Warning: foreach() argument must be of type array|object, float given.
Apr 8 2025, 8:20 AM · MW-1.44-notes (1.44.0-wmf.27; 2025-04-29), Structured-Data-Backlog (Current Work), WikibaseMediaInfo, Wikimedia-production-error
dcausse moved T391232: drop_mjolnir_partitions is broken from In Progress to Needs Review on the Discovery-Search (2025.03.22 - 2025.04.11) board.
Apr 8 2025, 8:10 AM · Discovery-Search (2025.04.11 - 2025.05.02), CirrusSearch
dcausse claimed T391232: drop_mjolnir_partitions is broken.
Apr 8 2025, 7:38 AM · Discovery-Search (2025.04.11 - 2025.05.02), CirrusSearch
dcausse moved T271776: Allow limiting lexeme searches by language from In Progress to Blocked / Waiting on the Discovery-Search (2025.03.22 - 2025.04.11) board.
Apr 8 2025, 7:37 AM · Discovery-Search (2025.05.24 - 2025.06.13), MW-1.44-notes (1.44.0-wmf.27; 2025-04-29), Patch-For-Review, OKR-Work, CirrusSearch, Wikidata, Wikidata Lexicographical data

Apr 7 2025

dcausse added a comment to T390862: Kafka dependency upgrade in spicerack.

@elukey sorry I missed your initial ping, reading the changelog I don't foresee any problems on the wdqs cookbooks, do you need any help for the version bump?

Apr 7 2025, 3:00 PM · Infrastructure-Foundations
dcausse created T391232: drop_mjolnir_partitions is broken.
Apr 7 2025, 8:22 AM · Discovery-Search (2025.04.11 - 2025.05.02), CirrusSearch
dcausse closed T391122: Some wikidata edits not being reflected on WDQS as Resolved.

The items have been reconciled. The root cause is unfortunately not fixed, and these issues might happen again if we don't reach an agreement on how to handle missing events or make the event platform more resilient.

Apr 7 2025, 7:59 AM · Discovery-Search (2025.03.22 - 2025.04.11), Wikidata, Wikidata-Query-Service