Modernize the parser onto php-parser 5 + reflection-docblock 6 (PHP 8.2) #1
Open
bordoni wants to merge 20 commits into
- Add bin/generate-golden.php to snapshot parser output as JSON
- Add WordPress-free PHPUnit golden test + standalone bootstrap/config
- Share corpus discovery and normalization via tests/golden/golden.php
- Cover all tests/source/*.php and tests/**/*.inc fixtures (16 entries)
Pins parse_files() output key-for-key so the parser can be rewritten onto the modern stack without silently changing the importer's contract.
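The key-for-key pinning described above can be illustrated with a small recursive comparison. This is a minimal sketch, not the code in tests/golden/golden.php: the function name `golden_diff()` and the sample snapshot are hypothetical, and the real test may normalize the data differently before comparing.

```php
<?php
/**
 * Hypothetical helper: list every key path where the current parser output
 * differs from a stored snapshot. Both inputs are plain nested arrays, such as
 * the result of json_decode( $json, true ).
 */
function golden_diff( array $expected, array $actual, string $path = '' ): array {
	$diffs = array();
	$keys  = array_unique( array_merge( array_keys( $expected ), array_keys( $actual ) ) );

	foreach ( $keys as $key ) {
		$here = $path . '/' . $key;

		if ( ! array_key_exists( $key, $expected ) ) {
			$diffs[] = "unexpected: $here"; // Key only in the new output.
			continue;
		}
		if ( ! array_key_exists( $key, $actual ) ) {
			$diffs[] = "missing: $here"; // Key only in the snapshot.
			continue;
		}
		if ( is_array( $expected[ $key ] ) && is_array( $actual[ $key ] ) ) {
			// Descend so the report names the exact leaf that drifted.
			$diffs = array_merge( $diffs, golden_diff( $expected[ $key ], $actual[ $key ], $here ) );
		} elseif ( $expected[ $key ] !== $actual[ $key ] ) {
			$diffs[] = "changed: $here";
		}
	}

	return $diffs;
}

// Illustrative data only: a one-function snapshot and output that moved a line number.
$snapshot = json_decode( '{"functions":[{"name":"foo","line":3}],"hooks":[]}', true );
$current  = array(
	'functions' => array( array( 'name' => 'foo', 'line' => 4 ) ),
	'hooks'     => array(),
);
$diffs = golden_diff( $snapshot, $current );
```

An empty result means the output matches the snapshot; any entry names the path that changed, which is easier to act on than a failed whole-array equality check.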
- Add 16 JSON snapshots of the old parser's parse_files() output
- Add bin/generate-golden-docker.sh to reproduce them on php:7.4-cli
- Document the validated PHP 7.4 generation recipe
The old stack emits warnings on PHP 8.x, so the baseline is generated on 7.4 and frozen as the regression oracle for the rewrite.
- Update yoast/phpunit-polyfills ^1.0 -> ^1.1 (1.0.3 => 1.1.5)
- Required by the modern WordPress PHPUnit suite (needs Polyfills >= 1.1.0)
Fixes the 'Version mismatch detected for the PHPUnit Polyfills' fatal so the export + import suite runs green. Parser deps left untouched. Addresses WordPress#244.
- How to run the export+import suite via wp-env (npm scripts or npx)
- Record the baseline: 22 tests / 125 assertions on PHP 7.4
- Note the GitHub HTTPS->SSH git-rewrite gotcha and the per-process fix
- Drop phpdocumentor/reflection ~3.0; add nikic/php-parser ^5, phpstan/phpdoc-parser ^2, reflection-docblock ^6, type-resolver ^2
- Bump php to >=8.2, phpunit to ^9, polyfills to ^2; pin resolve platform to php 8.2 for reproducible locks
- .wp-env.json to PHP 8.2; CI matrix to 8.2/8.3/8.4; phpunit.xml.dist to the v9 schema (coverage/include)
- Guard parser-dependent tests to skip (not fatal) while File_Reflector is rewritten in Stage 4
The old parser is intentionally non-functional until Stage 4; both suites report skips. The PHP 7.4 golden snapshots are unchanged.
- Pretty_Printer now extends PhpParser\PrettyPrinter\Standard; prettyPrintArg mirrors prettyPrintExpr state handling (resetState + handleMagicTokens), replacing the removed v1 noIndentToken stripping
- Add WordPress-free unit suite (tests/unit + phpunit-unit.xml.dist) covering the pretty printer and NameResolver(replaceNodes:false) foundation
- Unit tests assert the exact hook-name/arg strings frozen in the golden snapshots, proving correctness before File_Reflector is wired up
Stage 3 of the parser modernization. 5 tests / 9 assertions green on PHP 8.2.
- Replace the phpDocumentor FileReflector subclass with a NodeVisitorAbstract that walks the AST and collects functions, classes, methods, properties and arguments into thin reflector wrappers (lib/class-reflectors.php)
- Re-expose only the reflector API runner.php consumes, keeping the exported array shape identical; collect on leaveNode so nested functions order inner-first like the legacy parser; resolve extends/implements to \FQN
- DocBlocks return null and hooks/$uses are deferred (Stages 5-6)
- Add File_Reflector unit tests; structure matches the golden oracle 16/16 (full golden match pending docblocks + hooks)
Stage 4 of the parser modernization. Unit suite: 8 tests / 33 assertions.
- Add lib/class-docblock-adapter.php wrapping reflection-docblock 6 to emit the
legacy {description, long_description, tags[]} shape; distinct tag adapters so
export_docblock's method_exists() probes select the right keys
- type_to_legacy_strings() maps type-resolver 2 types to string[] (\WP_Post,
int[], unions); long_description reproduced via Parsedown block parsing
- Reconstruct @see (InvalidTag) and @link to match the loose legacy parsing
- Wire getDocBlock() through every reflector; detect the file-level docblock
(first-docblock heuristic incl. the open-tag-adjacency quirk)
- Add docblock adapter unit tests
Stage 5. Golden: 7/16 full, 16/16 structure+docblocks (hooks/uses pending).
Unit suite: 11 tests / 50 assertions.
- Rewrite the four call/hook reflectors onto php-parser 5 (Function_Call, Method_Call, Static_Method_Call, Hook): restore the WP-globals class map, self/parent/$this resolution, the full hook-type switch, name cleanup, arg shift
- File_Reflector records hooks (do_action/apply_filters + variants) and per-element $uses via node attributes (not dynamic props), with the last_doc carry-over for undocumented hooks and per-method called-in-class assignment
- Add a Class_Name_Resolver pass that fully-qualifies class-position names so a nested Class::m() caller prints as \Class::m() while function names stay unqualified, matching the legacy php-parser 1 output
- Docblock adapter: reconstruct @param/@var InvalidTags (e.g. $this) with type resolution via the docblock context
- PHPUnit 9: assertInternalType -> assertIsArray in export-testcase
Stage 6 - parser rewrite complete. Golden 16/16, WP 22/22, unit 11/50 all green.
- Remove the obsolete 'use phpDocumentor\Reflection\*' imports from runner.php (those reflectors are gone); update stale PHPDoc type hints to the new wrappers
- Remove the now-permanent parser_is_functional() skip guards from the golden and WP suites; the parser is always loadable, so the tests always run
Stage 7 cleanup. Golden 16/16, WP 22/22, unit 11/50 still green.
Found by an end-to-end 'wp parser export' of the full WordPress wp-includes
(1043 files): 'new class extends Foo {}' reached the string-cast fallback in
Method_Call_Reflector and fataled. Return '' for a nameless class instead.
Add a regression unit test. Full wp-includes now exports cleanly: 3409
functions, 690 classes, 6247 methods, 2427 hooks, zero fatals.
The plugin bundles scribu/lib-posts-to-posts + scb-framework via Composer, but .wp-env.json also installed the standalone posts-to-posts plugin. Both load the scribu framework, so 'wp-env start' fataled during plugin activation (Cannot redeclare scb_init() / class P2P_Storage not found), failing CI setup before any test ran. Drop the standalone plugin; the bundled copy provides P2P_Storage (verified: phpdoc-parser activates clean, WP 22/22 + golden 16/16 + unit 12/12).
CI runs 'wp-env start' (which activates the plugin) before composer installs vendor/, so P2P_Storage isn't loaded at activation -> 'class P2P_Storage not found' fatal that failed CI setup. Wrap the activation callbacks in a class_exists() check; Relationships already creates the P2P tables on demand, so this is safe. Verified: plugin activates clean with vendor absent.
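The guard described above might look roughly like the following. This is a sketch under stated assumptions: `example_guarded_activation()` and the callback body are hypothetical, since the plugin's actual activation callbacks are not shown in this thread; only the `class_exists()` check on `P2P_Storage` comes from the commit message.

```php
<?php
/**
 * Hypothetical wrapper: run an activation step only when the bundled
 * posts-to-posts classes have been loaded from vendor/.
 *
 * Returns true if the step ran, false if it was skipped.
 */
function example_guarded_activation( callable $create_tables ): bool {
	if ( ! class_exists( 'P2P_Storage' ) ) {
		// vendor/ is not installed yet (for example during 'wp-env start' in CI).
		// Skipping is safe only because the tables are created on demand later.
		return false;
	}

	$create_tables();
	return true;
}

// In a bare script P2P_Storage is not defined, so the step is skipped instead of fataling.
$called = false;
$ran    = example_guarded_activation(
	static function () use ( &$called ): void {
		// Real code would create the P2P tables here.
		$called = true;
	}
);
```

The design choice is to degrade to a no-op rather than fail activation, which relies on the on-demand table creation the commit message mentions.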
- Fully-qualify class-name argument typehints (\WP_Post), leaving built-in Identifier types (int, string, array) bare. Argument types previously dropped the leading backslash the legacy parser emitted.
- Strip reflection-docblock's FQSEN normalization backslash from @see references so they read as written, matching the legacy output.
Both are the leading-backslash discrepancy dd32 flagged on upstream PR WordPress#247 (present in our parser too); verified against the legacy oracle.
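The typehint rule above can be sketched on plain strings. This is a simplification and an assumption: the commit distinguishes built-in types by AST node kind (Identifier versus class name), whereas the hypothetical `example_qualify_type()` below uses a fixed list of built-in names to show the same outcome.

```php
<?php
/**
 * Hypothetical helper: give class-name typehints a leading backslash and
 * leave built-in types bare. The built-in list is illustrative, not exhaustive.
 */
function example_qualify_type( string $type ): string {
	$builtin = array( 'int', 'string', 'array', 'bool', 'float', 'callable', 'iterable', 'object', 'mixed' );

	if ( in_array( strtolower( $type ), $builtin, true ) ) {
		return $type; // Built-in types are exported as written.
	}

	// Normalize to exactly one leading backslash for class names.
	return '\\' . ltrim( $type, '\\' );
}

$class_type   = example_qualify_type( 'WP_Post' ); // Class name: gains the backslash.
$builtin_type = example_qualify_type( 'int' );     // Built-in: unchanged.
```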
- Add type-tags.inc fixture + legacy golden: the WordPress @type hash @param stays inline in content and is not extracted, per dd32 and johnbillion (the wporg-developer theme depends on this).
- Extend docblocks.inc (additive) with @see references, a typed-hash method, and a markdown-heavy description; regenerate the legacy oracle.
- Unit tests lock the modern @param syntaxes the old parser mangled (?type, parenthesized unions) and modern code typehints (?WP_Post, union, return types) that php-parser v1 could not parse at all.
- Extract file-level constants (define() calls anywhere + the const
keyword) and include/require statements with their legacy type labels
("Include", "Require Once", ...), via new Include_Reflector and
Constant_Reflector wrappers. File_Reflector previously returned [] for
both, silently dropping these from the export contract.
- Fix the file-docblock heuristic: a docblock attached to the open tag
floats to the file only when the first statement does not claim it.
Hooks, define(), and include/require claim it (as the legacy parser
does); only plain calls and assignments leave it for the file. The old
check missed bare hooks/define()/require, mis-attributing their
docblock to the file on real wp-load.php-style files.
- Lock both with a golden fixture (constants-includes.inc, minted from
the legacy parser on PHP 7.4) and unit tests.
Found while mining upstream PR WordPress#247 for export-contract gaps.
Mining upstream PR WordPress#247: probed two contract corners our corpus never exercised; both already match the legacy parser byte-for-byte.
- hooks-extra: all 6 hook variants (action/filter x ref_array/deprecated), argument shapes, and hook-name normalization (concat -> interpolation).
- class-features: abstract/final, extends + multiple implements, method aliases, multi-property declarations, and trait/interface exclusion.
…parser
Mining upstream PR WordPress#247 surfaced four name-resolution gaps in namespaced code (which in wp-includes means the bundled SimplePie/Requests/PHPMailer libraries dd32 diffed):
- Method namespace now reports its enclosing namespace (My\Plugin), not ''; '' is correct only for the global namespace.
- Exported namespace aliases on functions and methods are fully-qualified ("\Other\Thing"), matching the legacy output (dd32's leading-backslash note).
- Function-use names resolve like the legacy parser: unqualified global-fallback calls stay bare (count), while fully-qualified, qualified, and use-function-imported calls become a leading-backslash FQN (\do_action, \Other\helper, \My\Plugin\Sub\thing). The previous code read the wrong resolver attribute and left them all bare.
- A fully-qualified \do_action is a plain function call, not a hook.
Locked with a golden fixture (namespaced-uses.inc, minted from the legacy parser on PHP 7.4) and a unit test.
Mining upstream PR WordPress#247: calls inside closures (anonymous functions used as hook callbacks, assigned to variables, or nested in functions) attribute to the enclosing named scope (file or function), not the closure — already matching the legacy parser byte-for-byte.
- Add @var and method docblocks to Include_Reflector and Constant_Reflector.
- Document the method namespace rule (global reported as '') and the fully-qualified alias export on the function/method accessors.
Summary
Rewrites the PHPDoc parser off the abandoned phpdocumentor/reflection ~3.0 (php-parser v1) stack, which can't parse modern PHP and has left developer.wordpress.org's code reference frozen since Aug 2023, onto a modern, maintained one (php-parser 5 + reflection-docblock 6, on PHP 8.2). phpdocumentor/reflection is removed entirely.

Approach
Driven by a golden-master oracle: the old parser's output was snapshotted on PHP 7.4, and the rewrite reproduces it byte-for-byte, so the array that parse_files() hands the importer is unchanged. The WordPress import integration suite then proves the resulting posts/meta/taxonomies are identical end-to-end.
Staged: deps/env bump → pretty printer → File_Reflector (structure) → docblock adapter → hooks + uses → cleanup. Each stage was verified against the oracle before moving on.

Testing
All green on PHP 8.2:
File_Reflector
Real-world: wp parser export over the full WordPress wp-includes (1,043 files) runs clean: 3,409 functions, 690 classes, 6,247 methods, 2,427 hooks, zero fatals. (The old parser fataled on modern syntax such as nullable types and enums.)

Notes
- runner.php only reads getClasses()); can be added later if DevHub wants it.
- scribu/lib-posts-to-posts emits dynamic-property deprecations on PHP 8.2 (unmaintained dependency; out of scope here).

AI disclosure
These changes were generated primarily by an AI coding assistant (Claude Code), working stage by stage under human direction. Following the spirit of WordPress#247's disclosure: the code is AI-authored and reviewers should scrutinise the diff accordingly.
What distinguishes it from a raw AI rewrite is the verification bar it was held to: every stage had to reproduce the old parser's output byte-for-byte against a golden-master oracle, and the WordPress import suite, unit suite, and a full wp-includes export all pass (see Testing). That catches behavioural drift the way a human review of 4,000 lines of generated code realistically cannot, but it does not replace human judgement on architecture and edge cases.
Relates to WordPress#247: several legacy-parity fixes here (FQN forms, file constants/includes, file-docblock attribution, and namespaced name resolution) were surfaced while reviewing that PR's approach and the maintainers' feedback on it.