Add conformance tests for SEP-2243 HTTP Standardization#259
Conversation
commit: |
|
Notes on how this was tested: I ran the tests in this branch the support for SEP-2243 added in this branch of the MCP C# SDK and all tests pass. Here's how I ran the tests: # In the MCP C# SDK repo, checkout the branch
git checkout mdk/sep-2243-implementation
# Remove any existing node_modules to get to a known state
rm -rf node_modules
# Install node modules with the exact versions in package lock
npm ci
# Now replace the conformance node module with the tests in this branch
npm install --no-save "@modelcontextprotocol/conformance@github:mikekistler/conformance#mdk/sep-2243-conformance-tests"
# Run the server conformance tests (.NET 10 only)
MCP_CONFORMANCE_PROTOCOL_VERSION=DRAFT-2026-v1 dotnet test tests/ModelContextProtocol.AspNetCore.Tests/ModelContextProtocol.AspNetCore.Tests.csproj -f net10.0 --filter "FullyQualifiedName~ServerConformanceTests"
# Run the client conformanct tests (.NET 10 only)
MCP_CONFORMANCE_PROTOCOL_VERSION=DRAFT-2026-v1 dotnet test tests/ModelContextProtocol.AspNetCore.Tests/ModelContextProtocol.AspNetCore.Tests.csproj -f net10.0 --filter "FullyQualifiedName~ClientConformanceTests"Here's the server conformance test output I got: And here's the output from the client conformance tests: |
Close gaps in HTTP header conformance scenarios: Client standard headers (http-standard-headers.ts): - Enforce all expected methods in checks, failing any that were not observed - Track Mcp-Name header per-method separately from Mcp-Method - Advertise resources and prompts capabilities so clients exercise those endpoints - Add a prompt entry for prompts/get testing Client custom headers (http-custom-headers.ts): - Add Base64 encoding checks for non-ASCII, whitespace, and control-char values - Add null/omitted parameter test (second tool call with null value) - Add client-keeps-valid-tool check verifying clients still call valid tools after filtering out invalid ones Server header validation (server/http-standard-headers.ts): - Replace fetch-based sendRawRequest with http.request to preserve exact header casing on the wire (fetch/Headers lowercases names) - Compute defaultArgs and defaultHeaders from tool schema so test requests satisfy all required parameters while varying only the param under test Traceability (sep-2243.yaml): - Add 7 new spec-to-check mappings (nonascii, whitespace, controlchar, keeps-valid-tool, literal-missing-base64-prefix/suffix, no-mirror-unannotated) - Fix 2 incorrect existing mappings
Add missing test cases identified by comparing the branch against the SEP-2243 spec's Conformance Test Cases section: - Mcp-Method on notifications: add notifications/initialized to expectedMethods so clients are checked for header on notifications - Boolean true: add debug=true param to complement verbose=false - Leading-space-only and trailing-space-only: separate checks for Base64 encoding when only one edge has whitespace - Internal spaces only: verify plain ASCII (no Base64) for values like 'us west 1' that have spaces only in the middle - CRLF string: add line1\r\nline2 test (complements existing \n test) - Leading tab: add \tindented test for tab-triggered Base64 encoding - Server rejects invalid Mcp-Param chars: mark as excluded in YAML since HTTP itself prevents these characters in headers Register all new checks in sep-2243.yaml.
The spec states header values are case-sensitive (RFC 9110). The =?base64? prefix is part of the header value, not the header name, so it should be matched case-sensitively. This was identified as a bug in the conformance tests per feedback on SEP-2243. Changes: - Remove server-accepts-case-insensitive-base64 check from server validation scenario - Change base64 prefix regex from case-insensitive (/i) to case-sensitive in client header validation - Remove corresponding entry from sep-2243.yaml
The spec constrains x-mcp-header to primitive types but does not restrict it to top-level properties only. A nested string property with x-mcp-header is valid per the spec. This was confirmed by mikekistler in SEP-2243 PR discussion. Removes invalid_nested_header from the invalid tool definitions list and its corresponding sep-2243.yaml entry.
Compare number header values numerically instead of as strings to accommodate cross-SDK floating point representation differences (e.g., '42' vs '42.0'). For integers, exact numeric match is required. For decimals, a tolerance of 1e-9 is allowed. See SEP-2243 discussion on number precision.
Suggested PR description updateHere's an updated description reflecting the changes made based on SDK implementer feedback: Changes based on SDK implementer feedbackThe following changes were made after reviewing feedback from SDK implementers on SEP-2243 PR #2243:
Items requiring SEP-2243 spec updatesThe following inconsistencies were found between the SEP conformance test table and the normative spec text. These need to be corrected in the SEP:
|
|
@mikekistler I reviewed this and ended up being easier to write as a diff, here's the PR version: mikekistler#1 if you're happy with those changes, feel free to merge it in and we can close, or we can discuss more over there. I wrote it out so each commit message is roughly a PR comment + suggestion. Some of the changes are related to finalizing the "SEP YAML" workflow, so just truing up the YAML here with that work. |
|
Thanks for the thorough review @pcarleton! @mikekistler is out for the week, so @tarekgh is taking this over. I was about to take a look at this myself, but at a glance all these changes look good to me. I know @tarekgh had some of the same questions @maciej-kisiel had around the strictness of number parameter comparisons. Should there be an epsilon or do we really need an exact string match between the JSON body and the headers?
modelcontextprotocol/modelcontextprotocol#2243 (comment) I also wonder if we should add a test that's basically the inverse of what was removed in de2fcb1. If we want the implementations to be strict, now is the time to have the conformance tests enforce it. I definitely don't think we need to block on this though. I'm personally supportive of merging mikekistler#1 into this PR, but I'll let @tarekgh decide on that. |
|
Reviewed the changes against the SEP-2243 spec and the merged transports spec. Everything looks good. nice cleanup overall. One non-blocking item worth discussing though: Base64 strictness (INFO vs FAILURE): The two malformed-base64 checks (sep-2243-server-rejects-invalid-base64-padding and sep-2243-server-rejects-invalid-base64-chars) are downgraded from FAILURE to INFO. The commit rationale is that the normative spec text only says:
…without mandating strict RFC 4648 validation. So a lenient decoder that accepts unpadded SGVsbG8 and successfully matches the body is spec-compliant. I think that reasoning is sound. However, the SEP-2243 Conformance Test Cases table says:
So, we have a gap: the normative text is ambiguous on strictness, but the conformance table explicitly says MUST reject. If we agree INFO is the right severity here, should we also update the SEP conformance table to reflect that (e.g. "behavior is implementation-defined" or I'll go ahead and merge your diff PR here and we can move forward with this PR. |
…ethods; make getChecks() idempotent A client that never calls prompts/list isn't violating SEP-2243 — the spec sentence is 'The client MUST include the standard MCP request headers on each POST request', and a request that was never sent can't fail it. The Mcp-Method requirement is already proven by initialize/tools/list etc., so there's nothing unique to prompts/list. SKIPPED keeps the gap visible without a false red. Separately: getChecks() was pushing into this.checks on every call. The runner may call it more than once (e.g. progress + final report); that produced duplicate rows. Build a fresh array per call instead.
checkMcpMethodHeader already has 'if (this.methodHeaderChecks.has(method)) return'. checkMcpNameHeader was missing the equivalent. The test server advertises two tools and two resources; a client that calls both produces two 'client-mcp-name-header-tools-call' / '...-resources-read' rows.
String.replace with a string arg only replaces the first occurrence. 'notifications/initialized' is fine (one slash) but a method like 'notifications/resources/updated' would yield 'notifications-resources/updated' in the check id. (replaceAll isn't in this project's TS lib target.)
… round-trips The spec doesn't forbid base64-encoding values that don't strictly need it. A client that always wraps would send '=?base64??=' for empty_val: ''. The '.+' wouldn't match, so the harness would compare '=?base64??=' === '' literally and FAIL a compliant client. '(.*)' lets the empty payload decode to '' and match.
This check only pushed on FAILURE; a correct client produced no row at all, so the pass was invisible in reports and couldn't be counted toward coverage. Same-slug SUCCESS/FAILURE pair is the repo convention.
checkParamHeader('Verbose', ...) already pushes a FAILURE when the header is
missing. The explicit 'optional-present' block re-tests the same condition
under a second check id, doubling the row.
…n, not strings defaultArgs feeds the JSON-RPC body. With Record<string, string> + '0'/'false', a server that validates inputSchema rejects on type mismatch — which is also a 400. Every header-rejection check then false-PASSes (server rejected for the wrong reason), and server-accepts-valid-base64 false-FAILs a compliant server because it never gets past schema validation. Also handle 'integer' (JSON Schema distinguishes it from 'number').
…no JSON-RPC error in body
A server can return HTTP 200 with {error: {code: -32001, ...}} in the body.
Status-only acceptance lets that through as a pass.
The literal will rot when the draft cycle rolls over. types.ts already exports DRAFT_PROTOCOL_VERSION for this.
Regenerated from the SEP-2243 spec diff (transports.mdx + tools.mdx) rather than from the scenario implementations: 21 check rows + 4 excluded vs the previous 46 + 2. Differences: - one check id per spec sentence (test variants go in 'details', not new ids) - 'text:' quotes the spec sentence verbatim instead of paraphrasing - '-reject-status' (MUST 400) split from '-reject-error-code' (SHOULD -32001 for standard headers, MUST for custom) - rows with no spec backing dropped (whitespace, base64-padding/chars, prefix/ suffix-literal, control-char-name) - two more SHOULD excludes (log-warning, no-sensitive-params) - check ids use sep-2243-<slug> convention - check: key first, excludes grouped at bottom (matches sep-2164) Moved to src/seps/ to match modelcontextprotocol#272 layout (this branch predates it; will reconcile cleanly on rebase).
Self-contained vitest that POSTs initialize with and without Mcp-Method, asserts FAILURE/SUCCESS on the pinned check id, and proves getChecks() is idempotent. No example client/server file needed for this one because the scenario *is* the server — the test crafts the bad request directly. This is the negative-test half of AGENTS.md §Testing your scenario; the everything-client passing-example half is still TODO.
…ed-base64 tests These five tests assert behavior the SEP-2243 text doesn't pin down. Kept because they're useful consistency signals across SDKs, but adjusted so they don't FAIL spec-compliant servers: - server-accepts-whitespace-header-value: this IS a MUST, but per RFC 9110 §5.5 (field parsing MUST exclude OWS), not SEP-2243. Kept as FAILURE, specReference now cites RFC 9110. - server-rejects-invalid-base64-padding / -chars: downgraded to INFO. SEP-2243 says only 'MUST decode them accordingly' without picking RFC 4648 strict vs lenient. Node's Buffer.from() accepts unpadded/dirty input and decodes to a matching value (server accepts); .NET's Convert.FromBase64String throws (server rejects). Either is currently compliant. WARNING would force Tier-1 SDKs into non-default strict decoding (modelcontextprotocol#245); INFO records the behavior without prescribing it, so cross-SDK divergence is visible without tier impact. - server-literal-missing-base64-prefix / -suffix: unchanged. The wrapper syntax is '=?base64?{x}?=' complete; treating partial wrappers as literal is the natural reading. Plumbing: createRejectionCheck gains an optional failSeverity param; testBase64Case gains a 'reject-info' mode. (This is also the hook for the larger 400-MUST / -32001-SHOULD split that still needs doing.)
…) + error-code (SHOULD/MUST) SEP-2243 §Server Validation: 'servers MUST return HTTP status 400 ... and SHOULD include a JSON-RPC error response using ... -32001'. So a server that returns 400 with -32600 (or no error body) is compliant for standard headers. The single conflated check FAILed it. For custom headers (§Server Behavior for Custom Headers) -32001 IS a MUST, so both halves stay FAILURE there. createRejectionCheck → createRejectionChecks returning two rows: '<id>' for the 400 status and '<id>-error-code' for -32001. testCase (standard) sets errorCodeSeverity=WARNING; testBase64Case/testMissingCustomHeader (custom) keep FAILURE; the two malformed-base64 INFO probes set both halves to INFO.
Aligns with the sep-2243.yaml traceability ids and the repo convention (sep-2164-*, sep-2207-*). Variant suffixes are kept (per-method, per-tool) so per-case visibility isn't lost; the YAML's row id is a prefix of the emitted ids, which is fine — extra ids not in the YAML are allowed. Also fixes the mixed hyphen/underscore in reject-invalid-tool ids (toolName has underscores; replace to hyphens).
…scenarios extend it http-standard-headers.ts had its own copy of the start/stop/handleRequest boilerplate (~100 lines) that http-custom-headers.ts already abstracted as BaseHttpScenario. Moved the abstract class to a shared file; both scenario files now import it. HttpStandardHeadersScenario implements handlePost() instead of handleRequest(); its handleInitialize() uses the base sendInitialize() with the resources/prompts capability flags it needs. Net: 465 → 361 lines for http-standard-headers.ts; the third inline copy of the same HTTP-server scaffold is gone.
Per-chunk .toString() corrupts a multi-byte UTF-8 char that straddles a TCP
chunk boundary (e.g. 0xC3 | 0xBC decodes to two replacement chars instead of
'ü'). The custom-headers scenario sends '日本語'/'naïve' in the body, so this
is reachable in principle. setEncoding('utf8') makes 'data' emit strings with
boundary handling done by Node's StringDecoder.
Fixed in both places: BaseHttpScenario.handleRequest (request body) and
sendRawRequest (response body).
…edIn/removedIn) The introducedIn/removedIn refactor (modelcontextprotocol#265) replaced the specVersions array with a source property on all scenario interfaces. The SEP-2243 scenarios still used the old specVersions field, causing matchesSpecVersion() to receive undefined and crash in spec-version.test.ts. Replace specVersions with source = { introducedIn: DRAFT_PROTOCOL_VERSION } in BaseHttpScenario and the two server ClientScenario classes. Remove the now-unused SpecVersion imports.
|
Cool let's merge this, and then address the 2 open questions (base64 strictness, float comparison) in a follow up. (Edit: actually just float handling in a follow-up, I'll flip base64 back to failure here.) I see what you're saying on the base64 table now, sorry I was focused on the SHOULD / MUST's. There's 2 related questions on this: (a) How strict do we want to be? and (b) Do we want to move the strictness declaration into the spec language itself? We've approved the table, so that is the current source of truth, so I think you're right it makes sense to make those into Failures, and then we can ask SDK's to complain loudly if the strictness is annoying in languages. Mostly I don't want SDK implementers to have to roll their own base64, which hopefully they won't have to. I'll flip those back to failure and then merge. On spec language, I'm a bit conflicted because we've split the source of truth. On the one hand, we don't expect folks to read the SEP's, just the spec itself which makes me inclined to say move it into the spec. On the other hand, it's a bit in the weeds, and we do have the conformance test, so probably fine to just leave it where it is. If they ever conflicted, we'd go with the spec language over the SEP, but this is just more specific. Again, I'm inclined to leave it how it is. On float comparison, I haven't formed a strong opinion on it, happy to follow what you all think is best there. It might be a good place for a language table to see where things mismatch. I agree whatever we decide should make it into an edit on the SEP / spec language (I don't think it needs a separate SEP since we're still in draft, that's why we're doing the RC so we can make edits like these.) |
…FAILURE Per discussion on #259: the SEP-2243 conformance-test-case table is the approved source of truth and lists these as reject cases, so they're FAILURE-level even though the spec body itself only says 'MUST decode them accordingly'. SDKs whose stdlib base64 is lenient (Node, browsers) will need to validate before decoding; if that's burdensome we'll revisit. Removes the now-unused 'reject-info' mode and statusSeverity/INFO plumbing introduced for these two probes.
…FAILURE Per discussion on modelcontextprotocol#259: the SEP-2243 conformance-test-case table is the approved source of truth and lists these as reject cases, so they're FAILURE-level even though the spec body itself only says 'MUST decode them accordingly'. SDKs whose stdlib base64 is lenient (Node, browsers) will need to validate before decoding; if that's burdensome we'll revisit. Removes the now-unused 'reject-info' mode and statusSeverity/INFO plumbing introduced for these two probes.
Summary
Adds conformance test scenarios for SEP-2243 (HTTP Standardization), covering the new standard and custom MCP request headers.
The PR also adds a "prepare" script to the package.json to allow "npm install" from a branch for pre-merge testing.
Scenarios added
Server scenarios (test server — inspects client requests)
http-standard-headers— Verifies clients sendMcp-MethodandMcp-Nameheaders on HTTP POST requestshttp-custom-headers— Verifies clients handlex-mcp-headerannotations: mirror param values intoMcp-Param-*headers, apply Base64 encoding for non-ASCII values, and convert number/boolean values correctlyhttp-invalid-tool-headers— Verifies clients reject invalidx-mcp-headertool definitions (empty values, duplicates, non-primitive types, illegal characters, etc.)Client scenarios (connect to server under test)
http-header-validation— Verifies servers reject mismatched/missingMcp-Method/Mcp-Nameheaders, accept case-insensitive header names, reject case-mismatched values, trim whitespace, validate Base64-encoded custom headers, and return-32001(HeaderMismatch)http-custom-header-server-validation— Verifies servers validate customMcp-Param-*headers against the request bodyTraceability
src/scenarios/sep-2243.yamlmaps 30+ spec requirements to check IDsNotes
preparescript inpackage.jsonfor git-based dependency installsChecklist
npm run buildpassesnpm testpassessrc/scenarios/index.ts