CAP-1073: lock ordering on unique_key to prevent worker/push deadlocks#2
Merged
Merged
Conversation
Enforce a deterministic lock ordering on chancy_jobs rows so that concurrent push_many_ex INSERT ... ON CONFLICT transactions cannot deadlock with the worker batch update. Sorting by (unique_key, id) guarantees both sides acquire row-level exclusive locks in the same order, eliminating the circular wait detected in CAP-1073.
Jobs sharing a unique_key collide via ON CONFLICT DO UPDATE with rows being updated by the worker's _maintain_updates batch. Without a deterministic lock order, the two transactions can acquire row locks in opposite orders and deadlock (CAP-1073). Sorting the inserts by unique_key matches the ordering now applied worker-side, removing the cycle. References are placed back in the caller's original order so the public return contract is unchanged.
ppsmilesi
reviewed
Apr 21, 2026
|
|
||
| @staticmethod | ||
| def _job_unique_key(job: Job | IsAJob[..., Any]) -> str: | ||
| if callable(job): |
There was a problem hiding this comment.
c'est comme ça qu'il fait ailleurs ou il utilise isinstance(...
Author
There was a problem hiding this comment.
Aligné sur isinstance(job, Job) (cohérent avec le call site existant ligne 694) dans af32351.
| ) | ||
| record = await cursor.fetchone() | ||
| references.append(Reference(record["id"])) | ||
| references[original_index] = Reference(record["id"]) |
There was a problem hiding this comment.
mypy doit râler que references: list[Reference | None] mais que le retour de la fonction soit list[Reference]. Faut surement caster ou utiliser un dict intermédiaire.
Author
There was a problem hiding this comment.
Bien vu. Remplacé par un dict[int, Reference] intermédiaire dans af32351, le retour est maintenant un list[Reference] propre, sans cast.
- Use isinstance(job, Job) in _job_unique_key for consistency with the pattern already used at the call site in push_many_ex. - Replace pre-allocated list[Reference | None] with a dict[int, Reference] intermediate so the function returns a clean list[Reference] without needing a cast or carrying Optionals through to mypy.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Contexte
Complète le fix de CAP-1073. Le retry sur
DeadlockDetected(mergé dans #1) évite le crash du worker mais ne s'attaque pas à la cause racine.Investigation complète :
~/claude-vault/projects/api/briefs/CAP-1073-investigation-deadlock-chancy.md.Cause racine
Deux transactions concurrentes sur
chancy_jobsacquièrent des row-level locks dans un ordre non déterministe :executemany UPDATE ... WHERE id=?dans_maintain_updates(ordre = arrivée queue interne)INSERT ... ON CONFLICT (unique_key) DO UPDATEdanspush_many_ex(ordre = construction de la liste)Quand les deux touchent les mêmes lignes (jobs avec
unique_key, typiquement Aircall contact sync), PostgreSQL détecte un cycle → deadlock.Fix
Tri par
unique_keydes deux côtés pour garantir un ordre de lock identique → plus de cycle possible.Changements
chancy/worker.py::_maintain_updates—pending_updates.sort(key=lambda u: (u.unique_key or "", u.id))avantexecutemany. Tiebreakeridpour déterminisme.chancy/app.py::push_many_ex+sync_push_many_ex— tri des jobs parunique_keyavec index conservé, puis reconstruction de la liste deReferencedans l'ordre d'entrée original → contrat public préservé._job_unique_keypartagé entre async et sync (gèreJob | IsAJob).Autres endroits audités
Tous les autres UPDATE/INSERT sur
chancy_jobssont safe (pas de patch nécessaire) :cancel_jobfetch_jobs(worker)FOR UPDATE SKIP LOCKED+ORDER BYreprioritizepluginFOR UPDATE SKIP LOCKED+ORDER BY idmetricspluginmetrics, pasjobscronplugincron, pasjobsCoût / risques
Test plan
DeadlockDetectedne remonte sur Sentry pendant 48h[api-chancy-worker] worker is down— le flap doit cesserTkTech/chancy