Multi-Model Repo Review Workflow and Findings (2026-03-07)
Overview
This is a factual record of a structured multi-model review run completed on 2026-03-07, covering the hensler-photography repository and a snapshot of its live site surfaces.
Method used: independent model critiques (DeepSeek, Mistral, Qwen), followed by Sonnet 4.6 arbitration and final synthesis.
Prompt Pattern Used
1) Independent review prompt
You are performing a critical architecture + product review for a solo owner.
Evaluate repository and live-site snapshot.
Output:
1) Executive verdict
2) Top strengths
3) Top risks/gaps
4) Highest-ROI actions (Impact/Effort/Risk)
5) Now/Next/Later
6) Brand/UX guardrails
7) What NOT to do right now
2) Cross-model debate prompt
Facilitate a structured debate among DeepSeek, Mistral, and Qwen outputs.
Return:
1) Consensus findings
2) Key disagreements
3) Arbitration decisions
4) Final merged priority list
3) Final synthesis prompt
Create a concise decision brief for a solo owner:
- Executive summary
- Now / Next / Later
- Risks
- Brand guardrails
- 7-day execution checklist
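Because the independent review prompt asks every model for the same seven sections, a run could check each response against that schema before passing it to the debate step. The following is a minimal sketch under that assumption, not part of the recorded run: the function name `missing_sections` and the substring-matching approach are hypothetical, and a model that rewords a section title would be reported as missing it.

```python
import re

# Section titles copied from the independent review prompt above.
REQUIRED_SECTIONS = [
    "Executive verdict",
    "Top strengths",
    "Top risks/gaps",
    "Highest-ROI actions",
    "Now/Next/Later",
    "Brand/UX guardrails",
    "What NOT to do right now",
]


def _normalize(text: str) -> str:
    """Lowercase, collapse whitespace, and drop spaces around slashes."""
    collapsed = re.sub(r"\s+", " ", text)
    return re.sub(r"\s*/\s*", "/", collapsed).lower()


def missing_sections(review_text: str) -> list[str]:
    """Return required section titles that do not appear in review_text.

    Hypothetical helper: matching is a plain case-insensitive substring
    test, so it only detects titles that are reproduced verbatim.
    """
    haystack = _normalize(review_text)
    return [s for s in REQUIRED_SECTIONS if _normalize(s) not in haystack]
```

A response with an empty result from `missing_sections` follows the fixed schema; a non-empty result names the sections to re-request before arbitration.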
Pseudocode Workflow
context = collect_context(repo_head, commits, structure, README, live_snapshot)
all_reviews = []
for model in [deepseek, mistral, qwen]:
    all_reviews.append(run_independent_review(model, fixed_schema_prompt, context))
debate = run_cross_model_debate(sonnet_4_6, all_reviews)
synthesis = run_final_synthesis(sonnet_4_6, debate, all_reviews)
save_all_artifacts(all_reviews, debate, synthesis)
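The workflow can also be sketched as runnable Python. This is an illustration under stated assumptions, not the script used for the run: `call_model` is a hypothetical callable standing in for whatever client sends a prompt to a model, the model labels are placeholders rather than real API model IDs, and the bracketed prompt markers stand in for the full prompt texts recorded above.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical signature: (model_label, prompt) -> response text.
ModelCall = Callable[[str, str], str]

REVIEWERS = ["deepseek", "mistral", "qwen"]  # placeholder labels
ARBITER = "sonnet-4.6"                       # placeholder label


@dataclass
class RunArtifacts:
    """Everything a run produces, kept together so it can be saved."""
    reviews: dict[str, str] = field(default_factory=dict)
    debate: str = ""
    synthesis: str = ""


def run_review_workflow(context: str, call_model: ModelCall) -> RunArtifacts:
    artifacts = RunArtifacts()

    # Step 1: each reviewer sees the same prompt and context, independently.
    for model in REVIEWERS:
        prompt = f"[independent review prompt]\n\n{context}"
        artifacts.reviews[model] = call_model(model, prompt)

    # Step 2: the arbiter debates the three reviews.
    joined = "\n\n".join(
        f"{model}:\n{text}" for model, text in artifacts.reviews.items()
    )
    artifacts.debate = call_model(
        ARBITER, f"[cross-model debate prompt]\n\n{joined}"
    )

    # Step 3: the arbiter writes the final brief from debate + reviews.
    artifacts.synthesis = call_model(
        ARBITER,
        f"[final synthesis prompt]\n\n{artifacts.debate}\n\n{joined}",
    )
    return artifacts
```

Passing the model call in as a parameter keeps the sequencing testable with a stub function, so the ordering of the three steps can be checked without any network access.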
Key Findings Summary
- Consensus: stack is appropriate for current scale; WebP/image pipeline is a real strength; operational documentation is unusually strong.
- Main risk cluster: operational/security hardening and business reliability offer higher ROI than additional feature work.
- Immediate priorities: backup restore drill, admin hardening (rate limits + 2FA), and uptime monitoring.
- Next priorities: SEO/social metadata, replace placeholder landing page, and close Liam/Adrian parity gap.
- Deferred: object storage migration and CI/CD automation unless trigger thresholds are hit.
Notable Model Divergence
- Admin hardening: mechanism emphasis differed (infra controls vs app-level 2FA); synthesis recommends both.
- Object storage timing: one model favored earlier migration; synthesis deferred until scale/performance triggers.
- Compliance: GDPR/CCPA consent risk was explicitly flagged by one model and accepted in arbitration.
- Filter UX: one model praised current improvements; two warned about over-iteration. Synthesis: freeze further filter changes until usage data justifies them.
Artifacts
Cost Note
This run did not persist a token/cost ledger in the saved artifacts. Future runs can include a per-call usage and cost report for full cost auditability.
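A per-call ledger for future runs could be as small as the sketch below. It is an assumption about what such a report might look like, not something this run produced: the `UsageRecord` fields, the `ledger_csv` helper, and the per-million-token rates are all hypothetical, and real rates would have to come from each provider's pricing.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class UsageRecord:
    """One model call. Rates are caller-supplied assumptions, in USD."""
    step: str
    model: str
    input_tokens: int
    output_tokens: int
    usd_per_million_input: float
    usd_per_million_output: float

    @property
    def cost_usd(self) -> float:
        return (
            self.input_tokens * self.usd_per_million_input
            + self.output_tokens * self.usd_per_million_output
        ) / 1_000_000


def ledger_csv(records: list[UsageRecord]) -> str:
    """Render one row per call plus a closing total row, as CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(
        ["step", "model", "input_tokens", "output_tokens", "cost_usd"]
    )
    for r in records:
        writer.writerow(
            [r.step, r.model, r.input_tokens, r.output_tokens,
             f"{r.cost_usd:.6f}"]
        )
    writer.writerow([
        "total",
        "",
        sum(r.input_tokens for r in records),
        sum(r.output_tokens for r in records),
        f"{sum(r.cost_usd for r in records):.6f}",
    ])
    return buf.getvalue()
```

Saving the returned text alongside the other artifacts would give each run the auditable usage and cost record that this one lacks.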