ULTIMATE PIPELINE ARCHITECTURE v3.0

════════════════════════════════════════════════════════════════════

The World's Strongest Pipeline: Design Document

Integrating lessons from NASA × Amazon × Apple × Google × Anthropic

A meta-prompt and format designed to guarantee 98-100 point quality

════════════════════════════════════════════════════════════════════

version: 3.0

created: 2026-03-18

research_basis:

- NASA/JPL Power of 10 Rules (Gerard Holzmann, 2006)

- Amazon Two-Pizza Team + Working Backwards (Jeff Bezos, 2004-)

- Google SRE Release Engineering (2016-)

- Google TFX + CI/CD/CT Three-layer (2019-)

- Anthropic Multi-agent System (90.2% improvement, 2025)

- DORA Elite Performers 2025 (182x deploy frequency)

- SLSA L3 Enterprise Standard (2025)

- DSPy Stanford HAI (25%+ improvement, 2024)

- Constitutional AI (Anthropic, 2022-)

- MCP + OpenTelemetry Integration (2025)

════════════════════════════════════════════════════════════════════

PART 1: The 10 Principles of a World-Class Pipeline

════════════════════════════════════════

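Why these 10 principles guarantee a 98-100 point score

A cross-company survey of pipeline design at NASA, Amazon, Apple, Google, Microsoft, and Meta yielded 10 universal principles, derived inductively.

They are not a choice of specific tools but a philosophy of pipeline design, and they apply to any system.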

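Principle 1: Shift Left (move quality assurance earlier)

Basis: NASA/JPL's Power of 10 rules require all code to be compiled with every warning enabled and checked with static analyzers. Google places data validation at the entrance of the ML pipeline.

❌ WRONG: develop → test → security → deploy
✅ RIGHT: security scan → develop → immediate test → deploy

Implementation: run SAST, SCA, and lint automatically when a PR is opened. Catching problems per commit keeps the cost of fixing them low.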

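Principle 2: Fail Fast (faster iteration through fast failure)

Basis: SpaceX is reported to deploy 17,000 times per day, which is only possible with immediate failure detection.

DORA 2025 Elite Performers benchmarks:

| Metric | Elite level | Typical average |
|------|------------|---------|
| Deployment frequency | Multiple per day | Weekly to monthly |
| Change lead time | Under 1 hour | 1 week to 1 month |
| Change failure rate | Under 5% | 15-30% |
| MTTR | Under 1 hour | 1 day to 1 week |

Elite organizations deploy 182 times more often and have 127 times faster lead times than low performers (figures published in the 2024 DORA report).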
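To compare a team against the benchmarks above, the four metrics first have to be computed from deployment records. The following is a minimal sketch, assuming a hypothetical `Deployment` record with commit, deploy, and restore timestamps; the record shape and function name are illustrative and not part of any DORA tooling.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Deployment:
    """Hypothetical record of one production deployment."""
    committed_at: datetime                  # when the change was committed
    deployed_at: datetime                   # when it reached production
    failed: bool = False                    # did it cause a production failure?
    restored_at: Optional[datetime] = None  # when service was restored, if it failed


def dora_metrics(deploys: list, window_days: int) -> dict:
    """Compute the four DORA metrics over one observation window."""
    if not deploys:
        raise ValueError("no deployments in window")
    failures = [d for d in deploys if d.failed]
    recoveries = [d.restored_at - d.deployed_at for d in failures if d.restored_at]
    return {
        "deploys_per_day": len(deploys) / window_days,
        "median_lead_time": median(d.deployed_at - d.committed_at for d in deploys),
        "change_failure_rate": len(failures) / len(deploys),
        "median_time_to_restore": median(recoveries) if recoveries else None,
    }
```

Medians are used rather than means so that a single slow change does not dominate the lead-time figure; that choice is an assumption of this sketch.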

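Principle 3: Immutable Artifacts

Basis: Google Cloud Build tags images with $COMMIT_SHA as standard practice, and the same artifact is promoted dev → staging → prod.

# CORRECT: immutable tag derived from the commit SHA
docker build -t "myapp:${COMMIT_SHA}" .
# Use the same image in every environment (no environment-specific builds)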

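Principle 4: Hermetic Build (self-contained builds)

Basis: a core principle of Google SRE release engineering is a fully reproducible build that does not depend on services outside the build environment.

- Lock dependencies completely (always commit the lockfile)
- Isolate the environment with containers
- Minimize reliance on external CDNs and npm registries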

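Principle 5: Pipeline as Code

Basis: TFX, Kubeflow, GitHub Actions, and SageMaker all share one pattern: the pipeline definition is managed as code.

Keep the pipeline definition in the same repository as the application code, under version control and subject to code review.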

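Principle 6: Observability First

Basis: Google SRE holds that a service you cannot monitor cannot be reliable. In 2025, MCP and OpenTelemetry integrations made it possible to attach a trace ID to every LLM tool call.

Implement the three observability signals (metrics, logs, traces) at every layer.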

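Principle 7: Ownership = Accountability

Basis: Amazon's "You Build It, You Run It" philosophy. Two-Pizza Teams (10 people or fewer) are cited as having a 42% engagement rate, versus an average of just over 30% in large organizations.

The sense of ownership behind "if what I built breaks, I fix it" creates a self-sustaining incentive to improve quality.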

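Principle 8: DAG-based Dependency Management

Basis: TFX, Kubeflow, Apache Airflow, and Step Functions all use the DAG as their basic unit of design.

Three benefits of a DAG:
- Automatic parallelization: nodes with no dependency between them run concurrently
- Minimal re-execution: only the failed nodes are rerun
- Visual understanding: the dependency graph shows the whole pipeline at a glance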
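The automatic-parallelization benefit can be sketched with Python's standard `graphlib` module. The step names and dependencies below are illustrative assumptions, loosely modelled on the step ids used later in this document; they are not a real pipeline.

```python
from graphlib import TopologicalSorter

# Hypothetical step graph: each key lists the steps it depends on.
DEPENDS_ON = {
    "shift_left_check": [],
    "fetch_docs": ["shift_left_check"],
    "fetch_papers": ["shift_left_check"],
    "analyze": ["fetch_docs", "fetch_papers"],
    "report": ["analyze"],
}


def parallel_batches(depends_on: dict) -> list:
    """Group steps into batches; steps inside one batch can run concurrently."""
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # raises graphlib.CycleError if the graph is not a DAG
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every step whose dependencies are done
        batches.append(ready)
        sorter.done(*ready)                 # mark the batch finished, unlocking successors
    return batches
```

Here `fetch_docs` and `fetch_papers` land in the same batch because neither depends on the other, which is exactly the case a scheduler can run in parallel.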

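Principle 9: Progressive Delivery with Quality Gates

Basis: in Google's CI/CD guidance, passing integration tests is the gate to the next stage. In SageMaker, a model is deployed automatically only if its accuracy exceeds the existing model's.

Automatic promotion dev → staging → production
= happens only after passing explicit quality gates
  (tests pass / accuracy threshold / security scan)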
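A promotion decision of this kind reduces to comparing measured metrics with per-level thresholds. The sketch below reuses the threshold values from the `quality_gates` block in PART 3; the function names, and the rule that a missing metric fails the gate, are assumptions made for illustration.

```python
# Threshold values copied from the quality_gates block in PART 3.
GATES = {
    "standard":   {"accuracy": 0.80, "completeness": 0.85, "hallucination_rate": 0.10},
    "high":       {"accuracy": 0.90, "completeness": 0.93, "hallucination_rate": 0.05},
    "enterprise": {"accuracy": 0.95, "completeness": 0.97, "hallucination_rate": 0.02},
}
LOWER_IS_BETTER = {"hallucination_rate"}  # for these the threshold is a ceiling


def failed_gates(metrics: dict, level: str) -> list:
    """Return the names of metrics that miss the thresholds for `level`."""
    failures = []
    for name, threshold in GATES[level].items():
        value = metrics.get(name)
        if value is None:  # assumption: an unmeasured metric blocks promotion
            failures.append(name)
        elif name in LOWER_IS_BETTER and value > threshold:
            failures.append(name)
        elif name not in LOWER_IS_BETTER and value < threshold:
            failures.append(name)
    return failures


def may_promote(metrics: dict, level: str) -> bool:
    """Promote only when every gate for the target level passes."""
    return not failed_gates(metrics, level)
```

Returning the list of failed gates, rather than only a boolean, lets the pipeline report why a promotion was blocked.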

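Principle 10: Self-Service and Standardization Together

Basis: GitHub reports a 40% reduction in CI/CD setup time from an internal marketplace. Google SRE provides best practices and tools so that product teams can stay in control of their own releases.

Team autonomy × organization-wide standardization = the best pipeline quality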

PART 2: Quantitative Evaluation Framework for a 98-100 Point Pipeline

════════════════════════════════════════════════════════

Scoring model (5 dimensions, 100 points total)

Total score = Σ(dimension score × weight)

Dimension 1: Throughput quality     (25 points)
  = Deployment Frequency + Lead Time for Changes

Dimension 2: Stability              (20 points)
  = Change Failure Rate + MTTR + Rework Rate

Dimension 3: Security assurance     (20 points)
  = SLSA Level(L1=5, L2=10, L3=20) + OpenSSF Score(0-10)

Dimension 4: Observability          (20 points)
  = OTel Coverage(%) + Tracing + Alerting SLO

Dimension 5: Evaluation loop quality (15 points)
  = EDD implementation + Hallucination Rate + Retrieval Accuracy
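One way to read the weighted sum above is to treat each dimension's point value as its weight and each dimension score as a ratio between 0 and 1. That normalization is an assumption of this sketch (the document does not define how the sub-metrics are combined), and the dimension keys are illustrative names.

```python
# Point values from the scoring model above, used here as weights.
WEIGHTS = {
    "throughput": 25,
    "stability": 20,
    "security": 20,
    "observability": 20,
    "evaluation_loop": 15,
}


def total_score(ratios: dict) -> float:
    """Weighted sum of per-dimension ratios, each clamped to the range 0..1."""
    missing = WEIGHTS.keys() - ratios.keys()
    if missing:
        raise ValueError(f"missing dimensions: {sorted(missing)}")
    total = 0.0
    for name, weight in WEIGHTS.items():
        ratio = min(max(ratios[name], 0.0), 1.0)  # a dimension cannot exceed its points
        total += ratio * weight
    return total
```

Clamping keeps a dimension from contributing more than its allotted points, which matters for the security dimension, where SLSA L3 (20) plus an OpenSSF score would otherwise exceed the 20-point cap.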

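SLSA (supply-chain assurance) level mapping

SLSA level	Requirement	Implementation	Score
L1	Documented build process	GitHub Actions logs	5 points
L2	Signed provenance	Sigstore + GHA	10 points
L3	Tamper-resistant build environment	Hermetic Build	20 points
L4 (SLSA v0.1 only; the v1.0 build track stops at L3)	Review by 2 or more people	Organizational process	Max

Requirement for 98-100 points: SLSA L3 plus an OpenSSF Scorecard score of 8 or higher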

PART 3: General-Purpose Pipeline Definition Format (YAML Schema)

════════════════════════════════════════════════════════

# meta-pipeline.schema.yaml v3.0
# A general-purpose format combining ideas from CWL, Nextflow DSL-2, GitHub Actions, and DSPy
# Applicable to any kind of system development

apiVersion: meta-pipeline/v3
kind: Pipeline

metadata:
  name: "pipeline identifier"
  version: "1.0.0"           # semver required
  description: "description of the problem being solved"
  created: "2026-03-18"
  tags: [research, generation, qa, deploy]

# ─────────────────────────────────────────────
# I/O signature (DSPy style: declare what goes in and what comes out)
# ─────────────────────────────────────────────
signature:
  inputs:
    - name: topic
      type: string
      required: true
      description: "topic or task to process"
    - name: quality_target
      type: enum
      values: [standard, high, enterprise]
      default: high
  outputs:
    - name: result
      type: markdown
      description: "final deliverable"
    - name: confidence_score
      type: float
      range: [0.0, 1.0]
    - name: sources
      type: list[url]

# ─────────────────────────────────────────────
# Quality gate settings (Progressive Delivery)
# ─────────────────────────────────────────────
quality_gates:
  standard:
    accuracy: 0.80
    completeness: 0.85
    hallucination_rate: 0.10
  high:
    accuracy: 0.90
    completeness: 0.93
    hallucination_rate: 0.05
  enterprise:
    accuracy: 0.95
    completeness: 0.97
    hallucination_rate: 0.02
    human_review: true

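# ─────────────────────────────────────────────
# Step definitions (DAG: parallelized automatically from dependencies)
# ─────────────────────────────────────────────
steps:
  - id: shift_left_check
    name: "Pre-flight checks (Shift Left principle)"
    type: validation
    parallel: true
    tasks:
      - "Input validation (schema, length, prohibited content)"
      - "Verify that API keys are valid"
      - "Check the cost ceiling"
    on_failure: abort

  - id: intelligence_gather
    name: "Information gathering (at most 3 in parallel)"
    type: parallel_search
    depends_on: [shift_left_check]
    max_parallel: 3             # respects rate limits; prevents cascading failures
    inputs:
      query: "{{inputs.topic}}"
    constitution:
      - "Collect from multiple sources (3 or more)"
      - "Prefer primary sources (official docs, papers, the presenters' own blogs)"
      - "Give top priority to information from 2026 onward"
    on_failure: "skip_and_mark_partial"
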
  - id: analyze
    name: "Analysis and synthesis (CoT + ReAct)"
    type: llm_task
    depends_on: [intelligence_gather]
    model: claude-sonnet-4-6     # Sonnet for the analysis step to control cost
    chain_of_thought: true
    react_enabled: true
    prompt_template: |
      You are an expert in {{role}}.

      ## Subject of analysis
      {{inputs.topic}}

      ## Collected data
      {{steps.intelligence_gather.outputs.results}}

      ## Request
      Analyze in the following format:
      1. Key findings (bulleted, with supporting URLs)
      2. Evidence strength rating (Strong/Moderate/Weak)
      3. Explicit statement of uncertainties
      4. Recommended actions
    outputs:
      analysis: string
      confidence: float

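  - id: constitutional_review
    name: "Constitutional AI quality review"
    type: self_review
    depends_on: [analyze]
    constitution:
      - "Are facts and inferences clearly distinguished?"
      - "Do numbers, costs, and statistics carry source URLs?"
      - "Are passages that may be hallucinated flagged?"
      - "Are claims stated in a falsifiable form?"
      - "Is it free of content that could cause real harm to users?"
      - "Does it avoid ending in vague AI-style abstractions ('it is thought that...') with nothing concrete?"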
    outputs:
      violations: list[string]
      should_revise: boolean
      critique: string

  - id: revise
    name: "Revision (conditional)"
    type: conditional_llm_task
    condition: "{{steps.constitutional_review.outputs.should_revise}} == true"
    depends_on: [constitutional_review]
    model: claude-opus-4-6       # revise with the higher-quality Opus
    inputs:
      original: "{{steps.analyze.outputs.analysis}}"
      critique: "{{steps.constitutional_review.outputs.critique}}"
    max_revision_rounds: 3

  - id: multi_reviewer_qa
    name: "Multi-reviewer quality assurance (parallel)"
    type: parallel_review
    depends_on: [constitutional_review, revise]
    reviewers:
      - role: "Fact Validator"
        focus: ["numbers", "proper nouns", "dates", "citation accuracy", "dead-link check"]
        model: claude-haiku-4-5   # lightweight and fast
      - role: "Logic Analyzer"
        focus: ["consistency of premises and conclusions", "existence of counterexamples", "leaps in reasoning"]
        model: claude-sonnet-4-6
      - role: "Practicality Judge"
        focus: ["can it be implemented tomorrow", "accuracy of cost estimates", "availability of alternatives"]
        model: claude-sonnet-4-6
    approval_policy: "all_pass"   # proceed only when every reviewer passes
    conflict_resolution: "escalate_to_human"

  - id: synthesis
    name: "Final integrated report generation"
    type: llm_task
    depends_on: [multi_reviewer_qa]
    model: claude-opus-4-6       # Opus for the final deliverable
    outputs:
      final_report: markdown
      confidence_score: float
      sources: list[url]

# ─────────────────────────────────────────────
# Human-in-the-Loop (triggered selectively by confidence score)
# ─────────────────────────────────────────────
human_review:
  trigger_condition: "confidence_score < 0.75"
  review_interface: "slack_approval"
  timeout: "24h"
  fallback_on_timeout: "publish_with_warning"

# ─────────────────────────────────────────────
# Observability (OpenTelemetry standard)
# ─────────────────────────────────────────────
observability:
  tracing: opentelemetry
  metrics:
    - latency_ms
    - token_count_input
    - token_count_output
    - cost_usd
    - quality_score
    - step_success_rate
  alerts:
    - metric: quality_score
      threshold: "< 0.70"
      action: notify_slack
    - metric: cost_usd
      threshold: "> 5.00"
      action: notify_slack_and_pause

# ─────────────────────────────────────────────
# Error handling (No Silent Error Swallowing)
# ─────────────────────────────────────────────
error_handling:
  on_step_failure:
    intelligence_gather:
      action: "skip_and_mark_partial"   # matches on_failure in the intelligence_gather step
      log_level: "warn"
    analyze:
      action: "retry"
      max_retries: 3
      backoff: "exponential"
      on_max_retries: "escalate_to_human"

# ─────────────────────────────────────────────
# Per-environment settings (in the spirit of ArgoCD ApplicationSet)
# ─────────────────────────────────────────────
environments:
  development:
    quality_gates.target: standard
    multi_reviewer_qa.approval_policy: "any_pass"
    human_review.trigger_condition: "confidence_score < 0.50"
  staging:
    quality_gates.target: high
  production:
    quality_gates.target: enterprise
    human_review.trigger_condition: "confidence_score < 0.85"
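The `error_handling` policy for the `analyze` step (retry up to 3 times with exponential backoff, then escalate to a human) could be implemented along the following lines. This is a sketch under stated assumptions: `run_with_retry`, `EscalateToHuman`, and the 1-second base delay are hypothetical names and values, not part of the schema.

```python
import logging
import time

logger = logging.getLogger("pipeline")


class EscalateToHuman(Exception):
    """Raised when a step still fails after all retries (hypothetical)."""


def run_with_retry(step, max_retries=3, base_delay=1.0, sleep=time.sleep):
    """Call step(); on failure retry with exponential backoff, then escalate."""
    for attempt in range(max_retries + 1):  # one initial try plus max_retries retries
        try:
            return step()
        except Exception as exc:
            # No silent error swallowing: every failure is logged.
            logger.warning("attempt %d failed: %s", attempt + 1, exc)
            if attempt == max_retries:
                raise EscalateToHuman(f"gave up after {max_retries} retries") from exc
            sleep(base_delay * 2 ** attempt)  # waits of 1s, 2s, 4s, ...
```

The `sleep` parameter is injectable so the backoff schedule can be checked in tests without actually waiting.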

PART 4: The Ultimate Meta-Prompt Template

════════════════════════════════════════════

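How to use

Fill in [BUILD_TARGET] in this template and hand it to Claude to have it build a world-class pipeline automatically.

Act as ULTIMATE PIPELINE ARCHITECT v3.0.
Follow the 10 principles and the 5-dimension scoring below in every STEP,
and build a pipeline of 98-100 point quality.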

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
【Build target (BUILD_TARGET)】
[BUILD_TARGET]
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
【Mandatory: the 10 design principles】
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

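1. SHIFT LEFT: place security scans, type checks, and lint at the very start of the pipeline
2. FAIL FAST: run the fastest tests first; stop immediately on failure
3. IMMUTABLE ARTIFACTS: tag with $COMMIT_SHA; use the same artifact in every environment
4. HERMETIC BUILD: builds are self-contained and do not depend on external services
5. PIPELINE AS CODE: keep the pipeline definition in the same repository as the application code
6. OBSERVABILITY FIRST: implement Metrics + Logs + Traces in every component
7. OWNERSHIP = ACCOUNTABILITY: name an owner for every component. You Build It, You Run It
8. DAG-BASED DEPENDENCIES: declare dependencies explicitly so components parallelize automatically
9. PROGRESSIVE DELIVERY: promote to the next environment only after passing the quality gates
10. SELF-SERVICE + STANDARDIZATION: balance team autonomy with organization-wide standards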

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
【Quality bar (definition of 98-100 points)】
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

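Meet the DORA Elite Performers level on every metric:
- Deployment frequency: on demand (multiple times per day)
- Change lead time: under 1 hour
- Change failure rate: under 5%
- MTTR: under 1 hour

Security assurance:
- SLSA Level 3: tamper-resistant builds (Sigstore + Hermetic Build)
- OpenSSF Scorecard: 8 or higher
- CVE vulnerabilities: zero Critical; High fixed within 48 hours

AI/LLM-specific quality (Evaluation-Driven Development):
- Faithfulness (answer stays faithful to the context): 0.90 or higher
- Answer Relevancy (relevance to the question): 0.87 or higher
- Hallucination Rate: 5% or lower
- P95 Latency: 2000 ms or lower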

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
【Execution flow (no step may be skipped)】
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

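STEP 0: PRE-FLIGHT
  □ Confirm the required API keys and environment variables
  □ Set the cost ceiling (default $5 per run)
  □ Set the target SLSA Level
  → Display a start message once confirmed

STEP 1: Expand the keyword universe [Claude Sonnet 4.6]
  □ Run /keyword-mega-extractor
  □ Launch /intelligence-research in parallel (background)
  □ Extract core / related / compound / trending / niche / tech-stack candidates
  → Save the keyword CSV and proceed to STEP 2

STEP 2: Deep research, two passes [Claude Sonnet 4.6]
  Pass 1: 3 agents in parallel (MCP, API, architecture)
    - Max parallelism: 3 (prevents cascading failures)
    - Each agent's reply: summarize to 500 characters or fewer to protect the main context
  Pass 2: Fill gaps (omega-research or mega-research-plus)
    - Resolve every item marked "unknown" or "needs confirmation" in Pass 1
    - Alternatives for security risks
    - Reaction in the Japanese-language community (Zenn/Qiita/Hatena Bookmark)

STEP 3: Design and evaluation [Claude Opus 4.6]
  □ Compute TrendScore (all tools: hot★★★/warm★★/cold★)
  □ Constitutional Review (self-review against 6 principles)
  □ CVE check (osv.dev + socket.dev + nvd.nist.gov)
  □ System architecture design (Mermaid C4 diagram)
  □ 3-phase implementation plan (MVP / automation / scale)

STEP 4: Generate the full 12-section report [Claude Opus 4.6]
  Required sections:
  1. Executive Summary (value, differentiation, cost, ROI)
  2. Market map (full map of MCP/API/OSS vs SaaS)
  3. Real-time trend analysis on X/social media
  4. Keyword Universe (the full keyword universe)
  5. Data acquisition strategy (all sources, costs, limits)
  6. Normalized data model (TypeScript interface + DB design)
  7. TrendScore results (color-coded hot/warm/cold)
  8. System architecture diagram (Mermaid C4)
  9. Implementation plan (3-phase Gantt chart)
  10. Security / legal / operations design (CVE, licenses, PII)
  11. Risks and alternatives (probability × impact × alternative)
  12. Go/No-Go decision points

STEP 4.5: QA Gate with 3 reviewers [ChatGPT 5.4 thinking / OpenRouter]
  Reviewer 1 - Coverage check (Coverage Auditor): 70 points or higher
  Reviewer 2 - Reliability check (Fact Validator): 70 points or higher
  Reviewer 3 - Practicality check (Practicality Judge): 70 points or higher

  Verdict:
    ✅ PASS (all reviewers at 70 or higher) → submit to the user
    ⚠️ CONDITIONAL (1 reviewer fails) → auto-supplement → re-judge
    ❌ FAIL (2 or more reviewers fail) → rerun STEP 2 Pass 2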

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
【Quality checklist (applies to every STEP)】
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

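□ Minimum sources: every finding backed by 3 or more sources
□ Citations: numbers and costs carry a source URL (mandatory)
□ Code samples: include implementation examples for the main components
□ No abstractions: never stop at "X is important"; go down to a concrete implementation method
□ State uncertainty: always list what could not be confirmed
□ Cost estimates: give monthly cost, API fees, and infrastructure cost for every phase
□ Security: check CVE/vulnerability risk for every candidate tool
□ No Silent Error: every catch block must log
□ Sub-agent results: summarize to 500 characters or fewer to protect the main context
□ Run /compact at phase boundaries (context protection)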

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Check [BUILD_TARGET] and start executing from STEP 0 now.
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
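The STEP 4.5 verdict rule in the template above is simple enough to state as code. The following is a minimal sketch; the function name `qa_verdict` and the score dictionary shape are assumptions, while the 70-point pass mark and the three outcomes come from the template.

```python
PASS_MARK = 70  # each reviewer must score 70 or higher


def qa_verdict(scores: dict) -> str:
    """Map reviewer scores to PASS / CONDITIONAL / FAIL as defined in STEP 4.5."""
    failed = [name for name, score in scores.items() if score < PASS_MARK]
    if not failed:
        return "PASS"         # submit to the user
    if len(failed) == 1:
        return "CONDITIONAL"  # auto-supplement, then re-judge
    return "FAIL"             # rerun STEP 2 Pass 2
```

For example, `qa_verdict({"coverage": 82, "facts": 65, "practicality": 90})` yields `"CONDITIONAL"` because exactly one reviewer is below the pass mark.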

PART 5: CI/CD Pipeline Implementation for Claude Code

════════════════════════════════════════════════════

GitHub Actions golden pipeline

# .github/workflows/ultimate-pipeline.yml
# A CI/CD pipeline combining practices from NASA, Amazon, Google, and Anthropic

name: Ultimate Pipeline v3.0

on:
  pull_request:
    types: [opened, synchronize, ready_for_review]
  push:
    branches: [main, release/*]

jobs:
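  # ──────────────────────────────────────
  # STAGE 1: SHIFT LEFT (fastest checks first)
  # ──────────────────────────────────────
  shift-left:
    name: "Shift Left Checks"
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20        # assumed Node version; match your project
          cache: npm
      - name: Install dependencies from the lockfile
        run: npm ci
      - name: Lint & Type Check
        run: npm run lint && npm run typecheck
      - name: Security Scan (SAST)
        # Semgrep is distributed as a Python package, so run it via pipx rather than npx
        run: pipx run semgrep scan --config auto --error
      - name: Dependency Audit (SCA)
        run: npm audit --audit-level=high
      - name: Generate SBOM
        run: npx @cyclonedx/cyclonedx-npm --output-file sbom.json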

  # ──────────────────────────────────────
  # STAGE 2: parallel quality checks (after Shift Left passes)
  # ──────────────────────────────────────
  unit-tests:
    needs: shift-left
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci   # each job starts on a fresh runner, so install dependencies again
      - run: npm test -- --coverage --coverageThreshold='{"global":{"lines":80}}'

  claude-review:
    needs: shift-left
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4   # the review action needs the repository contents
      - uses: anthropics/claude-code-action@v1
        with:
          anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
          prompt: |
            Review this PR applying NASA Power of 10 Rules + Amazon principles:
            - No unchecked return values
            - All error paths handled (no silent catch blocks)
            - No hardcoded secrets
            - Functions under 50 lines
            - No mutation of existing objects
            - No string interpolation in shell commands (use array args)
          claude_args: "--max-turns 5 --model claude-sonnet-4-6"

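  security-review:
    needs: shift-left
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build image for scanning
        run: docker build -t app:test .
      - name: Container Scan
        # Trivy is not preinstalled on the runner; pin this action to a release tag or commit SHA in real use
        uses: aquasecurity/trivy-action@master
        with:
          image-ref: app:test
          exit-code: "1"
          severity: CRITICAL
      - name: Secret Scan
        run: npx secretlint "**/*"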

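  # ──────────────────────────────────────
  # STAGE 3: Immutable Artifact Build (after all quality gates pass)
  # ──────────────────────────────────────
  build:
    needs: [unit-tests, claude-review, security-review]
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write    # push to ghcr.io
      id-token: write    # keyless signing with Sigstore
    env:
      IMAGE: ghcr.io/${{ github.repository }}
    steps:
      - uses: actions/checkout@v4
      - uses: sigstore/cosign-installer@v3
      - name: Log in to GHCR
        run: echo "${{ secrets.GITHUB_TOKEN }}" | docker login ghcr.io -u "${{ github.actor }}" --password-stdin
      - name: Build Immutable Artifact
        run: |
          docker build \
            --label "git.commit=${{ github.sha }}" \
            --label "build.date=$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
            -t "$IMAGE:${{ github.sha }}" .
      - name: Push to Registry
        run: docker push "$IMAGE:${{ github.sha }}"
      - name: Sign with Sigstore (SLSA L2+)
        # cosign signs images that live in a registry, so sign after the push, by digest
        run: cosign sign --yes "$(docker inspect --format '{{index .RepoDigests 0}}' "$IMAGE:${{ github.sha }}")"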

  # ──────────────────────────────────────
  # STAGE 4: Progressive Delivery
  # ──────────────────────────────────────
  deploy-staging:
    needs: build
    runs-on: ubuntu-latest
    environment: staging
    steps:
      - uses: actions/checkout@v4   # provides the ./helm chart
      - name: Deploy to Staging
        run: |
          helm upgrade --install myapp ./helm \
            --set image.tag=${{ github.sha }} \
            --set environment=staging

  e2e-tests:
    needs: deploy-staging
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npx playwright test --reporter=html
      - uses: actions/upload-artifact@v4
        if: failure()
        with:
          name: playwright-report
          path: playwright-report/

  deploy-production:
    needs: e2e-tests
    runs-on: ubuntu-latest
    environment: production   # manual approval gate
    if: github.ref == 'refs/heads/main'
    steps:
      - uses: actions/checkout@v4   # provides the ./helm chart
      - name: Deploy to Production (Canary)
        run: |
          helm upgrade --install myapp ./helm \
            --set image.tag=${{ github.sha }} \
            --set environment=production \
            --set canary.weight=5  # route only 5% of traffic at first

REVIEW.md quality rules example

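# REVIEW.md: Claude Code review guidelines
# Based on NASA Power of 10 + Amazon + Google SRE principles

## Required checks (every PR)

### Inspired by NASA Power of 10
- [ ] Every loop has a termination condition (no risk of infinite loops)
- [ ] Return values of all non-void functions are checked
- [ ] Every catch block logs (no silent errors)
- [ ] Functions are 50 lines or fewer
- [ ] Files are 800 lines or fewer

### Amazon principles
- [ ] No direct mutation of existing objects (immutability)
- [ ] No hardcoded values (use constants or environment variables)
- [ ] Error messages do not expose internal implementation details

### Security
- [ ] No string interpolation in shell commands (use array arguments)
- [ ] No hardcoded API keys or secrets
- [ ] All user input is validated at the system boundary

## Skipped
- Generated files under `src/gen/`
- `*.lock`, `package-lock.json`
- Formatting-only changes

## Files to review closely
- Authentication and authorization
- External API calls
- Database operations
- Environment variable and configuration loading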

PART 6: Anthropic Multi-Agent Design Patterns (with measured results)

════════════════════════════════════════════════════════════════

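Measured performance (data published by Anthropic in 2025)

Configuration	Performance	Cost	Recommended use
Sonnet alone	Baseline	1x	Simple tasks
Opus alone	+30-40%	5x	Complex single tasks
Orchestrator(Opus)+Workers(Sonnet)×3	+90.2%	15x	Complex research and design

Key finding: token usage explains 80% of the performance variance. In Anthropic's write-up, the 90.2% gain is measured against a single-agent Claude Opus 4 on an internal research eval, and the roughly 15x figure is token consumption relative to an ordinary chat. Even at that cost, the gain is worthwhile for complex research and design tasks.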

Custom sub-agent design (placed in `~/.claude/agents/`)

# Recommended agent configuration
orchestrator:
  model: claude-opus-4-6       # highest-quality judgment for integration
  role: "task decomposition, sub-agent direction, final integration"

workers:
  - name: planner
    model: claude-sonnet-4-6
    tools: [WebSearch, Read, Glob]
    role: "implementation planning and technical research"

  - name: implementer
    model: claude-sonnet-4-6
    tools: [Read, Write, Edit, Bash]
    role: "code implementation"

  - name: tdd-guide
    model: claude-sonnet-4-6
    tools: [Write, Edit, Bash]
    role: "writing and running tests"

  - name: code-reviewer
    model: claude-sonnet-4-6
    tools: [Read, Grep, Glob]
    role: "code review (read-only)"

  - name: security-reviewer
    model: claude-haiku-4-5     # lightweight and fast is sufficient
    tools: [Read, Grep]
    role: "security audit (read-only)"

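Information flow that prevents context pollution

User request
     ↓
Orchestrator (Opus): task decomposition, DAG generation
     ↓
Sub-Agent 1          Sub-Agent 2          Sub-Agent 3
(isolated context)   (isolated context)   (isolated context)
     ↓                    ↓                    ↓
500-char summary     500-char summary     500-char summary
     └────────────────────┴────────────────────┘
                          ↓
              Merged by the Orchestrator
                          ↓
                   Final deliverable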
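The flow above (fan out to at most three sub-agents, return only a 500-character summary from each) can be sketched with a thread pool. Everything here is illustrative: `worker` stands in for a real sub-agent call, and `summarize` truncates text where a real system would ask a model for a summary.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_PARALLEL = 3      # cap on concurrent sub-agents
SUMMARY_LIMIT = 500   # characters returned to the orchestrator per sub-agent


def summarize(text: str, limit: int = SUMMARY_LIMIT) -> str:
    """Stand-in for an LLM summary: truncate to at most `limit` characters."""
    return text if len(text) <= limit else text[: limit - 3] + "..."


def fan_out(subtasks: list, worker) -> list:
    """Run worker(subtask) in parallel; return bounded summaries in input order."""
    with ThreadPoolExecutor(max_workers=MAX_PARALLEL) as pool:
        raw_results = list(pool.map(worker, subtasks))  # map preserves input order
    # Only the summaries reach the orchestrator; raw output stays in the sub-agent.
    return [summarize(result) for result in raw_results]
```

Because the orchestrator only ever sees the bounded summaries, its context grows by at most 500 characters per sub-agent regardless of how much each one read.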

PART 7: EDD (Evaluation-Driven Development) Implementation Guide

════════════════════════════════════════════

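EDD vs TDD

TDD: write tests first → manage fixed requirements with green/red results
EDD: real-time feedback plus post-deployment monitoring
     → an evaluation loop that keeps adapting to changing operational requirements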
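The post-deployment half of EDD can be illustrated with a rolling monitor that tracks the hallucination rate over the most recent responses. This is a sketch under assumptions: the class name, the window size, and the idea that each response has already been judged as hallucinated or not (by a human or an evaluator model) are all hypothetical.

```python
from collections import deque


class RollingHallucinationMonitor:
    """Tracks the hallucination rate over the most recent `window` responses."""

    def __init__(self, window: int = 100, max_rate: float = 0.05):
        self.results = deque(maxlen=window)  # oldest judgments fall off automatically
        self.max_rate = max_rate

    def rate(self) -> float:
        """Fraction of responses in the window judged as hallucinated."""
        return sum(self.results) / len(self.results) if self.results else 0.0

    def record(self, hallucinated: bool) -> bool:
        """Record one judged response; return True if the rate now breaches max_rate."""
        self.results.append(hallucinated)
        return self.rate() > self.max_rate
```

Unlike a fixed test suite, the verdict here changes as production traffic changes, which is the distinction the comparison above draws.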

RAG pipeline quality gate implementation

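# Production RAG quality gate (2025 enterprise baseline)
MIN_THRESHOLDS = {                 # higher is better
    "retrieval_accuracy":  0.85,   # share of queries that retrieve the right chunks
    "faithfulness":        0.90,   # answer stays faithful to the context
    "answer_relevancy":    0.87,   # relevance to the question
}
MAX_THRESHOLDS = {                 # lower is better
    "hallucination_rate":  0.05,   # hallucination rate (5% or lower)
    "latency_p95_ms":      2000,   # P95 latency
    "cost_per_query_usd":  0.01,   # cost ceiling per query
}


class QualityGateError(Exception):
    """Raised when one or more metrics miss their threshold."""


def quality_gate(metrics: dict) -> bool:
    """Quality gate check. Raises QualityGateError if any metric violates its threshold."""
    violations = []
    for key, floor in MIN_THRESHOLDS.items():
        value = metrics.get(key)
        if value is None or value < floor:     # a missing metric fails the gate
            violations.append(f"{key}: {value} < {floor}")
    for key, ceiling in MAX_THRESHOLDS.items():
        value = metrics.get(key)
        if value is None or value > ceiling:   # rate, latency, and cost are upper bounds
            violations.append(f"{key}: {value} > {ceiling}")
    if violations:
        raise QualityGateError(f"Quality gate failed: {violations}")
    return True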

PART 8: Pipeline Quality Scorecard

════════════════════════════════════════

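Checklist (confirm every item before completion)

DORA metrics ✓

[ ] Deployment frequency: on-demand deployment supported
[ ] Change lead time: automation targets under 1 hour
[ ] Change failure rate: test automation targets under 5%
[ ] MTTR: a rollback procedure is in place

Security ✓

[ ] SLSA Level 2 or higher (Sigstore signing)
[ ] CVE scan (zero Critical)
[ ] Secret scan (no hardcoded secrets)
[ ] SBOM generated (full dependency tracking)

Quality ✓

[ ] Test coverage 80% or higher
[ ] Static analysis (zero lint warnings)
[ ] Type safety (TypeScript strict / Python mypy)
[ ] Claude Code Review passes

Observability ✓

[ ] Metrics collection (Prometheus)
[ ] Log aggregation (Loki/CloudWatch)
[ ] Distributed tracing (Jaeger/OpenTelemetry)
[ ] Alerting configured (SLO/SLA based)

AI/LLM-specific ✓

[ ] Faithfulness ≥ 0.90
[ ] Hallucination Rate ≤ 5%
[ ] Evaluation dataset prepared
[ ] Prompt version management (LangSmith/PromptFlow)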

PART 9: Quick Start Guide

════════════════════════════════

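How to use this pipeline at any time

1. Open the meta-prompt in PART 4
2. Describe the system you want to build in [BUILD_TARGET]
3. Paste it into Claude Code and run it
4. STEP 0 through 4.5 run automatically (15-30 minutes)
5. After the QA Gate returns PASS, review the report
6. Enter "proceed" → move on to requirements definition and implementation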

Required environment variables (`.env`)

Variable	Required	Purpose
`ANTHROPIC_API_KEY`	✅	Claude STEP 1-4
`OPENROUTER_API_KEY`	✅	ChatGPT 5.4 QA Gate
`XAI_API_KEY`	Recommended	Grok-4 omega-research
`TAVILY_API_KEY`	Recommended	Tavily search
`NEWSAPI_KEY`	Recommended	News collection
`BRAVE_SEARCH_API_KEY`	Recommended	Brave search
`EXA_API_KEY`	Recommended	Semantic search
`FRED_API_KEY`	Recommended	Economic indicators (free)

REFERENCES (supporting sources)

════════════════════════════════

NASA / SpaceX

NASA Power of 10 Rules: https://www.perforce.com/blog/kw/NASA-rules-for-developing-safety-critical-code
NASA Software Engineering Handbook: https://swehb.nasa.gov/
SpaceX Software Development: https://www.coderskitchen.com/spacex-software-development-and-testing/

Amazon / AWS

Two-Pizza Teams: https://aws.amazon.com/executive-insights/content/amazon-two-pizza-team/
Working Backwards: https://workingbackwards.com/concepts/working-backwards-pr-faq-process/
Well-Architected Framework: https://docs.aws.amazon.com/wellarchitected/latest/operational-excellence-pillar/welcome.html
SageMaker Pipelines Best Practices: https://aws.amazon.com/blogs/machine-learning/best-practices-and-design-patterns-for-building-machine-learning-workflows-with-amazon-sagemaker-pipelines/

Google

Google SRE Book: https://sre.google/sre-book/release-engineering/
TFX Guide: https://www.tensorflow.org/tfx/guide/understanding_tfx_pipelines
MLOps with TFX + Vertex AI: https://docs.cloud.google.com/architecture/architecture-for-mlops-using-tfx-kubeflow-pipelines-and-cloud-build

Apple

Xcode Cloud: https://developer.apple.com/xcode-cloud/
Core ML + WWDC24: https://developer.apple.com/videos/play/wwdc2024/10161/

Microsoft / GitHub

GitHub Actions Enterprise Scaling: https://wellarchitected.github.com/library/collaboration/recommendations/scaling-actions-reusability/
Azure DevOps: https://learn.microsoft.com/en-us/azure/devops/release-notes/features-timeline

Anthropic

Multi-agent Research System: https://www.anthropic.com/engineering/multi-agent-research-system
Claude Code Review: https://code.claude.com/docs/en/code-review
Claude Code GitHub Actions: https://code.claude.com/docs/en/github-actions
Prompt Engineering: https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/overview
Constitutional AI: https://www.anthropic.com/research/constitutional-ai-harmlessness-from-ai-feedback

DORA / Quality Metrics

DORA 2025 Report: https://dora.dev/research/2025/dora-report/
DORA Key Takeaways: https://getdx.com/blog/2024-dora-report/

Security Standards

SLSA Framework: https://openssf.org/projects/slsa/
OpenSSF Scorecard: https://scorecard.dev/
SonarQube AI Code Gates: https://docs.sonarsource.com/sonarqube-server/2025.1/instance-administration/analysis-functions/ai-code-assurance/quality-gates-for-ai-code

DSPy / Meta-Prompting

Stanford HAI DSPy: https://hai.stanford.edu/research/dspy-compiling-declarative-language-model-calls-into-state-of-the-art-pipelines
DSPy arXiv: https://arxiv.org/abs/2310.03714
5C Contracts: https://intuitionlabs.ai/articles/meta-prompting-automated-llm-prompt-engineering
Meta Prompting Guide: https://www.prompthub.us/blog/a-complete-guide-to-meta-prompting

MCP / Observability

MCP + OpenTelemetry: https://signoz.io/blog/mcp-observability-with-otel/
PromptFlow Tracing: https://microsoft.github.io/promptflow/how-to-guides/tracing/index.html

LLMOps

1200 Production Deployments: https://www.zenml.io/blog/what-1200-production-deployments-reveal-about-llmops-in-2025
Production RAG 2025: https://dextralabs.com/blog/production-rag-in-2025-evaluation-cicd-observability/
EDD Paper: https://arxiv.org/html/2411.13768v3
