What's New This Week: Updated 2026-03-07 09:16:04 UTC
Reliability and launch-readiness improvements shipped this week across preflight guards, release gating, and operator dashboards.

Developers

Partner APIs, signing, and operational notes. Designed for reliable integrations and transparent reporting.

Recommended base: https://vets-coin.com

API base path: /api (example: https://vets-coin.com/api/partner/capabilities)

Note: api.vets-coin.com is planned, but requires a TLS cert that includes that subdomain.

What You Can Build

  • Claim flows for campaigns and donations with a partner key.
  • Webhook-based notifications for downstream systems.
  • Read-only transparency and distribution views for end users.

Quick Start

  1. Request a partner key from the VETS Coin team.
  2. Use the OpenAPI spec to generate a client (or integrate directly).
  3. Sign requests using the partner signing scheme described in the API guide.
  4. Start in low-volume mode, monitor errors, then scale.

API Query Presets

Saved preset queries for quick copy/run examples in the Developers hub.

Preset | Request | Description
System Status (24h) | GET /api/public/system-status/uptime?hours=24 | Quick uptime check for the last 24 hours.
Incidents (7d) | GET /api/public/system-status/incidents?hours=168&limit=30 | Recent incident windows with active/resolved spans.
Latency Percentiles | GET /api/public/latency-percentiles?hours=24 | Public p50/p95/p99 latency telemetry.
Trust Manifest | GET /trust.json | Machine-readable trust controls and evidence pointers.

Integration Wizard + Schema Explorer

Use guided setup for your first call, then inspect endpoints/fields/examples in the explorer.

Audio Share Link Helper

Generate direct destination share links for track pages or audio files with channel-specific actions.

curl -sS "https://vets-coin.com/api/public/audio-share-links?track_url=/faq&title=VETS%20Audio&text=Open%20this%20audio%20link%20directly.&channels=x,telegram,email"
curl -sS "https://vets-coin.com/api/public/audio-share-links/validate?track_url=/faq&channels=x,email"
curl -sS "https://vets-coin.com/api/public/audio-share-links/preview?track_url=/faq&title=Audio%20Preview&text=Campaign%20Preview"
curl -sS -X POST "https://vets-coin.com/api/public/audio-share-links/validate/batch" -H "Content-Type: application/json" -d '{"defaults":{"channels":"x,email"},"items":[{"track_url":"/faq"},{"track_url":"/faq","title":"Campaign B"}]}'
curl -sS "https://vets-coin.com/api/public/audio-share-links/expand?short_url=https://vets-coin.com/s/audio/abc123"
curl -sS "https://vets-coin.com/api/public/audio-share-links/channels.json"
curl -sS "https://vets-coin.com/api/public/audio-share-links/errors.json"
curl -sS "https://vets-coin.com/api/public/audio-share-links/warnings.json"
curl -sS "https://vets-coin.com/api/public/audio-share-links/guidance.json"
curl -sS "https://vets-coin.com/api/public/audio-share-links/policy.json"
curl -sS "https://vets-coin.com/api/public/audio-share-links/health"

Quickstart Sandbox Verification

Deterministic API-key sandbox check for client bootstrap automation.

curl -sS "https://vets-coin.com/api/public/quickstart/verify-key?api_key=sandbox_demo_key&client=cli"

Webhook Simulator

Canned webhook payload scenarios for receiver validation and replay drills.

Scenario | Event Type
Donation Claim Created | donation.claim
Donation Claim Redeemed | donation.claim_redeemed
Webhook Delivery Failed | webhook.delivery_failed

API Compatibility Canary

Strict-validation canary routes for early adopter testing before broad rollout.

curl -sS -X POST "https://vets-coin.com/api/canary/echo" -H "Content-Type: application/json" -d '{"message":"hello","request_id":"canary-1"}'

Strict validation: on

SDK Starter Kits (Generated)

Download pre-generated starters built from the live OpenAPI specs.

python flask_api/scripts/generate_sdk_starters.py --out-dir flask_api/docs/sdk

OpenAPI Changelog

Baseline-to-current API diff published for integration planning and change audits.

python flask_api/scripts/generate_openapi_changelog.py --spec flask_api/docs/openapi.yaml --spec flask_api/docs/openapi-transparency.yaml --baseline-dir flask_api/docs/openapi/baseline

Auth Headers Quickref

Partner mutation requests should include these headers.

X-Partner-Key: <KEY_ID>
X-Partner-Timestamp: <UNIX_SECONDS>
X-Partner-Signature: <HMAC_SHA256_HEX>
X-Partner-Nonce: <NONCE_HEX>
Idempotency-Key: <UUID_OR_UNIQUE_TOKEN>

Signing helper example:
python flask_api/scripts/sign_partner_request.py --base-url https://vets-coin.com --method POST --path /api/partner/capabilities --key-id "<KEY_ID>" --secret "<SECRET>" --json '{"capability":"claims"}' --print-only

Partner Error Taxonomy

Use error_code for branching logic. Keep error for operator logs/UI.

Error Code | HTTP | Retry Class | Client Action
rate_limited | 429 | retryable | Back off; honor Retry-After and X-RateLimit-Reset.
replay_detected | 409 | retryable | Regenerate nonce/idempotency key and retry once.
idempotency_replay | 409 | safe-noop | Treat as duplicate success path; fetch latest state.
unauthorized | 401 | fail-fast | Rotate/check partner credentials and request signature inputs.
forbidden | 403 | fail-fast | Missing scope; request scope upgrade or use correct key.
db_unavailable | 503 | retryable | Retry with jittered backoff; open incident if persistent.
server_error | 500 | retryable | Retry with capped backoff and capture request_id.
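
The taxonomy can be encoded as a lookup so that clients branch on error_code rather than on message text. The following is a minimal Python sketch; the names RETRY_CLASS and classify_error are illustrative, and mapping unknown or missing codes to "manual_triage" is an assumption rather than documented API behavior.

```python
# Retry classes copied from the Partner Error Taxonomy table.
RETRY_CLASS = {
    "rate_limited": "retryable",
    "replay_detected": "retryable",
    "idempotency_replay": "safe-noop",
    "unauthorized": "fail-fast",
    "forbidden": "fail-fast",
    "db_unavailable": "retryable",
    "server_error": "retryable",
}


def classify_error(payload: dict) -> str:
    """Return the retry class for a parsed error payload.

    Unknown or missing codes fall back to "manual_triage" (an assumption).
    """
    return RETRY_CLASS.get(payload.get("error_code"), "manual_triage")


# Branch on error_code; keep the human-readable "error" field for logs only.
print(classify_error({"error_code": "rate_limited", "error": "Too many requests"}))
```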

Webhook Replay & Verification

For partner webhook receivers: verify each event signature and keep replay/testing commands handy.

Incoming headers: X-Webhook-Id, X-Webhook-Event, X-Webhook-Timestamp, X-Webhook-Signature
Signature formula: hex(HMAC_SHA256(webhook_secret, f"{timestamp}.{raw_body_json}"))
Payload schema: {"event_id":"123","event_type":"salutes.credit","data":{"...event payload..."}}
Verification tip: compute HMAC against the exact raw request body string before JSON reserialization.
python flask_api/scripts/sign_partner_request.py --base-url https://vets-coin.com --method POST --path /api/partner/webhooks/42/test --key-id "<KEY_ID>" --secret "<SECRET>" --print-only
Admin replay route (admin session required): POST /admin/partners/webhook-events/<event_id>/replay
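
The signature formula above can be sketched with the Python standard library. The function names are illustrative, and UTF-8 encoding of the secret and body is an assumption; the essential point is that the HMAC is computed over the raw body string exactly as received.

```python
import hashlib
import hmac


def webhook_signature(secret: str, timestamp: str, raw_body: str) -> str:
    """hex(HMAC_SHA256(secret, "<timestamp>.<raw_body>")), per the formula above."""
    message = f"{timestamp}.{raw_body}".encode("utf-8")
    return hmac.new(secret.encode("utf-8"), message, hashlib.sha256).hexdigest()


def verify_webhook(secret: str, timestamp: str, raw_body: str, received_sig: str) -> bool:
    """Compare the received signature with the expected one in constant time."""
    expected = webhook_signature(secret, timestamp, raw_body)
    return hmac.compare_digest(expected.encode("utf-8"), received_sig.encode("utf-8"))


# Placeholder secret and payload, for illustration only.
body = '{"event_id":"123","event_type":"salutes.credit","data":{}}'
sig = webhook_signature("test_secret", "1700000000", body)
print(verify_webhook("test_secret", "1700000000", body, sig))
```

Re-serializing the parsed JSON before hashing can change whitespace or key order, which is why the check runs on the raw string.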

Webhook Receiver Pseudo-Handler (Flask)

Minimal receiver pattern: verify signature, block replay, and ack idempotently.

# Body of a Flask view function; hmac_sha256_hex, replay_cache_seen, and process_event_idempotently are helpers the receiver supplies.
raw = request.get_data(as_text=True)
ts = request.headers.get("X-Webhook-Timestamp", "")
sig = request.headers.get("X-Webhook-Signature", "")
event_id = request.headers.get("X-Webhook-Id", "")
expected = hmac_sha256_hex(secret, f"{ts}.{raw}")
if not hmac.compare_digest(sig, expected):
    return {"success": False, "error": "unauthorized"}, 401
if replay_cache_seen(event_id):
    return {"success": True, "replayed": True}, 200
process_event_idempotently(event_id, raw)
return {"success": True}, 200

Webhook Replay Cache TTL Guidance

Recommended retention window for webhook event-id dedupe keys.

Store each X-Webhook-Id in a fast replay cache for at least 24h (48h preferred for delayed retries).
Example: redis SETEX webhook:event:<event_id> 172800 1
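
For local testing without Redis, the same dedupe window can be approximated in memory. This is a single-process sketch only; the ReplayCache class and its method name are hypothetical, and a production receiver would normally use a shared store such as the Redis key shown above.

```python
import time
from typing import Dict, Optional


class ReplayCache:
    """In-memory stand-in for a Redis key with a TTL (single process only)."""

    def __init__(self, ttl_seconds: int = 172800):  # 48h, matching the Redis example
        self.ttl_seconds = ttl_seconds
        self._expires_at: Dict[str, float] = {}

    def seen_before(self, event_id: str, now: Optional[float] = None) -> bool:
        """Record event_id and report whether it was already present and unexpired."""
        now = time.time() if now is None else now
        expiry = self._expires_at.get(event_id)
        if expiry is not None and expiry > now:
            return True
        # First sighting, or the previous entry has expired: start a new window.
        self._expires_at[event_id] = now + self.ttl_seconds
        return False


cache = ReplayCache()
print(cache.seen_before("evt-1", now=0))       # first delivery
print(cache.seen_before("evt-1", now=3600))    # retry one hour later
print(cache.seen_before("evt-1", now=172801))  # after the 48h window
```

The sketch never evicts expired entries, so a long-running process would need periodic cleanup.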

Webhook Secret Rotation Overlap (Receiver)

During secret rotation, accept either active secret for a short overlap window, then retire old.

# Reuses ts, raw, and sig from the receiver handler above.
valid = False
for candidate in [WEBHOOK_SECRET_CURRENT, WEBHOOK_SECRET_PREVIOUS]:
    expected = hmac_sha256_hex(candidate, f"{ts}.{raw}")
    valid = valid or hmac.compare_digest(sig, expected)
if not valid:
    return {"success": False, "error": "unauthorized"}, 401
Rotation rule: keep previous secret for <=24h overlap, then remove it from verifier list.

Webhook Timestamp Skew Guard

Reject signatures outside a short timestamp window to reduce replay surface.

now = int(time.time())
try:
    ts = int(request.headers.get("X-Webhook-Timestamp", "0"))
except ValueError:
    ts = 0  # a non-numeric header is treated as stale instead of raising
if abs(now - ts) > 300:
    return {"success": False, "error": "stale_timestamp"}, 401
Clock source guidance: sync receivers with NTP/chrony so valid requests are not rejected by drift.

Webhook Event-Type Allowlist Guard

Acknowledge unknown event types without side effects to keep receiver pipelines resilient.

allowed = {"salutes.credit", "salutes.debit", "donation.claimed"}
event_type = request.headers.get("X-Webhook-Event", "")
if event_type not in allowed:
    log_unknown_event(event_type)
    return {"success": True, "ignored": True}, 200

Webhook Delivery-ID Persistence Guard

Persist webhook event IDs with a unique key so retries cannot duplicate state changes.

Schema rule: CREATE UNIQUE INDEX ux_webhook_events_event_id ON webhook_events(event_id);
Receiver rule: insert event_id before side effects; on duplicate-key return {"success":True,"duplicate":True}, 200.
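
The schema and receiver rules can be tried out with SQLite from the Python standard library. This is a sketch under assumptions: the one-column table is a simplification, and record_event and handle are hypothetical helper names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE webhook_events (event_id TEXT NOT NULL)")
conn.execute("CREATE UNIQUE INDEX ux_webhook_events_event_id ON webhook_events(event_id)")


def record_event(event_id: str) -> bool:
    """Insert event_id; return False when the unique index rejects a duplicate."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO webhook_events (event_id) VALUES (?)", (event_id,))
        return True
    except sqlite3.IntegrityError:
        return False


def handle(event_id: str):
    """Insert before side effects; acknowledge duplicates without reprocessing."""
    if not record_event(event_id):
        return {"success": True, "duplicate": True}, 200
    # Side effects would run here, only after the insert has succeeded.
    return {"success": True}, 200


print(handle("evt-1"))
print(handle("evt-1"))
```

The database enforces the dedupe, so two concurrent deliveries of the same event cannot both pass the check.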

Webhook Async-Ack Processing Pattern

Acknowledge quickly, process safely in background workers, and retry from queue on transient failures.

enqueue_result = queue_push({"event_id": event_id, "payload": raw})
if not enqueue_result.ok:
    return {"success": False, "error": "queue_unavailable"}, 503
return {"success": True, "queued": True}, 200
Worker rule: process queued event idempotently; on transient error requeue with capped backoff + dead-letter threshold.

Webhook Dead-Letter Replay Pattern

Support operator-triggered re-drive by delivery ID so failed events can be replayed safely.

Replay API sketch: POST /admin/partners/webhook-events/<event_id>/replay -> {"success":true,"event_id":"...","requeued":true}
Worker rule: before re-drive, check event_id already processed; if yes, ack duplicate and skip side effects.

Webhook Processing-State Lifecycle

Track a simple state model so dashboards and alerts can identify stuck or failing deliveries.

State path: queued -> processing -> succeeded | failed
Schema suggestion: webhook_events(event_id, state, attempts, last_error, updated_at_utc)
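
One way to keep the state column trustworthy is to reject transitions that are not on the path. The sketch below encodes only the path given above; ALLOWED_TRANSITIONS and advance are illustrative names, and a receiver that requeues failed events would need an extra edge that is not shown here.

```python
# Allowed transitions for: queued -> processing -> succeeded | failed
ALLOWED_TRANSITIONS = {
    "queued": {"processing"},
    "processing": {"succeeded", "failed"},
    "succeeded": set(),
    "failed": set(),
}


def advance(current: str, new: str) -> str:
    """Return the new state, or raise ValueError for a transition outside the path."""
    if new not in ALLOWED_TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition: {current} -> {new}")
    return new


state = advance("queued", "processing")
state = advance(state, "succeeded")
print(state)
```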

Webhook Retry Policy Pattern

Use bounded retries with exponential backoff to avoid hot-loop failures.

Retry schedule example (seconds): [5, 15, 60, 300, 900] with max_attempts=5 then dead-letter.
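Pseudocode (capped doubling): delay=min(900, 5 * (2 ** (attempt-1))); attempt>=5 -> state=failed_dead_letter
Note: the doubling formula yields 5, 10, 20, 40, 80 seconds, which differs from the example schedule above; choose one of the two and apply it consistently.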
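
Both retry styles can be written as small pure functions. This sketch makes one interpretation explicit, as an assumption: retries_done counts the retries already made, so all five schedule entries are used before the event is dead-lettered. The function names are illustrative.

```python
RETRY_SCHEDULE_SECONDS = [5, 15, 60, 300, 900]
MAX_ATTEMPTS = 5


def next_step(retries_done: int):
    """Return (state, delay_seconds) for an event that has just failed.

    retries_done is the number of retries already made (0 after the first failure).
    """
    if retries_done >= MAX_ATTEMPTS:
        return "failed_dead_letter", None
    return "retry", RETRY_SCHEDULE_SECONDS[retries_done]


def doubling_delay(attempt: int, base: int = 5, cap: int = 900) -> int:
    """Capped exponential backoff: base * 2^(attempt-1), never above cap."""
    return min(cap, base * (2 ** (attempt - 1)))


print([next_step(n) for n in range(6)])
print([doubling_delay(a) for a in range(1, 6)])
```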

Webhook Observability Metrics

Track a minimal metrics set so operators can detect reliability regressions quickly.

Core metrics: webhook_success_rate_5m, webhook_retry_rate_5m, webhook_dead_letter_count_24h.
Example formulas: success_rate = succeeded / total; retry_rate = retried / total; dead_letter_count = count(state="failed_dead_letter").

Webhook Alert Threshold Starters

Baseline thresholds to start with before tuning to real traffic patterns.

Page if success_rate_5m < 0.98 OR dead_letter_count_24h > 0 OR retry_rate_5m > 0.10.
Warn if success_rate_5m < 0.995 for 3 consecutive windows.
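
The two starter rules can be evaluated together. In this sketch the page rule is checked first and the warn rule looks at the last three 5-minute windows; the function name and the input shape (a list of success rates, oldest first) are assumptions.

```python
def alert_level(success_rates_5m, retry_rate_5m, dead_letter_count_24h):
    """Return "page", "warn", or "ok" from the starter thresholds.

    success_rates_5m holds one success rate per 5m window, oldest first.
    """
    latest = success_rates_5m[-1]
    if latest < 0.98 or dead_letter_count_24h > 0 or retry_rate_5m > 0.10:
        return "page"
    last_three = success_rates_5m[-3:]
    if len(last_three) == 3 and all(rate < 0.995 for rate in last_three):
        return "warn"
    return "ok"


# Rates derived as succeeded / total and retried / total for one window.
succeeded, retried, total = 990, 20, 1000
print(alert_level([0.999, 0.999, succeeded / total], retried / total, 0))
```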

Webhook SQL Rollup Query (Hourly)

Use an hourly rollup query to power reliability widgets without scanning raw event logs each request.

SELECT date_trunc('hour', updated_at_utc) AS hour_utc,
       COUNT(*) AS total,
       SUM(CASE WHEN state='succeeded' THEN 1 ELSE 0 END) AS succeeded,
       SUM(CASE WHEN attempts > 1 THEN 1 ELSE 0 END) AS retried,
       SUM(CASE WHEN state='failed_dead_letter' THEN 1 ELSE 0 END) AS dead_letter
FROM webhook_events
WHERE updated_at_utc >= NOW() - INTERVAL '24 hours'
GROUP BY 1
ORDER BY 1 DESC;

Webhook Prometheus Query Starters

Starter PromQL-style panels for success, retry, and dead-letter trend visibility.

Success rate (5m): sum(rate(vets_webhook_events_total{state="succeeded"}[5m])) / sum(rate(vets_webhook_events_total[5m]))
Retry rate (5m): sum(rate(vets_webhook_events_total{retried="true"}[5m])) / sum(rate(vets_webhook_events_total[5m]))
Dead-letter count (24h): increase(vets_webhook_events_total{state="failed_dead_letter"}[24h])

Webhook Triage Action Matrix

Map metric breaches to immediate actions so incident response is deterministic.

Signal | Threshold | Immediate Action
success_rate_5m | < 0.98 | Page on-call, inspect queue backlog and signature failures.
retry_rate_5m | > 0.10 | Check upstream latency/error spikes, raise worker concurrency temporarily.
dead_letter_count_24h | > 0 | Run dead-letter replay flow by event_id after fix validation.

Webhook Incident Timeline Pattern

Capture first/last seen timestamps so postmortems can quantify blast radius and duration.

Track fields: incident_id, first_seen_utc, last_seen_utc, duration_seconds, affected_event_count.
Duration formula: duration_seconds = EXTRACT(EPOCH FROM (last_seen_utc - first_seen_utc)).

Webhook Error-Budget Burn-Rate

Track burn-rate against your webhook success SLO to detect fast reliability erosion.

Burn-rate formula: (1 - success_rate_window) / (1 - target_slo). Example target_slo=0.999.
Action hint: burn_rate_5m > 2.0 AND burn_rate_1h > 1.0 => page and gate risky deploys.

Webhook Postmortem Checklist

Use a fixed checklist to keep incident learning loops consistent and auditable.

Checklist: impact, customer scope, first_seen_utc, last_seen_utc, root_cause, corrective_action, owner, due_date_utc.
Closure rule: incident stays open until corrective action is merged, deployed, and replay validation passes.

Webhook Runbook Escalation Pattern

Define escalation ownership and update cadence before incidents happen.

Role assignment: designate Incident Commander (IC), Communications Lead, and Technical Owner at incident open.
Cadence: status updates every 15 minutes while active; trigger executive update if duration >= 60 minutes or user impact is severe.

Webhook Status-Page Messaging Pattern

Use consistent status phases so users and partners understand incident progression.

Phase order: degraded -> investigating -> monitoring -> resolved
Message template: "[phase] webhook delivery latency elevated; next update in 15 minutes."

Webhook Stakeholder Update Template

Keep partner, internal, and executive updates aligned from one structured template.

Partner update: current_status, affected_endpoints, expected_next_update_utc, workaround_available.
Internal ops update: suspected_root_cause, mitigation_progress, blockers, owner_on_point.
Executive summary: user_impact_level, ETA_confidence, decision_requests, reputational_risk_notes.

Webhook Integration-Release Checklist

Use a pre-release checklist to reduce deployment risk for partner webhook changes.

Checklist: run preflight, deploy canary partner key, monitor retry/dead-letter metrics for 30 minutes, keep rollback switch ready.
Rollback rule: if success_rate_5m drops below SLO or dead_letter_count increases, rollback immediately and replay impacted event_ids.

Webhook Key-Rotation Rollout Checklist

Rotate keys without downtime by running old/new credentials in a controlled overlap window.

Step 1: issue new key + secret and validate against sandbox/test webhook route.
Step 2: dual-run old+new key for 24h and monitor auth failures until the window is stable.
Step 3: confirm no traffic on old key_id for 15m, then revoke the old key.

Webhook Signature-Version Migration Checklist

Migrate signing schemes with overlap windows and a fixed deprecation cutoff.

Migration plan: accept v1 + v2 signatures for overlap window, emit v2-only from sender, track v1 traffic decay.
Cutoff rule: publish cutoff_date_utc, alert partners 14d/7d/1d, reject v1 after cutoff with explicit upgrade error.

Webhook Payload-Schema Versioning Pattern

Version payload contracts explicitly so receivers can parse safely during schema evolution.

Envelope example: {"schema_version":"2","event_id":"...","event_type":"...","data":{...}}
Compatibility rule: keep backward parsing support for at least one release window before removing old fields.

Webhook Schema Deprecation Timeline

Publish a fixed timeline so partners can migrate before breaking schema removals.

Timeline: announce deprecation_date_utc, run dual-support window, enforce removal_date_utc.
Communication cadence: notify at T-30d, T-14d, T-7d, and T-1d with upgrade examples.

API Deprecation Calendar

Machine-readable deprecation schedule for endpoint sunset planning.

curl -sS "https://vets-coin.com/developers/deprecations.json"
curl -sS "https://vets-coin.com/developers/deprecations-playbook.md"
curl -sS "https://vets-coin.com/developers/deprecations-playbook.json"
curl -sS "https://vets-coin.com/developers/deprecations.rss"

Generated migration playbook: /developers/deprecations-playbook.md, /developers/deprecations-playbook.json, /developers/deprecations.rss

No active endpoint sunsets are currently scheduled.

Webhook Compatibility Test Matrix

Validate sender/receiver version combinations before changing production defaults.

Sender Version | Receiver Version | Expected Outcome
v1 | v1 | Pass (legacy baseline)
v2 | v1 | Pass only during dual-support window
v1 | v2 | Pass only during dual-support window
v2 | v2 | Pass (post-cutover target)

Webhook Contract-Test Checklist

Run deterministic contract tests before promoting webhook schema changes.

Checklist: required_fields_present, optional_fields_tolerated, unknown_fields_ignored, signature_verification_passes.
Gate rule: block release if any contract test fails on canary receiver fixtures.

Webhook Replay-Test Scenario

Simulate duplicate deliveries to verify idempotent receiver behavior.

Scenario: send identical payload + event_id twice within replay-cache window.
Expected: first delivery applies side effects; second returns success with duplicate/replayed indicator and no additional mutation.

Webhook Latency SLO Targets

Use percentile-based SLO targets to detect delivery-path regressions before failures spike.

Target example: P50 < 250ms, P95 < 1000ms, P99 < 3000ms for end-to-end webhook processing latency.
Alert starter: page when P95 > 1500ms for 3 consecutive 5m windows OR P99 > 5000ms in any 5m window.
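
The alert starter combines a streak condition with a single-window condition. A minimal sketch, assuming per-window percentile values in milliseconds listed oldest first; latency_page is a hypothetical name.

```python
def latency_page(p95_windows_ms, p99_windows_ms) -> bool:
    """True when P95 > 1500ms for 3 consecutive windows or P99 > 5000ms in any window."""
    streak = 0
    for p95 in p95_windows_ms:
        # Extend the streak on a breach, reset it on a healthy window.
        streak = streak + 1 if p95 > 1500 else 0
        if streak >= 3:
            return True
    return any(p99 > 5000 for p99 in p99_windows_ms)


print(latency_page([1600, 1700, 1800], [2000, 2100, 2200]))
print(latency_page([1600, 1400, 1800], [2000, 2100, 2200]))
```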

Webhook Queue-Backlog SLO Targets

Track queue depth and oldest-message age so delayed processing is detected early.

Target example: queue_depth < 500 and oldest_message_age_seconds < 120 during steady state.
Alert starter: page when queue_depth > 2000 OR oldest_message_age_seconds > 600 for 10 minutes.

Webhook DLQ-Drain Runbook

Replay dead-lettered events in controlled batches to avoid reintroducing overload.

Batch strategy: replay 100 events per batch, wait 60s cooldown, then re-check latency + backlog before next batch.
Verification checks: error rate stable, queue_depth recovering, no duplicate side effects, replayed event_ids marked succeeded.
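
The batch strategy can be sketched as a loop that replays, cools down, and re-checks health before continuing. The replay and healthy callables are placeholders for your own replay call and latency/backlog check, and the sleep function is injectable so the loop can be tested without waiting.

```python
import time


def drain_dlq(event_ids, replay, healthy, batch_size=100, cooldown_seconds=60, sleep=time.sleep):
    """Replay dead-lettered events in batches; return how many were replayed.

    Stops before a batch when healthy() reports that latency or backlog has regressed.
    """
    replayed = 0
    for start in range(0, len(event_ids), batch_size):
        if not healthy():
            break
        batch = event_ids[start:start + batch_size]
        for event_id in batch:
            replay(event_id)
        replayed += len(batch)
        if start + batch_size < len(event_ids):
            sleep(cooldown_seconds)  # cooldown only when another batch follows
    return replayed


done = []
count = drain_dlq([f"evt-{i}" for i in range(250)], done.append, lambda: True, sleep=lambda s: None)
print(count, len(done))
```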

Webhook Canary-Failure Rollback

If canary delivery quality regresses, roll back quickly before broad partner impact.

Immediate action: disable canary key_id, stop new canary deliveries, and revert sender route to stable key.
Recovery action: replay canary window events (start_ts..end_ts) through stable pipeline with idempotency safeguards enabled.
Exit criteria: success_rate_5m returns above SLO, retry/dead-letter rates normalize, and canary replay backlog is fully drained.

Webhook Canary-Success Promotion Checklist

If canary quality remains healthy, promote traffic in controlled steps with rollback guardrails.

Promotion plan: 1% -> 5% -> 25% -> 50% -> 100%; hold each step for at least 15 minutes.
Gate each step on stable success_rate_5m, retry_rate_5m, dead_letter_count_24h, queue_depth, and latency percentiles.
Rollback guardrail: immediately revert to previous step if SLO breach persists for 2 consecutive 5m windows.
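
The promotion plan and rollback guardrail can be modeled as a small step function. This is a sketch with assumptions: breach_windows counts consecutive 5m windows in SLO breach, held_minutes is the time spent at the current step, and a rollback from the 1% step stays at 1% (in practice you would likely disable the canary instead).

```python
PROMOTION_STEPS = [1, 5, 25, 50, 100]


def next_traffic_percent(current: int, breach_windows: int, held_minutes: int) -> int:
    """Return the traffic percentage to use next."""
    i = PROMOTION_STEPS.index(current)
    if breach_windows >= 2:
        # Guardrail: revert to the previous step after 2 consecutive breached windows.
        return PROMOTION_STEPS[max(i - 1, 0)]
    if breach_windows == 0 and held_minutes >= 15 and i < len(PROMOTION_STEPS) - 1:
        return PROMOTION_STEPS[i + 1]
    return current  # keep holding at the current step


print(next_traffic_percent(5, breach_windows=0, held_minutes=15))
print(next_traffic_percent(25, breach_windows=2, held_minutes=30))
```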

Webhook Rollback-Drill Cadence

Run routine rollback drills so incident response stays fast and predictable.

Cadence: run a scheduled rollback simulation at least once per month and after major webhook pipeline changes.
Drill checklist: trigger synthetic SLO breach, disable canary key, replay drill window, verify stable recovery in dashboards.
Evidence to retain: timeline timestamps, operator actions, metric screenshots, and confirmed replay completion count.

Webhook Incident Command: First 10 Minutes

Use a fixed opening sequence so critical incident actions happen immediately and in order.

Minute 0-2: assign IC + technical owner, declare incident channel, snapshot success/retry/dead-letter + backlog metrics.
Minute 2-5: decide contain action (disable canary key, pause risky rollout, cap replay) and log rationale.
Minute 5-10: publish first status update, set next update timer (15m), and open action checklist with owners.

Webhook Incident Comms Cadence

Keep predictable update clocks across audiences during active incidents.

Partner-facing updates: every 30 minutes while degraded, include affected endpoints + expected next update time.
Internal ops updates: every 15 minutes, include metric deltas, mitigation status, and current blocker owner.
Executive updates: every 60 minutes (or on major change), include user impact, risk level, and ETA confidence.

Webhook Incident Closure Checklist

Close incidents only after objective recovery verification and documented handoff.

Recovery gate: success_rate_5m above SLO for 30m, retry/dead-letter rates back to baseline, and backlog fully drained.
Data gate: replay queue empty, no unowned failed events, and incident timeline updated with final root-cause statement.
Comms gate: publish resolved update, record customer impact window, and link postmortem owner + due date.

Webhook Post-Incident Handoff Packet

Standardize handoff artifacts so follow-up work does not drift after incident closure.

Required fields: incident_id, severity, start/end_utc, affected endpoints, replay count, unresolved risks.
Action tracker: each corrective action must include owner, ETA, dependency, and verification check.
Handoff rule: schedule a 24h review checkpoint to confirm action status and detect any regression signal.

Webhook Corrective-Action Verification Ledger

Track every corrective action to completion with clear verification evidence.

Ledger columns: action_id, owner, due_date_utc, status, dependency, verification_check, verified_at_utc.
Status model: planned -> in_progress -> verified -> closed, with blocked as a side state entered from and returning to in_progress (only close after verification evidence is linked).
Audit trail: capture changed_by + changed_at_utc on every status transition and store immutable comment history.

Webhook Dependency-Risk Register

Track upstream dependency risks so incident response includes owner, blast radius, and fallback paths.

Register fields: dependency_name, service_owner, oncall_contact, blast_radius, fallback_mode, mitigation_runbook, last_tested_utc.
Risk scoring: classify critical/high/medium by user-impact scope + single-point-of-failure likelihood.
Governance rule: run dependency failover test at least quarterly and attach evidence link to each register row.

Webhook Dependency Failover-Drill Matrix

Define expected fallback behavior and recovery targets per dependency before incidents occur.

Dependency | Fallback Mode | RTO Target | Drill Cadence
primary_webhook_queue | Switch producer to secondary queue cluster | < 5m | Monthly
signature_validation_store | Read-through cache with strict TTL + deny-on-miss guard | < 10m | Quarterly
metrics_ingestion | Buffer locally and backfill on recovery | < 15m | Quarterly
Verification: each drill must record actual_rto, fallback_result, and follow-up action if target is missed.

Webhook Dependency Alert-Routing Matrix

Map each dependency breach signal to the right pager owner and escalation path.

Signal / Breach | Primary Pager Owner | Escalation Path
queue_depth > 2000 for 10m | Webhook Platform On-Call | Escalate to Incident Commander at +10m if unresolved
signature_validation_errors_rate > 2% | Security/API Auth On-Call | Escalate to Security Lead + IC immediately
dead_letter_count_24h increase > threshold | Reliability On-Call | Escalate to Platform Manager at +15m; start replay runbook
Routing rule: each alert route must include backup owner and escalation timeout to prevent notification dead-ends.

Webhook Dependency Escalation Decision Tree

Use a deterministic branch when dependency failures require containment, failover, or replay actions.

Branch 1 (contain): if auth/signature failure rate spikes and cause is unknown, pause risky rollout and gate new mutations.
Branch 2 (failover): if primary dependency outage is confirmed and fallback is healthy, switch traffic to fallback immediately.
Branch 3 (replay): when dependency recovers, run bounded replay batches only after queue and latency SLOs are stable.
Escalation trigger: if no branch restores SLO within 15 minutes, escalate to IC + platform lead and open incident bridge.

Webhook Dependency Freeze-Threshold Policy

Define automatic mutation-freeze gates so severe dependency failures cannot cascade into larger data integrity incidents.

Freeze gate A: trigger mutation_freeze=true when signature_validation_errors_rate > 5% for 5 minutes.
Freeze gate B: trigger mutation_freeze=true when dead_letter_rate_5m > 2% and queue_depth > 3000 simultaneously.
Unfreeze rule: require 15 minutes of SLO-stable metrics plus explicit IC approval and audit-log note.
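
Freeze gates A and B can be evaluated from a few recent samples. The sketch assumes one signature_validation_errors_rate sample per minute, expressed as a fraction and listed oldest first; mutation_freeze is written as a hypothetical function name here, and the unfreeze rule, which needs IC approval, is deliberately left out.

```python
def mutation_freeze(sig_error_rate_per_minute, dead_letter_rate_5m, queue_depth) -> bool:
    """True when freeze gate A or freeze gate B triggers."""
    last_five = sig_error_rate_per_minute[-5:]
    # Gate A: signature error rate above 5% for 5 consecutive minutes.
    gate_a = len(last_five) == 5 and all(rate > 0.05 for rate in last_five)
    # Gate B: dead-letter rate above 2% while queue depth is above 3000.
    gate_b = dead_letter_rate_5m > 0.02 and queue_depth > 3000
    return gate_a or gate_b


print(mutation_freeze([0.06] * 5, dead_letter_rate_5m=0.0, queue_depth=100))
print(mutation_freeze([0.01] * 5, dead_letter_rate_5m=0.03, queue_depth=2500))
```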

Webhook Freeze-Override Governance

Allow emergency overrides only under strict authority, dual-approval, and timed expiry controls.

Who can override: Incident Commander + Platform Lead only (no single-user override for production freeze state).
Approval model: require dual approval (ic_approved=true and platform_approved=true) before override_active=true.
Expiry rule: auto-expire override in 30 minutes unless re-approved; emit audit event on activate, renew, and expire.
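
The dual-approval and timed-expiry rules can be sketched as a small data class. The field names ic_approved and platform_approved follow the approval model above; the class, its methods, and the use of epoch seconds are assumptions, and audit-event emission is omitted.

```python
from dataclasses import dataclass
from typing import Optional

OVERRIDE_TTL_SECONDS = 30 * 60  # auto-expire after 30 minutes


@dataclass
class Override:
    ic_approved: bool = False
    platform_approved: bool = False
    activated_at: Optional[float] = None  # epoch seconds of the last activation

    def activate(self, now: float) -> bool:
        """Activate (or renew) only when both approvals are present."""
        if self.ic_approved and self.platform_approved:
            self.activated_at = now
            return True
        return False

    def is_active(self, now: float) -> bool:
        """An override is active only inside its 30-minute window."""
        return self.activated_at is not None and now - self.activated_at < OVERRIDE_TTL_SECONDS


override = Override(ic_approved=True)
print(override.activate(now=0))  # only one approval so far
override.platform_approved = True
print(override.activate(now=0), override.is_active(now=1799), override.is_active(now=1800))
```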

Webhook Override Threshold-Exception Process

Use this compact process when freeze thresholds need a time-boxed exception during active incident response.

Approver quorum: require 2 of 3 approvals (IC, Platform Lead, Security Lead) before threshold_exception_active=true.
Expiry cap: enforce hard expiry in 30 minutes; renewal requires fresh quorum + explicit incident status update.
Audit note minimums: reason, impacted endpoints, projected risk window, rollback trigger, and owner of next review.

Webhook Override Audit-Log Schema

Use a consistent audit schema so every override lifecycle action is traceable and reviewable.

Required fields: override_id, action, actor_id, actor_role, reason_code, reason_note, expires_at_utc, state, created_at_utc.
State model: requested -> approved -> active -> renewed -> expired (or revoked).
Audit guarantees: append-only records, immutable timestamps, and link to incident_id for every override event.

Webhook Override-Review Cadence

Review active overrides on a fixed cadence so emergency controls do not drift into long-lived risk.

Daily review: list all override_active=true records, verify business justification, and confirm next expiry timestamp.
Stale-alert rule: page on-call if any override remains active > 24h or has no linked incident/update note.
Closure rule: convert active override to expired/revoked within 15 minutes after risk condition clears.

Webhook Override Emergency-Breakglass Policy

Permit single-actor emergency override only for extreme availability scenarios and force rapid expiry.

Breakglass path: allow single actor only when incident severity is critical and dual-approval path is unavailable.
Forced expiry: breakglass override expires in 10 minutes with no silent extension; renewal requires fresh explicit action.
Control rule: page IC + security lead immediately and require post-incident review note within 24 hours.

Webhook Override Revocation Protocol

Revoke overrides quickly and consistently once the risk condition clears or misuse is detected.

Revocation trigger: unauthorized use, stale override, or restored system health beyond unfreeze criteria.
Execution steps: set override_active=false, restore default freeze policy, and run rollback validation checks.
Notification rule: send revoke event to IC, security lead, and operations channel with reason + timestamp.

Webhook Override Incident-Communication Template

Use consistent messaging at override activation, update, and revocation checkpoints.

Activate message: "Override activated" + override_id + reason + forced expiry + next update time.
Update message: current risk status + remaining override time + mitigation progress + expected revoke window.
Revoke message: "Override revoked" + revoke reason + restored controls + follow-up actions owner/ETA.

Webhook Override Postmortem Addendum

Capture override-specific outcomes so postmortems include control-side effects and residual risk.

Document what override changed: policy gates bypassed, mutation paths affected, and time window active.
Residual risk section: outstanding data reconciliation, delayed replay impacts, and temporary control gaps.
Closure requirement: assign explicit owner + due date for each residual risk item before incident closure.

Webhook Override KPI Tracking

Track override usage and quality metrics so governance decisions are backed by clear trend data.

Core KPIs: override_activation_count_30d, override_avg_duration_minutes, override_stale_ratio_30d.
Derived KPI: stale ratio = stale_overrides_30d / total_overrides_30d (stale means active > 24h).
Alert starter: page governance owner if stale ratio exceeds 5% or avg duration exceeds 60 minutes for 2 weeks.

Webhook Override Trend-Review Checklist

Run a weekly governance review to turn override metrics into concrete decisions and action items.

Weekly agenda: review activation_count_30d trend, avg_duration trend, stale_ratio trend, and top incident categories.
Decision checkpoint: keep, tighten, or relax override thresholds based on two-week trend direction.
Output requirement: record decisions, owner, ETA, and expected KPI impact for each approved change.

Webhook Override Policy-Change Guardrail

Apply override policy threshold changes inside controlled windows and auto-revert quickly if reliability degrades.

Change window: apply threshold updates only during low-risk windows (Tue-Thu 14:00-18:00 UTC) and never during an active incident.
Rollback criterion: if success_rate_5m drops by > 0.5% or projected stale_ratio_30d rises above 5% within 30 minutes, rollback immediately.
Change record rule: store before/after KPI snapshots, approving owner, rollback owner, and rollback ETA in the same audit event.

Partner Mutation End-to-End (curl)

Minimal credit mutation flow: prepare body, sign headers, send request, then branch on status code.

export BASE_URL="https://vets-coin.com" KEY_ID="<KEY_ID>" SECRET="<SECRET>" API_PATH="/api/salutes/credit" BODY='{"user_id":"4","amount":100,"reason":"event_participation","source":"partner_portal"}'
python flask_api/scripts/sign_partner_request.py --base-url "$BASE_URL" --method POST --path "$API_PATH" --key-id "$KEY_ID" --secret "$SECRET" --json "$BODY" --idempotency-key "idem-$(date +%s)" --print-only
curl -sS -X POST "$BASE_URL$API_PATH" -H "Content-Type: application/json" -H "X-Partner-Key: <KEY_ID>" -H "X-Partner-Timestamp: <UNIX_SECONDS>" -H "X-Partner-Signature: <HMAC_SHA256_HEX>" -H "X-Partner-Nonce: <NONCE_HEX>" -H "Idempotency-Key: <UNIQUE_KEY>" --data "$BODY"
Handling: 2xx = success; 401/403 = fail fast and rotate/fix key scope; 409 = branch on error_code (replay_detected: retry once with a fresh nonce/idempotency key; idempotency_replay: treat as duplicate success and fetch latest state); 429 = re-sign and retry with exponential backoff, honoring Retry-After.

Partner Mutation Response Parsing (curl -w)

Use a deterministic branch to avoid treating auth/rate-limit errors as success.

HTTP_CODE=$(curl -sS -o /tmp/vets_partner_response.json -w "%{http_code}" -X POST "$BASE_URL$API_PATH" -H "Content-Type: application/json" -H "X-Partner-Key: <KEY_ID>" -H "X-Partner-Timestamp: <UNIX_SECONDS>" -H "X-Partner-Signature: <HMAC_SHA256_HEX>" -H "X-Partner-Nonce: <NONCE_HEX>" -H "Idempotency-Key: <UNIQUE_KEY>" --data "$BODY")
case "$HTTP_CODE" in 2*) echo "success";; 401|403) echo "fatal_auth_or_scope";; 409) echo "check_error_code_replay_vs_duplicate";; 429) echo "retry_after_backoff";; *) echo "inspect /tmp/vets_partner_response.json";; esac

Partner Error Payload Classification (jq)

Classify JSON error payloads for retry-safe vs fail-fast decisions.

ERR_CODE=$(jq -r '.error_code // "unknown"' /tmp/vets_partner_response.json)
case "$ERR_CODE" in rate_limited|replay_detected|db_unavailable|server_error) echo "retryable";; idempotency_replay) echo "safe_noop_duplicate";; unauthorized|forbidden) echo "fail_fast_auth_scope";; *) echo "manual_triage";; esac

Security Notes

  • Never embed partner secrets in frontend code.
  • Prefer server-to-server calls and strict IP allowlists where possible.
  • Use idempotency keys when retrying POST requests.

Rate Limit Defaults

Current runtime defaults for common integration paths.

Flow | Limit | Config Key
Public API endpoints | 90 / minute | RATE_LIMIT_PUBLIC_API_PER_MIN
Partner webhook endpoints | 120 / minute | RATE_LIMIT_WEBHOOK_PER_MIN
Sandbox partner keys | 30 / minute | PARTNER_SANDBOX_RATE_LIMIT_PER_MINUTE
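
To stay under these limits, a client can space its requests evenly. The arithmetic is sketched below: 60,000 ms divided by the per-minute limit, rounded up. `min_interval_ms` is a hypothetical helper. How the limits are enforced (per key, per IP, fixed or sliding window) is not stated here, so treat the result as a starting point rather than a guarantee.

```shell
# Hypothetical helper: minimum spacing between requests, in milliseconds,
# for a given per-minute limit (integer ceiling of 60000 / limit).
min_interval_ms() {
  limit=$1
  echo $(( (60000 + limit - 1) / limit ))
}

# min_interval_ms 90   -> 667
# min_interval_ms 120  -> 500
# min_interval_ms 30   -> 2000
```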

Scope-to-Endpoint Matrix

Use this map when issuing partner keys and least-privilege scopes.

Scope | Typical Endpoints | Notes
read | GET /api/salutes/balance, GET /api/partner/user-lookup, GET /api/partner/wallet-info/<wallet>, GET /api/partner/users/<id> | Default read-only partner data access.
credit | POST /api/salutes/credit | Mutation scope; requires nonce + idempotency key.
debit | POST /api/salutes/debit | Mutation scope; requires nonce + idempotency key.
ledger | GET /api/salutes/ledger | Read-only transaction and audit history access.
donation | POST /api/partner/donation-claim | Donation claim trigger workflows.
users | POST /api/partner/users, PATCH /api/partner/users/<id>, POST /api/partner/users/<id>/wallets | Partner user lifecycle and wallet linking actions.
webhooks | GET/POST /api/partner/webhooks, DELETE /api/partner/webhooks/<id>, POST /api/partner/webhooks/<id>/test | Webhook endpoint management and test dispatch.
public | GET /api/public-stats, GET /api/public/system-status, GET /api/transactions/latest | No partner auth required.
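
The matrix can be encoded as a client-side preflight check, so that a call is rejected locally before it reaches a 403. The sketch below only mirrors the rows listed above. `required_scope` is a hypothetical helper, its path patterns are deliberately loose, and the server remains the authority on scope enforcement.

```shell
# Hypothetical preflight helper: print the scope the matrix lists for a
# METHOD and PATH, or "unknown" (exit status 1) when the route is not listed.
required_scope() {
  path=${2%%\?*}   # drop any query string before matching
  case "$1 $path" in
    "GET /api/salutes/balance"|"GET /api/partner/user-lookup"|\
    "GET /api/partner/wallet-info/"*|"GET /api/partner/users/"*) echo read ;;
    "POST /api/salutes/credit") echo credit ;;
    "POST /api/salutes/debit") echo debit ;;
    "GET /api/salutes/ledger") echo ledger ;;
    "POST /api/partner/donation-claim") echo donation ;;
    "POST /api/partner/users"|"PATCH /api/partner/users/"*|\
    "POST /api/partner/users/"*"/wallets") echo users ;;
    "GET /api/partner/webhooks"|"POST /api/partner/webhooks"|\
    "DELETE /api/partner/webhooks/"*|"POST /api/partner/webhooks/"*"/test") echo webhooks ;;
    "GET /api/public-stats"|"GET /api/public/system-status"|\
    "GET /api/transactions/latest") echo public ;;
    *) echo unknown; return 1 ;;
  esac
}

# Usage: required_scope POST /api/salutes/credit   (prints "credit")
```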

Common 4xx/5xx Responses

Fast triage guide for partner integrations and automation hooks.

Status | Typical Cause | What To Do
400 | Invalid payload or missing required fields. | Validate request body/query values and resend.
401 | Missing/invalid partner auth signature or timestamp. | Re-sign request with current timestamp and correct secret.
403 | Forbidden scope, disabled key, or admin-only route. | Verify key status/scopes and endpoint access policy.
409 | Idempotency replay or business-state conflict. | Use a fresh idempotency key and re-check current state.
413 | Request body exceeds API payload guardrails. | Reduce payload size or split into smaller requests.
429 | Rate limit exceeded. | Back off with retry/jitter and reduce burst concurrency.
500 | Unexpected server error. | Capture request ID + payload hash and report for investigation.
503 | Dependency unavailable (DB/RPC) or temporary safe-mode gates. | Retry with backoff; monitor `/status` and alert endpoints.

Auth Error JSON Examples

Use these to build deterministic client error handling paths.

401 unauthorized: {"success":false,"error":"unauthorized"}
403 forbidden scope: {"success":false,"error":"forbidden"}
409 replay/idempotency: {"success":false,"error":"replay_detected"} or {"success":false,"error":"idempotency_replay"}
429 rate-limited: {"success":false,"error":"rate_limited"}

Retry/Backoff Strategy

Recommended behavior for resilient clients (especially around 429 and 503).

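Retry only on 429, 503, or a timeout; use exponential backoff with jitter (1s, 2s, 4s, 8s, capped at 30s).
curl -sS --retry 5 "https://vets-coin.com/api/public/system-status"
Note: curl's built-in --retry backoff starts at 1s and doubles on each attempt, without jitter, and it also retries on 408, 500, 502, and 504. Avoid combining it with --retry-delay (which switches to a fixed delay) or --retry-all-errors (which also retries non-transient failures).
Always send a fresh Idempotency-Key on mutation retries; never reuse a key for a different payload.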
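
The schedule above can be sketched as a few shell helpers. The function names are hypothetical, and the jitter range (0 to 999 ms added to the base delay) is an assumption, since this page does not specify one. The `000` case relies on curl's `%{http_code}` printing `000` when no HTTP response was received, as happens on a timeout.

```shell
# Base delay in seconds for a 0-indexed attempt: 1, 2, 4, 8, 16, then capped at 30.
backoff_base_s() {
  attempt=$1
  if [ "$attempt" -ge 5 ]; then echo 30; else echo $((1 << attempt)); fi
}

# Random jitter in milliseconds (0-999), derived from two bytes of /dev/urandom.
jitter_ms() {
  r=$(od -An -N 2 -tx2 /dev/urandom | tr -d ' \t\n')
  echo $((0x$r % 1000))
}

# Total delay in milliseconds: base delay plus jitter (assumed jitter range).
backoff_delay_ms() {
  echo $(( $(backoff_base_s "$1") * 1000 + $(jitter_ms) ))
}

# Exit status 0 only for the cases this guide marks as retryable.
is_retryable_status() {
  case "$1" in
    429|503|000) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage sketch for a read-only call (fractional sleep is common but not POSIX):
#   attempt=0
#   while [ "$attempt" -lt 5 ]; do
#     code=$(curl -sS -o /dev/null -w "%{http_code}" "$URL")
#     is_retryable_status "$code" || break
#     ms=$(backoff_delay_ms "$attempt")
#     sleep "$((ms / 1000)).$(printf '%03d' $((ms % 1000)))"
#     attempt=$((attempt + 1))
#   done
```

For mutations, each retry would also need a new nonce, idempotency key, and signature before the request is resent.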

Transparency API Snippets

Copy/paste-ready examples for the anomaly, system-status, and deprecation endpoints (JSON and CSV).

curl -sS "https://vets-coin.com/transparency/audit-anomalies/summary.json?run=latest"
curl -sS "https://vets-coin.com/transparency/audit-anomalies/runs.json"
curl -sS "https://vets-coin.com/transparency/audit-anomalies/trend.json?metric=rows&limit=200"
curl -sS "https://vets-coin.com/transparency/audit-anomalies/alerts.json?sigs_increase_threshold_pct=50&rows_increase_threshold_pct=50"
curl -sS "https://vets-coin.com/transparency/audit-anomalies/alerts.json?sigs_increase_threshold_pct=25&rows_increase_threshold_pct=25"
curl -sS "https://vets-coin.com/transparency/audit-anomalies/alerts.json?sigs_increase_threshold_pct=100&rows_increase_threshold_pct=100"
curl -sS "https://vets-coin.com/transparency/audit-anomalies/alerts.csv" -o audit_anomaly_alerts.csv
curl -sS "https://vets-coin.com/transparency/audit-anomalies/schema-registry.json"
curl -sS "https://vets-coin.com/transparency/audit-anomalies/diff/compare.json?run_a=latest&run_b=latest&include_rows=true&row_limit=25"
curl -sS "https://vets-coin.com/transparency/audit-anomalies/diff/compare.csv?run_a=latest&run_b=latest" -o audit_anomaly_compare.csv
curl -sS "https://vets-coin.com/status.json"
curl -sS "https://vets-coin.com/api/public/system-status"
curl -sS "https://vets-coin.com/api/public/system-status/trend?limit=288"
curl -sS "https://vets-coin.com/api/public/system-status/uptime?hours=168"
curl -sS "https://vets-coin.com/api/public/system-status/incidents?hours=168&limit=30"
curl -sS "https://vets-coin.com/api/public/deprecations/header-simulator?id=sample_deprecation&endpoint=/api/public/system-status"
curl -sS "https://vets-coin.com/developers/migration-status.json?id=sample_deprecation"
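
The `sigs_increase_threshold_pct` and `rows_increase_threshold_pct` parameters in the alerts.json calls above are percentage thresholds. How the server computes the increase is not described here. The sketch below assumes a simple run-over-run percentage change with a greater-than-or-equal comparison, which may help when reproducing an alert decision client-side. Both function names are hypothetical.

```shell
# Percentage increase from PREV to CURR, truncated to an integer.
# Prints "inf" when PREV is 0 and CURR is positive, since the ratio is undefined.
pct_increase() {
  prev=$1; curr=$2
  if [ "$prev" -eq 0 ]; then
    if [ "$curr" -gt 0 ]; then echo inf; else echo 0; fi
    return
  fi
  echo $(( (curr - prev) * 100 / prev ))
}

# Exit status 0 when the increase meets or exceeds THRESHOLD_PCT (assumed semantics).
exceeds_threshold() {
  pct=$(pct_increase "$1" "$2")
  [ "$pct" = inf ] || [ "$pct" -ge "$3" ]
}

# pct_increase 200 300           -> 50
# exceeds_threshold 200 300 50   -> exit status 0
# exceeds_threshold 200 249 25   -> exit status 1 (increase truncates to 24)
```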

Transparency Endpoint Map

Use this quick table to choose the right endpoint for your integration task.

Endpoint | Best For | Output
/transparency/audit-anomalies/summary.json | Dashboard headers, run health, latest compare deltas | JSON
/transparency/audit-anomalies/runs.json | Run selectors, sync loops, available-history discovery | JSON
/transparency/audit-anomalies/trend.json | Charts, run-over-run monitoring, alert trend baselines | JSON
/transparency/audit-anomalies/alerts.json | Threshold-based alert snapshots for automation and paging | JSON
/transparency/audit-anomalies/alerts.csv | Spreadsheet-friendly current alert posture and threshold context | CSV
/transparency/audit-anomalies/schema-registry.json | Versioned field definitions and deprecation timelines for anomaly JSON payloads | JSON
/developers/deprecations.json | Endpoint deprecation calendar with announce/sunset windows and migration pointers | JSON
/developers/deprecations.rss | RSS feed of API deprecation windows for subscriber-based reminder workflows | RSS
/developers/deprecations-playbook.md | Auto-generated migration playbook with per-endpoint operational checklists | Markdown
/developers/deprecations-playbook.json | Tooling-friendly migration playbook companion with structured checklist steps | JSON
/developers/api-errors.json | Error catalog with remediation notes and retry guidance by `error_code` | JSON
/status.json | Alias for latest system status payload (same shape as `/api/public/system-status`) | JSON
/api/public/system-status | Partner-facing uptime, monitor freshness, and cron automation status | JSON
/api/public/system-status/trend | Lightweight rolling history for uptime charts and alert trend baselines | JSON
/api/public/system-status/uptime | Windowed availability percentages and per-check degraded rates | JSON
/api/public/system-status/incidents | Resolved/active incident windows with duration and affected checks | JSON
/transparency/audit-anomalies/diff/export.json | Selected-vs-latest anomaly type analysis | JSON
/transparency/audit-anomalies/diff/compare.json | Arbitrary run-to-run reconciliation and drift checks | JSON
/transparency/audit-anomalies/diff/compare.csv | Spreadsheet workflows and manual audit packs | CSV

Integration Checklist

  • Poll runs.json every 5-15 minutes to detect newly available complete runs.
  • Use summary.json for top-level status and run-to-latest deltas.
  • Use trend.json for charting and threshold alerts (we recommend alerting when the bad count increases run-over-run).
  • Use alerts.json as the paging signal endpoint when your thresholds are crossed.
  • Use schema-registry.json to pin field compatibility checks before parser updates.
  • Use /api/public/system-status as a lightweight heartbeat for partner automation and uptime probes.
  • Use /api/public/system-status/trend for simple uptime trend charts and incident postmortems.
  • Use /api/public/system-status/uptime for SLO-style percentages over 24h/7d/30d windows.
  • Use /api/public/system-status/incidents for machine-readable outage windows and postmortem timelines.
  • Use diff/compare.json for machine checks and diff/compare.csv for manual reconciliation packets.
  • Cache responses for at least 60 seconds; these are audit snapshots, not per-transaction streaming endpoints.
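
The 60-second caching guidance in the last item can be sketched with a timestamp sidecar file. `cache_is_fresh` and `cached_get` are hypothetical helpers and the cache paths are placeholders. `date +%s` is widely available but not required by POSIX.

```shell
# Exit status 0 when STAMP_FILE records a fetch time less than TTL seconds ago.
cache_is_fresh() {
  stamp_file=$1; ttl=${2:-60}
  [ -f "$stamp_file" ] || return 1
  fetched_at=$(cat "$stamp_file")
  now=$(date +%s)
  [ $((now - fetched_at)) -lt "$ttl" ]
}

# Fetch URL into CACHE_FILE unless a fresh copy exists, then print the cached body.
# The temporary file keeps a failed download from overwriting the last good copy.
cached_get() {
  url=$1; cache_file=$2; ttl=${3:-60}
  if ! cache_is_fresh "$cache_file.ts" "$ttl"; then
    curl -sS --fail "$url" -o "$cache_file.tmp" \
      && mv "$cache_file.tmp" "$cache_file" \
      && date +%s > "$cache_file.ts" \
      || return 1
  fi
  cat "$cache_file"
}

# Usage (cache path is a placeholder):
#   cached_get "https://vets-coin.com/transparency/audit-anomalies/runs.json" /tmp/vets_runs.json 60
```

A polling loop that runs every 5-15 minutes will always miss a 60-second cache, so the cache mainly helps when several local jobs read the same endpoint.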