Developers
Partner APIs, signing, and operational notes. Designed for reliable integrations and transparent reporting.
Recommended base: https://vets-coin.com
API base path: /api (example: https://vets-coin.com/api/partner/capabilities)
Note: api.vets-coin.com is planned but cannot be used until the TLS certificate covers that subdomain.
What You Can Build
- Claim flows for campaigns and donations with a partner key.
- Webhook-based notifications for downstream systems.
- Read-only transparency and distribution views for end users.
Quick Start
- Request a partner key from the VETS Coin team.
- Use the OpenAPI spec to generate a client (or integrate directly).
- Sign requests using the partner signing scheme described in the API guide.
- Start in low-volume mode, monitor errors, then scale.
API Query Presets
Saved preset queries for quick copy/run examples in the Developers hub.
| Preset | Request | Description |
|---|---|---|
| System Status (24h) | GET /api/public/system-status/uptime?hours=24 | Quick uptime check for the last 24 hours. |
| Incidents (7d) | GET /api/public/system-status/incidents?hours=168&limit=30 | Recent incident windows with active/resolved spans. |
| Latency Percentiles | GET /api/public/latency-percentiles?hours=24 | Public p50/p95/p99 latency telemetry. |
| Trust Manifest | GET /trust.json | Machine-readable trust controls and evidence pointers. |
Integration Wizard + Schema Explorer
Use guided setup for your first call, then inspect endpoints/fields/examples in the explorer.
Audio Share Link Helper
Generate direct destination share links for track pages or audio files with channel-specific actions.
curl -sS "https://vets-coin.com/api/public/audio-share-links?track_url=/faq&title=VETS%20Audio&text=Open%20this%20audio%20link%20directly.&channels=x,telegram,email"
curl -sS "https://vets-coin.com/api/public/audio-share-links/validate?track_url=/faq&channels=x,email"
curl -sS "https://vets-coin.com/api/public/audio-share-links/preview?track_url=/faq&title=Audio%20Preview&text=Campaign%20Preview"
curl -sS -X POST "https://vets-coin.com/api/public/audio-share-links/validate/batch" -H "Content-Type: application/json" -d '{"defaults":{"channels":"x,email"},"items":[{"track_url":"/faq"},{"track_url":"/faq","title":"Campaign B"}]}'
curl -sS "https://vets-coin.com/api/public/audio-share-links/expand?short_url=https://vets-coin.com/s/audio/abc123"
curl -sS "https://vets-coin.com/api/public/audio-share-links/channels.json"
curl -sS "https://vets-coin.com/api/public/audio-share-links/errors.json"
curl -sS "https://vets-coin.com/api/public/audio-share-links/warnings.json"
curl -sS "https://vets-coin.com/api/public/audio-share-links/guidance.json"
curl -sS "https://vets-coin.com/api/public/audio-share-links/policy.json"
curl -sS "https://vets-coin.com/api/public/audio-share-links/health"
Quickstart Sandbox Verification
Deterministic API-key sandbox check for client bootstrap automation.
curl -sS "https://vets-coin.com/api/public/quickstart/verify-key?api_key=sandbox_demo_key&client=cli"
Webhook Simulator
Canned webhook payload scenarios for receiver validation and replay drills.
| Scenario | Event Type |
|---|---|
| Donation Claim Created | donation.claim |
| Donation Claim Redeemed | donation.claim_redeemed |
| Webhook Delivery Failed | webhook.delivery_failed |
API Compatibility Canary
Strict-validation canary routes for early adopter testing before broad rollout.
curl -sS -X POST "https://vets-coin.com/api/canary/echo" -H "Content-Type: application/json" -d '{"message":"hello","request_id":"canary-1"}'
Strict validation is currently enabled on canary routes.
SDK Starter Kits (Generated)
Download pre-generated starters built from the live OpenAPI specs.
python flask_api/scripts/generate_sdk_starters.py --out-dir flask_api/docs/sdk
OpenAPI Changelog
Baseline-to-current API diff published for integration planning and change audits.
python flask_api/scripts/generate_openapi_changelog.py --spec flask_api/docs/openapi.yaml --spec flask_api/docs/openapi-transparency.yaml --baseline-dir flask_api/docs/openapi/baseline
Auth Headers Quickref
Partner mutation requests should include these headers.
X-Partner-Key: <KEY_ID>
X-Partner-Timestamp: <UNIX_SECONDS>
X-Partner-Signature: <HMAC_SHA256_HEX>
X-Partner-Nonce: <NONCE_HEX>
Idempotency-Key: <UUID_OR_UNIQUE_TOKEN>
python flask_api/scripts/sign_partner_request.py --base-url https://vets-coin.com --method POST --path /api/partner/capabilities --key-id "<KEY_ID>" --secret "<SECRET>" --json '{"capability":"claims"}' --print-only
Partner Error Taxonomy
Use error_code for branching logic. Keep error for operator logs/UI.
| Error Code | HTTP | Retry Class | Client Action |
|---|---|---|---|
| rate_limited | 429 | retryable | Back off; honor Retry-After and X-RateLimit-Reset. |
| replay_detected | 409 | retryable | Regenerate nonce/idempotency key and retry once. |
| idempotency_replay | 409 | safe-noop | Treat as duplicate success path; fetch latest state. |
| unauthorized | 401 | fail-fast | Rotate/check partner credentials and request signature inputs. |
| forbidden | 403 | fail-fast | Missing scope; request scope upgrade or use correct key. |
| db_unavailable | 503 | retryable | Retry with jittered backoff; open incident if persistent. |
| server_error | 500 | retryable | Retry with capped backoff and capture request_id. |
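The following is a minimal Python sketch of branching on error_code rather than error, as the table recommends. The RETRY_CLASS map copies the table rows. The classify_error name and the "manual-triage" fallback for unknown codes are illustrative assumptions and are not part of the partner API.

```python
# Retry classes copied from the Partner Error Taxonomy table above.
RETRY_CLASS = {
    "rate_limited": "retryable",
    "replay_detected": "retryable",
    "idempotency_replay": "safe-noop",
    "unauthorized": "fail-fast",
    "forbidden": "fail-fast",
    "db_unavailable": "retryable",
    "server_error": "retryable",
}


def classify_error(payload: dict) -> str:
    """Return the retry class for a parsed partner error payload.

    Branches on error_code only; error stays free-form text for logs and UI.
    """
    code = payload.get("error_code")
    # Unknown or missing codes go to a human rather than into an automatic retry loop.
    return RETRY_CLASS.get(code, "manual-triage")
```

For example, classify_error({"error_code": "idempotency_replay", "error": "Duplicate request"}) returns "safe-noop", so the client fetches the latest state instead of resending.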
Webhook Replay & Verification
For partner webhook receivers: verify each event signature and keep replay/testing commands handy.
Incoming headers: X-Webhook-Id, X-Webhook-Event, X-Webhook-Timestamp, X-Webhook-Signature
Signature formula: hex(HMAC_SHA256(webhook_secret, f"{timestamp}.{raw_body_json}"))
Payload schema: {"event_id":"123","event_type":"salutes.credit","data":{...event payload...}}
Verification tip: compute the HMAC over the exact raw request body as received. Re-serializing parsed JSON can change whitespace or key order and break the signature.
python flask_api/scripts/sign_partner_request.py --base-url https://vets-coin.com --method POST --path /api/partner/webhooks/42/test --key-id "<KEY_ID>" --secret "<SECRET>" --print-only
Admin replay route (admin session required): POST /admin/partners/webhook-events/<event_id>/replay
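The signature formula above can be sketched with Python's standard hmac and hashlib modules. This sketch assumes the timestamp header and raw body are available as strings, and the function names are hypothetical. It compares bytes in constant time so that a malformed header cannot short-circuit the check.

```python
import hashlib
import hmac


def webhook_signature(secret: str, timestamp: str, raw_body: str) -> str:
    """hex(HMAC_SHA256(webhook_secret, f"{timestamp}.{raw_body}"))."""
    message = f"{timestamp}.{raw_body}".encode("utf-8")
    return hmac.new(secret.encode("utf-8"), message, hashlib.sha256).hexdigest()


def verify_webhook(secret: str, timestamp: str, raw_body: str, signature: str) -> bool:
    """Constant-time check of X-Webhook-Signature against the exact raw body."""
    expected = webhook_signature(secret, timestamp, raw_body)
    return hmac.compare_digest(expected.encode("utf-8"), signature.strip().encode("utf-8"))
```

Feeding verify_webhook a re-serialized body (for example, with spaces after the colons) makes it return False even when the payload data is identical. This is why the tip above insists on the raw request body.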
Webhook Receiver Pseudo-Handler (Flask)
Minimal receiver pattern: verify signature, block replay, and ack idempotently.
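import hashlib
import hmac
from flask import Flask, request

app = Flask(__name__)
WEBHOOK_SECRET = "<WEBHOOK_SECRET>"  # load from your secret store in production

def hmac_sha256_hex(secret, message):
    return hmac.new(secret.encode(), message.encode(), hashlib.sha256).hexdigest()

@app.post("/webhooks/vets")  # your receiver path
def vets_webhook():
    raw = request.get_data(as_text=True)
    ts = request.headers.get("X-Webhook-Timestamp", "")
    sig = request.headers.get("X-Webhook-Signature", "")
    event_id = request.headers.get("X-Webhook-Id", "")
    if not hmac.compare_digest(sig, hmac_sha256_hex(WEBHOOK_SECRET, f"{ts}.{raw}")):
        return {"success": False, "error": "unauthorized"}, 401
    if replay_cache_seen(event_id):  # your dedupe helper
        return {"success": True, "replayed": True}, 200
    process_event_idempotently(event_id, raw)  # your processing helper
    return {"success": True}, 200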
Webhook Replay Cache TTL Guidance
Recommended retention window for webhook event-id dedupe keys.
Store each X-Webhook-Id in a fast replay cache for at least 24h (48h preferred for delayed retries).
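Example (48h = 172800 s): redis-cli SET webhook:event:<event_id> 1 NX EX 172800. The reply is nil when the ID is already stored, so the check and the insert happen in one step.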
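For local testing, an in-memory stand-in for the Redis dedupe key with a TTL might look like the sketch below. It is single-process and not thread-safe, and the ReplayCache name is hypothetical. Production receivers should keep the shared Redis key so that every worker sees the same IDs. The clock is injectable so that expiry can be tested without waiting.

```python
import time
from typing import Callable, Dict


class ReplayCache:
    """In-memory dedupe of X-Webhook-Id values with a fixed TTL (single process only)."""

    def __init__(self, ttl_seconds: int = 172_800, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds  # 48h default, matching the guidance above
        self.clock = clock
        self._expires_at: Dict[str, float] = {}

    def first_seen(self, event_id: str) -> bool:
        """Record event_id; True on first delivery, False on a replay inside the TTL."""
        now = self.clock()
        # Purge expired keys (O(n) per call; acceptable for a sketch).
        for key in [k for k, exp in self._expires_at.items() if exp <= now]:
            del self._expires_at[key]
        if event_id in self._expires_at:
            return False
        self._expires_at[event_id] = now + self.ttl
        return True
```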
Webhook Secret Rotation Overlap (Receiver)
During secret rotation, accept either active secret for a short overlap window, then retire old.
# Inside the receiver view; an empty previous secret (already retired) is skipped.
valid = False
for candidate in (WEBHOOK_SECRET_CURRENT, WEBHOOK_SECRET_PREVIOUS):
    if candidate and hmac.compare_digest(sig, hmac_sha256_hex(candidate, f"{ts}.{raw}")):
        valid = True
        break
if not valid:
    return {"success": False, "error": "unauthorized"}, 401
Rotation rule: keep previous secret for <=24h overlap, then remove it from verifier list.
Webhook Timestamp Skew Guard
Reject signatures outside a short timestamp window to reduce replay surface.
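now = int(time.time())  # requires: import time
try:
    ts_seconds = int(request.headers.get("X-Webhook-Timestamp", "0"))
except ValueError:
    ts_seconds = 0  # malformed timestamp is treated as stale
if abs(now - ts_seconds) > 300:  # 5-minute window
    return {"success": False, "error": "stale_timestamp"}, 401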
Clock source guidance: sync receivers with NTP/chrony so valid requests are not rejected by drift.
Webhook Event-Type Allowlist Guard
Acknowledge unknown event types without side effects to keep receiver pipelines resilient.
# Run after signature verification so only authentic events reach this check.
ALLOWED_EVENTS = {"salutes.credit", "salutes.debit", "donation.claimed"}
event_type = request.headers.get("X-Webhook-Event", "")
if event_type not in ALLOWED_EVENTS:
    log_unknown_event(event_type)
    return {"success": True, "ignored": True}, 200
Webhook Delivery-ID Persistence Guard
Persist webhook event IDs with a unique key so retries cannot duplicate state changes.
Schema rule: CREATE UNIQUE INDEX ux_webhook_events_event_id ON webhook_events(event_id);
Receiver rule: insert event_id before side effects; on duplicate-key return {"success":True,"duplicate":True}, 200.
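The sketch below uses Python's built-in sqlite3 in place of the production database to show the unique-index rule. The dedupe insert and the side effects share one transaction, so a crash rolls both back, and a duplicate delivery fails on the index before any side effect runs. The helper names are hypothetical. The pattern assumes that side effects are database writes on the same connection. External calls cannot be rolled back this way and need their own idempotency keys.

```python
import sqlite3
from typing import Callable, Tuple


def open_db() -> sqlite3.Connection:
    """In-memory demo database with the unique index from the schema rule above."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE webhook_events (event_id TEXT NOT NULL, state TEXT NOT NULL)")
    conn.execute("CREATE UNIQUE INDEX ux_webhook_events_event_id ON webhook_events(event_id)")
    return conn


def handle_delivery(conn: sqlite3.Connection, event_id: str,
                    apply_side_effects: Callable[[sqlite3.Connection, str], None]) -> Tuple[dict, int]:
    try:
        # One transaction: the dedupe insert and the side effects commit or roll back together.
        with conn:
            conn.execute("INSERT INTO webhook_events (event_id, state) VALUES (?, 'succeeded')", (event_id,))
            apply_side_effects(conn, event_id)
    except sqlite3.IntegrityError:
        # The unique index rejected the insert: this event_id was already applied.
        return {"success": True, "duplicate": True}, 200
    return {"success": True}, 200
```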
Webhook Async-Ack Processing Pattern
Acknowledge quickly, process safely in background workers, and retry from queue on transient failures.
enqueue_result = queue_push({"event_id": event_id, "payload": raw})  # your queue client
if not enqueue_result.ok:
    return {"success": False, "error": "queue_unavailable"}, 503
return {"success": True, "queued": True}, 200
Worker rule: process queued event idempotently; on transient error requeue with capped backoff + dead-letter threshold.
Webhook Dead-Letter Replay Pattern
Support operator-triggered re-drive by delivery ID so failed events can be replayed safely.
Replay API sketch: POST /admin/partners/webhook-events/<event_id>/replay -> {"success":true,"event_id":"...","requeued":true}
Worker rule: before re-drive, check event_id already processed; if yes, ack duplicate and skip side effects.
Webhook Processing-State Lifecycle
Track a simple state model so dashboards and alerts can identify stuck or failing deliveries.
State path: queued -> processing -> succeeded | failed; failed events are retried or, once retries are exhausted, moved to failed_dead_letter.
Schema suggestion: webhook_events(event_id, state, attempts, last_error, updated_at_utc)
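A minimal sketch of enforcing the state path as an explicit transition table follows. It assumes two edges that the text implies but does not spell out: failed returns to queued for a retry, and failed_dead_letter returns to queued when an operator triggers a re-drive. The State and transition names are illustrative.

```python
from enum import Enum


class State(str, Enum):
    QUEUED = "queued"
    PROCESSING = "processing"
    SUCCEEDED = "succeeded"
    FAILED = "failed"
    FAILED_DEAD_LETTER = "failed_dead_letter"


# Allowed transitions; any other move indicates a bug or a stuck delivery.
TRANSITIONS = {
    State.QUEUED: {State.PROCESSING},
    State.PROCESSING: {State.SUCCEEDED, State.FAILED},
    State.FAILED: {State.QUEUED, State.FAILED_DEAD_LETTER},  # retry or give up
    State.SUCCEEDED: set(),
    State.FAILED_DEAD_LETTER: {State.QUEUED},  # operator re-drive
}


def transition(current: State, target: State) -> State:
    """Return target if the move is allowed, else raise ValueError."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.value} -> {target.value}")
    return target
```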
Webhook Retry Policy Pattern
Use bounded retries with exponential backoff to avoid hot-loop failures.
Retry schedule example (seconds): [5, 15, 60, 300, 900] with max_attempts=5 then dead-letter.
Pseudocode: delay = RETRY_SCHEDULE[n - 1] after the n-th failed attempt; once the schedule is exhausted, set state=failed_dead_letter instead of retrying.
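The example schedule can be sketched as a lookup table. This sketch assumes that "max_attempts=5" counts retries, so each of the five delays is used once before dead-lettering. If your pipeline counts the first delivery as an attempt, shorten the table accordingly. The next_retry_delay name is hypothetical.

```python
from typing import Optional

RETRY_SCHEDULE_SECONDS = [5, 15, 60, 300, 900]  # example schedule from above


def next_retry_delay(failed_attempts: int) -> Optional[int]:
    """Seconds to wait before the next retry, or None to dead-letter the event.

    failed_attempts is 1 after the first failed delivery.
    """
    if failed_attempts < 1:
        raise ValueError("failed_attempts starts at 1")
    if failed_attempts > len(RETRY_SCHEDULE_SECONDS):
        return None  # caller sets state=failed_dead_letter
    return RETRY_SCHEDULE_SECONDS[failed_attempts - 1]
```

A lookup table keeps the delays auditable. Production schedulers often add a small random jitter to each delay so that many failed events do not retry in lockstep.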
Webhook Observability Metrics
Track a minimal metrics set so operators can detect reliability regressions quickly.
Core metrics: webhook_success_rate_5m, webhook_retry_rate_5m, webhook_dead_letter_count_24h.
Example formulas: success_rate = succeeded / total; retry_rate = retried / total; dead_letter_count = count(state="failed_dead_letter").
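The formulas can be computed directly from event rows. In this sketch, "retried" means attempts > 1, matching the hourly SQL rollup. An empty window is treated as healthy, which is an assumption you may want to change. The function name is hypothetical.

```python
from typing import Iterable


def webhook_metrics(events: Iterable[dict]) -> dict:
    """Core metrics from rows carrying 'state' and 'attempts' keys."""
    rows = list(events)
    total = len(rows)
    succeeded = sum(1 for r in rows if r["state"] == "succeeded")
    retried = sum(1 for r in rows if r["attempts"] > 1)
    dead_letter = sum(1 for r in rows if r["state"] == "failed_dead_letter")
    return {
        "success_rate": succeeded / total if total else 1.0,  # assumption: no traffic = healthy
        "retry_rate": retried / total if total else 0.0,
        "dead_letter_count": dead_letter,
    }
```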
Webhook Alert Threshold Starters
Baseline thresholds to start with before tuning to real traffic patterns.
Page if success_rate_5m < 0.98 OR dead_letter_count_24h > 0 OR retry_rate_5m > 0.10.
Warn if success_rate_5m < 0.995 for 3 consecutive windows.
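The starter thresholds translate into a small evaluator like the one below. It assumes that recent_success_rates holds consecutive 5m windows, newest last, including the current window. The alert_level name and the "ok" result are illustrative.

```python
from typing import Sequence


def alert_level(success_rate_5m: float, retry_rate_5m: float, dead_letter_count_24h: int,
                recent_success_rates: Sequence[float]) -> str:
    """Return 'page', 'warn', or 'ok' using the starter thresholds above."""
    if success_rate_5m < 0.98 or dead_letter_count_24h > 0 or retry_rate_5m > 0.10:
        return "page"
    last_three = list(recent_success_rates)[-3:]
    if len(last_three) == 3 and all(rate < 0.995 for rate in last_three):
        return "warn"
    return "ok"
```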
Webhook SQL Rollup Query (Hourly)
Use an hourly rollup query to power reliability widgets without scanning raw event logs each request.
SELECT date_trunc('hour', updated_at_utc) AS hour_utc, COUNT(*) AS total, SUM(CASE WHEN state='succeeded' THEN 1 ELSE 0 END) AS succeeded, SUM(CASE WHEN attempts > 1 THEN 1 ELSE 0 END) AS retried, SUM(CASE WHEN state='failed_dead_letter' THEN 1 ELSE 0 END) AS dead_letter FROM webhook_events WHERE updated_at_utc >= NOW() - INTERVAL '24 hours' GROUP BY 1 ORDER BY 1 DESC;
Webhook Prometheus Query Starters
Starter PromQL-style panels for success, retry, and dead-letter trend visibility.
Success rate (5m): sum(rate(vets_webhook_events_total{state="succeeded"}[5m])) / sum(rate(vets_webhook_events_total[5m]))
Retry rate (5m): sum(rate(vets_webhook_events_total{retried="true"}[5m])) / sum(rate(vets_webhook_events_total[5m]))
Dead-letter count (24h): increase(vets_webhook_events_total{state="failed_dead_letter"}[24h])
Webhook Triage Action Matrix
Map metric breaches to immediate actions so incident response is deterministic.
| Signal | Threshold | Immediate Action |
|---|---|---|
| success_rate_5m | < 0.98 | Page on-call, inspect queue backlog and signature failures. |
| retry_rate_5m | > 0.10 | Check upstream latency/error spikes, raise worker concurrency temporarily. |
| dead_letter_count_24h | > 0 | Run dead-letter replay flow by event_id after fix validation. |
Webhook Incident Timeline Pattern
Capture first/last seen timestamps so postmortems can quantify blast radius and duration.
Track fields: incident_id, first_seen_utc, last_seen_utc, duration_seconds, affected_event_count.
Duration formula: duration_seconds = EXTRACT(EPOCH FROM (last_seen_utc - first_seen_utc)).
Webhook Error-Budget Burn-Rate
Track burn-rate against your webhook success SLO to detect fast reliability erosion.
Burn-rate formula: (1 - success_rate_window) / (1 - target_slo). Example target_slo=0.999.
Action hint: if burn_rate_5m > 2.0 AND burn_rate_1h > 1.0, page and gate risky deploys.
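Worked example: with target_slo=0.999, a 5m success rate of 0.997 gives (1 - 0.997) / (1 - 0.999) = 0.003 / 0.001 = 3.0. The sketch below reads the action hint as requiring both windows to breach. The function names are hypothetical.

```python
def burn_rate(success_rate_window: float, target_slo: float = 0.999) -> float:
    """(1 - success_rate_window) / (1 - target_slo); 1.0 spends the budget exactly on pace."""
    if not 0 < target_slo < 1:
        raise ValueError("target_slo must be between 0 and 1")
    return (1 - success_rate_window) / (1 - target_slo)


def should_page(success_rate_5m: float, success_rate_1h: float, target_slo: float = 0.999) -> bool:
    # The 5m window catches the spike; the 1h window filters out short blips.
    return burn_rate(success_rate_5m, target_slo) > 2.0 and burn_rate(success_rate_1h, target_slo) > 1.0
```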
Webhook Postmortem Checklist
Use a fixed checklist to keep incident learning loops consistent and auditable.
Checklist: impact, customer scope, first_seen_utc, last_seen_utc, root_cause, corrective_action, owner, due_date_utc.
Closure rule: incident stays open until corrective action is merged, deployed, and replay validation passes.
Webhook Runbook Escalation Pattern
Define escalation ownership and update cadence before incidents happen.
Role assignment: designate Incident Commander (IC), Communications Lead, and Technical Owner at incident open.
Cadence: status updates every 15 minutes while active; trigger executive update if duration >= 60 minutes or user impact is severe.
Webhook Status-Page Messaging Pattern
Use consistent status phases so users and partners understand incident progression.
Phase order: degraded -> investigating -> monitoring -> resolved
Message template: "[phase] webhook delivery latency elevated; next update in 15 minutes."
Webhook Stakeholder Update Template
Keep partner, internal, and executive updates aligned from one structured template.
Partner update: current_status, affected_endpoints, expected_next_update_utc, workaround_available.
Internal ops update: suspected_root_cause, mitigation_progress, blockers, owner_on_point.
Executive summary: user_impact_level, ETA_confidence, decision_requests, reputational_risk_notes.
Webhook Integration-Release Checklist
Use a pre-release checklist to reduce deployment risk for partner webhook changes.
Checklist: run preflight, deploy canary partner key, monitor retry/dead-letter metrics for 30 minutes, keep rollback switch ready.
Rollback rule: if success_rate_5m drops below SLO or dead_letter_count increases, rollback immediately and replay impacted event_ids.
Webhook Key-Rotation Rollout Checklist
Rotate keys without downtime by running old/new credentials in a controlled overlap window.
Step 1: issue new key + secret and validate against sandbox/test webhook route.
Step 2: dual-run old+new key for 24h and monitor auth failures; once the window is stable, proceed to revocation.
Step 3: confirm no traffic on old key_id for 15m before final revoke commit.
Webhook Signature-Version Migration Checklist
Migrate signing schemes with overlap windows and a fixed deprecation cutoff.
Migration plan: accept v1 + v2 signatures for overlap window, emit v2-only from sender, track v1 traffic decay.
Cutoff rule: publish cutoff_date_utc, alert partners 14d/7d/1d, reject v1 after cutoff with explicit upgrade error.
Webhook Payload-Schema Versioning Pattern
Version payload contracts explicitly so receivers can parse safely during schema evolution.
Envelope example: {"schema_version":"2","event_id":"...","event_type":"...","data":{...}}
Compatibility rule: keep backward parsing support for at least one release window before removing old fields.
Webhook Schema Deprecation Timeline
Publish a fixed timeline so partners can migrate before breaking schema removals.
Timeline: announce deprecation_date_utc, run dual-support window, enforce removal_date_utc.
Communication cadence: notify at T-30d, T-14d, T-7d, and T-1d with upgrade examples.
API Deprecation Calendar
Machine-readable deprecation schedule for endpoint sunset planning.
curl -sS "https://vets-coin.com/developers/deprecations.json"
curl -sS "https://vets-coin.com/developers/deprecations-playbook.md"
curl -sS "https://vets-coin.com/developers/deprecations-playbook.json"
curl -sS "https://vets-coin.com/developers/deprecations.rss"
No active endpoint sunsets are currently scheduled.
Webhook Compatibility Test Matrix
Validate sender/receiver version combinations before changing production defaults.
| Sender Version | Receiver Version | Expected Outcome |
|---|---|---|
| v1 | v1 | Pass (legacy baseline) |
| v2 | v1 | Pass only during dual-support window |
| v1 | v2 | Pass only during dual-support window |
| v2 | v2 | Pass (post-cutover target) |
Webhook Contract-Test Checklist
Run deterministic contract tests before promoting webhook schema changes.
Checklist: required_fields_present, optional_fields_tolerated, unknown_fields_ignored, signature_verification_passes.
Gate rule: block release if any contract test fails on canary receiver fixtures.
Webhook Replay-Test Scenario
Simulate duplicate deliveries to verify idempotent receiver behavior.
Scenario: send identical payload + event_id twice within replay-cache window.
Expected: first delivery applies side effects; second returns success with duplicate/replayed indicator and no additional mutation.
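The scenario can be expressed as a pytest-style test. The FakeReceiver below is a hypothetical stand-in. In a real suite, you would point the same two deliveries at your receiver fixture and assert on its stored state.

```python
class FakeReceiver:
    """Hypothetical idempotent receiver used only to exercise the replay scenario."""

    def __init__(self):
        self.seen = set()
        self.balance = 0

    def deliver(self, event_id: str, amount: int) -> dict:
        if event_id in self.seen:
            return {"success": True, "replayed": True}
        self.seen.add(event_id)
        self.balance += amount  # the side effect under test
        return {"success": True}


def test_duplicate_delivery_is_noop():
    receiver = FakeReceiver()
    first = receiver.deliver("evt-42", 100)
    second = receiver.deliver("evt-42", 100)  # identical payload + event_id
    assert first == {"success": True}
    assert second.get("replayed") is True
    assert receiver.balance == 100  # no additional mutation
```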
Webhook Latency SLO Targets
Use percentile-based SLO targets to detect delivery-path regressions before failures spike.
Target example: P50 < 250ms, P95 < 1000ms, P99 < 3000ms for end-to-end webhook processing latency.
Alert starter: page when P95 > 1500ms for 3 consecutive 5m windows OR P99 > 5000ms in any 5m window.
Webhook Queue-Backlog SLO Targets
Track queue depth and oldest-message age so delayed processing is detected early.
Target example: queue_depth < 500 and oldest_message_age_seconds < 120 during steady state.
Alert starter: page when queue_depth > 2000 OR oldest_message_age_seconds > 600 for 10 minutes.
Webhook DLQ-Drain Runbook
Replay dead-lettered events in controlled batches to avoid reintroducing overload.
Batch strategy: replay 100 events per batch, wait 60s cooldown, then re-check latency + backlog before next batch.
Verification checks: error rate stable, queue_depth recovering, no duplicate side effects, replayed event_ids marked succeeded.
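The batch strategy can be sketched as a loop with a health gate between batches. The replay and healthy callables are assumptions standing in for your re-drive call and your latency and backlog check. The sleep function is injectable so the loop can be tested without waiting.

```python
import time
from typing import Callable, List, Sequence


def drain_dlq(event_ids: Sequence[str],
              replay: Callable[[str], None],
              healthy: Callable[[], bool],
              batch_size: int = 100,
              cooldown_seconds: float = 60.0,
              sleep: Callable[[float], None] = time.sleep) -> List[str]:
    """Replay dead-lettered event_ids in batches; stop early if the health check fails.

    replay() must be idempotent. Returns the event_ids that were replayed.
    """
    replayed: List[str] = []
    for start in range(0, len(event_ids), batch_size):
        for event_id in event_ids[start:start + batch_size]:
            replay(event_id)
            replayed.append(event_id)
        if start + batch_size >= len(event_ids):
            break  # last batch, no cooldown needed
        sleep(cooldown_seconds)
        if not healthy():  # re-check latency + backlog before the next batch
            break
    return replayed
```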
Webhook Canary-Failure Rollback
If canary delivery quality regresses, roll back quickly before broad partner impact.
Immediate action: disable canary key_id, stop new canary deliveries, and revert sender route to stable key.
Recovery action: replay canary window events (start_ts..end_ts) through stable pipeline with idempotency safeguards enabled.
Exit criteria: success_rate_5m returns above SLO, retry/dead-letter rates normalize, and canary replay backlog is fully drained.
Webhook Canary-Success Promotion Checklist
If canary quality remains healthy, promote traffic in controlled steps with rollback guardrails.
Promotion plan: 1% -> 5% -> 25% -> 50% -> 100%; hold each step for at least 15 minutes.
Gate each step on stable success_rate_5m, retry_rate_5m, dead_letter_count_24h, queue_depth, and latency percentiles.
Rollback guardrail: immediately revert to previous step if SLO breach persists for 2 consecutive 5m windows.
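The promotion ladder and rollback guardrail might be combined into one decision function. This sketch assumes that window_breaches holds one SLO-breach flag per 5m window, newest last. It also assumes that reverting from the first step returns 0, meaning the canary is disabled. Both assumptions go beyond the text above.

```python
from typing import Sequence

PROMOTION_STEPS = [1, 5, 25, 50, 100]  # percent of traffic
MIN_HOLD_SECONDS = 15 * 60


def next_traffic_percent(current: int, held_seconds: float, window_breaches: Sequence[bool]) -> int:
    """Next canary share: revert on 2 consecutive breaches, promote after a clean hold."""
    idx = PROMOTION_STEPS.index(current)
    recent = list(window_breaches)[-2:]
    if len(recent) == 2 and all(recent):
        return PROMOTION_STEPS[idx - 1] if idx > 0 else 0  # 0 = canary disabled
    if held_seconds >= MIN_HOLD_SECONDS and not any(recent) and idx + 1 < len(PROMOTION_STEPS):
        return PROMOTION_STEPS[idx + 1]
    return current  # keep holding
```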
Webhook Rollback-Drill Cadence
Run routine rollback drills so incident response stays fast and predictable.
Cadence: run a scheduled rollback simulation at least once per month and after major webhook pipeline changes.
Drill checklist: trigger synthetic SLO breach, disable canary key, replay drill window, verify stable recovery in dashboards.
Evidence to retain: timeline timestamps, operator actions, metric screenshots, and confirmed replay completion count.
Webhook Incident Command: First 10 Minutes
Use a fixed opening sequence so critical incident actions happen immediately and in order.
Minute 0-2: assign IC + technical owner, declare incident channel, snapshot success/retry/dead-letter + backlog metrics.
Minute 2-5: decide contain action (disable canary key, pause risky rollout, cap replay) and log rationale.
Minute 5-10: publish first status update, set next update timer (15m), and open action checklist with owners.
Webhook Incident Comms Cadence
Keep predictable update clocks across audiences during active incidents.
Partner-facing updates: every 30 minutes while degraded, include affected endpoints + expected next update time.
Internal ops updates: every 15 minutes, include metric deltas, mitigation status, and current blocker owner.
Executive updates: every 60 minutes (or on major change), include user impact, risk level, and ETA confidence.
Webhook Incident Closure Checklist
Close incidents only after objective recovery verification and documented handoff.
Recovery gate: success_rate_5m above SLO for 30m, retry/dead-letter rates back to baseline, and backlog fully drained.
Data gate: replay queue empty, no unowned failed events, and incident timeline updated with final root-cause statement.
Comms gate: publish resolved update, record customer impact window, and link postmortem owner + due date.
Webhook Post-Incident Handoff Packet
Standardize handoff artifacts so follow-up work does not drift after incident closure.
Required fields: incident_id, severity, start/end_utc, affected endpoints, replay count, unresolved risks.
Action tracker: each corrective action must include owner, ETA, dependency, and verification check.
Handoff rule: schedule a 24h review checkpoint to confirm action status and detect any regression signal.
Webhook Corrective-Action Verification Ledger
Track every corrective action to completion with clear verification evidence.
Ledger columns: action_id, owner, due_date_utc, status, dependency, verification_check, verified_at_utc.
Status model: planned -> in_progress -> blocked -> verified -> closed (only close after verification evidence is linked).
Audit trail: capture changed_by + changed_at_utc on every status transition and store immutable comment history.
Webhook Dependency-Risk Register
Track upstream dependency risks so incident response includes owner, blast radius, and fallback paths.
Register fields: dependency_name, service_owner, oncall_contact, blast_radius, fallback_mode, mitigation_runbook, last_tested_utc.
Risk scoring: classify critical/high/medium by user-impact scope + single-point-of-failure likelihood.
Governance rule: run dependency failover test at least quarterly and attach evidence link to each register row.
Webhook Dependency Failover-Drill Matrix
Define expected fallback behavior and recovery targets per dependency before incidents occur.
| Dependency | Fallback Mode | RTO Target | Drill Cadence |
|---|---|---|---|
| primary_webhook_queue | Switch producer to secondary queue cluster | < 5m | Monthly |
| signature_validation_store | Read-through cache with strict TTL + deny-on-miss guard | < 10m | Quarterly |
| metrics_ingestion | Buffer locally and backfill on recovery | < 15m | Quarterly |
Verification: each drill must record actual_rto, fallback_result, and follow-up action if target is missed.
Webhook Dependency Alert-Routing Matrix
Map each dependency breach signal to the right pager owner and escalation path.
| Signal / Breach | Primary Pager Owner | Escalation Path |
|---|---|---|
| queue_depth > 2000 for 10m | Webhook Platform On-Call | Escalate to Incident Commander at +10m if unresolved |
| signature_validation_errors_rate > 2% | Security/API Auth On-Call | Escalate to Security Lead + IC immediately |
| dead_letter_count_24h increase > threshold | Reliability On-Call | Escalate to Platform Manager at +15m; start replay runbook |
Routing rule: each alert route must include backup owner and escalation timeout to prevent notification dead-ends.
Webhook Dependency Escalation Decision Tree
Use a deterministic branch when dependency failures require containment, failover, or replay actions.
Branch 1 (contain): if auth/signature failure rate spikes and cause is unknown, pause risky rollout and gate new mutations.
Branch 2 (failover): if primary dependency outage is confirmed and fallback is healthy, switch traffic to fallback immediately.
Branch 3 (replay): when dependency recovers, run bounded replay batches only after queue and latency SLOs are stable.
Escalation trigger: if no branch restores SLO within 15 minutes, escalate to IC + platform lead and open incident bridge.
Webhook Dependency Freeze-Threshold Policy
Define automatic mutation-freeze gates so severe dependency failures cannot cascade into larger data integrity incidents.
Freeze gate A: trigger mutation_freeze=true when signature_validation_errors_rate > 5% for 5 minutes.
Freeze gate B: trigger mutation_freeze=true when dead_letter_rate_5m > 2% and queue_depth > 3000 simultaneously.
Unfreeze rule: require 15 minutes of SLO-stable metrics plus explicit IC approval and audit-log note.
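The freeze gates can be sketched as pure functions. Gate A assumes that the signature error rate is sampled once per minute, so "for 5 minutes" means five consecutive samples above 5%. The sample interval and all function names are assumptions.

```python
from typing import Sequence


def gate_a(sig_error_rates_1m: Sequence[float]) -> bool:
    """signature_validation_errors_rate > 5% for 5 consecutive 1-minute samples."""
    last = list(sig_error_rates_1m)[-5:]
    return len(last) == 5 and all(rate > 0.05 for rate in last)


def gate_b(dead_letter_rate_5m: float, queue_depth: int) -> bool:
    """dead_letter_rate_5m > 2% and queue_depth > 3000 at the same time."""
    return dead_letter_rate_5m > 0.02 and queue_depth > 3000


def mutation_freeze(sig_error_rates_1m: Sequence[float], dead_letter_rate_5m: float, queue_depth: int) -> bool:
    return gate_a(sig_error_rates_1m) or gate_b(dead_letter_rate_5m, queue_depth)


def may_unfreeze(stable_minutes: float, ic_approved: bool, audit_note: str) -> bool:
    # 15 SLO-stable minutes, explicit IC approval, and a non-empty audit-log note.
    return stable_minutes >= 15 and ic_approved and bool(audit_note.strip())
```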
Webhook Freeze-Override Governance
Allow emergency overrides only under strict authority, dual-approval, and timed expiry controls.
Who can override: Incident Commander + Platform Lead only (no single-user override for production freeze state).
Approval model: require dual approval (ic_approved=true and platform_approved=true) before override_active=true.
Expiry rule: auto-expire override in 30 minutes unless re-approved; emit audit event on activate, renew, and expire.
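A minimal sketch of the dual-approval and auto-expiry rules follows. The in-object audit list is a stand-in for the append-only audit log, and the Override class is hypothetical. Expiry is detected lazily, the first time is_active is called after the deadline, and the expire event is recorded once.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import List, Optional

OVERRIDE_TTL = timedelta(minutes=30)


def utc_now() -> datetime:
    return datetime.now(timezone.utc)


@dataclass
class Override:
    override_id: str
    expires_at_utc: Optional[datetime] = None
    audit: List[str] = field(default_factory=list)  # stand-in for the append-only log

    def _approve(self, event: str, now: datetime, ic_approved: bool, platform_approved: bool) -> None:
        if not (ic_approved and platform_approved):
            raise PermissionError("IC and Platform Lead approval are both required")
        self.expires_at_utc = now + OVERRIDE_TTL
        self.audit.append(f"{event} {self.override_id} until {self.expires_at_utc.isoformat()}")

    def activate(self, now: datetime, ic_approved: bool, platform_approved: bool) -> None:
        self._approve("activate", now, ic_approved, platform_approved)

    def renew(self, now: datetime, ic_approved: bool, platform_approved: bool) -> None:
        if not self.is_active(now):
            raise RuntimeError("override already expired; request a new one")
        self._approve("renew", now, ic_approved, platform_approved)

    def is_active(self, now: datetime) -> bool:
        if self.expires_at_utc is None:
            return False
        if now >= self.expires_at_utc:
            self.audit.append(f"expire {self.override_id} at {self.expires_at_utc.isoformat()}")
            self.expires_at_utc = None  # record the expire event only once
            return False
        return True
```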
Webhook Override Threshold-Exception Process
Use this compact process when freeze thresholds need a time-boxed exception during active incident response.
Approver quorum: require 2 of 3 approvals (IC, Platform Lead, Security Lead) before threshold_exception_active=true.
Expiry cap: enforce hard expiry in 30 minutes; renewal requires fresh quorum + explicit incident status update.
Audit note minimums: reason, impacted endpoints, projected risk window, rollback trigger, and owner of next review.
Webhook Override Audit-Log Schema
Use a consistent audit schema so every override lifecycle action is traceable and reviewable.
Required fields: override_id, action, actor_id, actor_role, reason_code, reason_note, expires_at_utc, state, created_at_utc.
State model: requested -> approved -> active -> renewed -> expired (or revoked).
Audit guarantees: append-only records, immutable timestamps, and link to incident_id for every override event.
Webhook Override-Review Cadence
Review active overrides on a fixed cadence so emergency controls do not drift into long-lived risk.
Daily review: list all override_active=true records, verify business justification, and confirm next expiry timestamp.
Stale-alert rule: page on-call if any override remains active > 24h or has no linked incident/update note.
Closure rule: convert active override to expired/revoked within 15 minutes after risk condition clears.
Webhook Override Emergency-Breakglass Policy
Permit single-actor emergency override only for extreme availability scenarios and force rapid expiry.
Breakglass path: allow single actor only when incident severity is critical and dual-approval path is unavailable.
Forced expiry: breakglass override expires in 10 minutes with no silent extension; renewal requires fresh explicit action.
Control rule: page IC + security lead immediately and require post-incident review note within 24 hours.
Webhook Override Revocation Protocol
Revoke overrides quickly and consistently once the risk condition clears or misuse is detected.
Revocation trigger: unauthorized use, stale override, or restored system health beyond unfreeze criteria.
Execution steps: set override_active=false, restore default freeze policy, and run rollback validation checks.
Notification rule: send revoke event to IC, security lead, and operations channel with reason + timestamp.
Webhook Override Incident-Communication Template
Use consistent messaging at override activation, update, and revocation checkpoints.
Activate message: "Override activated" + override_id + reason + forced expiry + next update time.
Update message: current risk status + remaining override time + mitigation progress + expected revoke window.
Revoke message: "Override revoked" + revoke reason + restored controls + follow-up actions owner/ETA.
Webhook Override Postmortem Addendum
Capture override-specific outcomes so postmortems include control-side effects and residual risk.
Document what override changed: policy gates bypassed, mutation paths affected, and time window active.
Residual risk section: outstanding data reconciliation, delayed replay impacts, and temporary control gaps.
Closure requirement: assign explicit owner + due date for each residual risk item before incident closure.
Webhook Override KPI Tracking
Track override usage and quality metrics so governance decisions are backed by clear trend data.
Core KPIs: override_activation_count_30d, override_avg_duration_minutes, override_stale_ratio_30d.
Derived KPI: stale ratio = stale_overrides_30d / total_overrides_30d (stale means active > 24h).
Alert starter: page governance owner if stale ratio exceeds 5% or avg duration exceeds 60 minutes for 2 weeks.
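The KPIs and alert starter can be sketched as follows. This sketch assumes that KPIs are snapshotted weekly and that "for 2 weeks" means both of the last two snapshots breach either condition. That reading of the alert rule is an interpretation, and the function names are hypothetical.

```python
from datetime import timedelta
from typing import Dict, Sequence

STALE_AFTER = timedelta(hours=24)


def override_kpis(durations: Sequence[timedelta]) -> Dict[str, float]:
    """KPIs over overrides activated in the last 30 days; each entry is time spent active."""
    total = len(durations)
    if total == 0:
        return {"activation_count_30d": 0, "avg_duration_minutes": 0.0, "stale_ratio_30d": 0.0}
    stale = sum(1 for d in durations if d > STALE_AFTER)
    avg_minutes = sum(d.total_seconds() for d in durations) / total / 60
    return {"activation_count_30d": total, "avg_duration_minutes": avg_minutes,
            "stale_ratio_30d": stale / total}


def governance_page(weekly_kpis: Sequence[Dict[str, float]]) -> bool:
    """Page when the last two weekly snapshots both exceed a threshold."""
    last_two = list(weekly_kpis)[-2:]
    return len(last_two) == 2 and all(
        k["stale_ratio_30d"] > 0.05 or k["avg_duration_minutes"] > 60 for k in last_two
    )
```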
Webhook Override Trend-Review Checklist
Run a weekly governance review to turn override metrics into concrete decisions and action items.
Weekly agenda: review activation_count_30d trend, avg_duration trend, stale_ratio trend, and top incident categories.
Decision checkpoint: keep, tighten, or relax override thresholds based on two-week trend direction.
Output requirement: record decisions, owner, ETA, and expected KPI impact for each approved change.
Webhook Override Policy-Change Guardrail
Apply override policy threshold changes inside controlled windows and auto-revert quickly if reliability degrades.
Change window: apply threshold updates only during low-risk windows (Tue-Thu 14:00-18:00 UTC) and never during an active incident.
Rollback criterion: if success_rate_5m drops by > 0.5% or projected stale_ratio_30d rises above 5% within 30 minutes, rollback immediately.
Change record rule: store before/after KPI snapshots, approving owner, rollback owner, and rollback ETA in the same audit event.
Partner Mutation End-to-End (curl)
Minimal credit mutation flow: prepare body, sign headers, send request, then branch on status code.
export BASE_URL="https://vets-coin.com" KEY_ID="<KEY_ID>" SECRET="<SECRET>" API_PATH="/api/salutes/credit" BODY='{"user_id":"4","amount":100,"reason":"event_participation","source":"partner_portal"}'
Avoid naming the variable PATH: exporting it replaces the shell's command search path, so python, curl, and date stop resolving.
python flask_api/scripts/sign_partner_request.py --base-url "$BASE_URL" --method POST --path "$API_PATH" --key-id "$KEY_ID" --secret "$SECRET" --json "$BODY" --idempotency-key "idem-$(date +%s)" --print-only
Replace the header placeholders below with the values printed by the signer.
curl -sS -X POST "$BASE_URL$API_PATH" -H "Content-Type: application/json" -H "X-Partner-Key: <KEY_ID>" -H "X-Partner-Timestamp: <UNIX_SECONDS>" -H "X-Partner-Signature: <HMAC_SHA256_HEX>" -H "X-Partner-Nonce: <NONCE_HEX>" -H "Idempotency-Key: <UNIQUE_KEY>" --data "$BODY"
Handling: 2xx = success; 401/403 = fail fast and fix credentials, signature inputs, or key scope; 409 = branch on error_code (replay_detected: regenerate nonce + idempotency key and retry once; idempotency_replay: treat as duplicate success and fetch latest state); 429 = back off and honor Retry-After; 500/503 = retry with jittered, capped backoff.
Partner Mutation Response Parsing (curl -w)
Use a deterministic branch to avoid treating auth/rate-limit errors as success.
HTTP_CODE=$(curl -sS -o /tmp/vets_partner_response.json -w "%{http_code}" -X POST "$BASE_URL$API_PATH" -H "Content-Type: application/json" -H "X-Partner-Key: <KEY_ID>" -H "X-Partner-Timestamp: <UNIX_SECONDS>" -H "X-Partner-Signature: <HMAC_SHA256_HEX>" -H "X-Partner-Nonce: <NONCE_HEX>" -H "Idempotency-Key: <UNIQUE_KEY>" --data "$BODY")
case "$HTTP_CODE" in 2*) echo "success";; 401|403) echo "fatal_auth_or_scope";; 409) echo "branch_on_error_code";; 429) echo "backoff_honor_retry_after";; 500|503) echo "retry_with_backoff";; *) echo "inspect /tmp/vets_partner_response.json";; esac
Partner Error Payload Classification (jq)
Classify JSON error payloads for retry-safe vs fail-fast decisions.
ERR_CODE=$(jq -r '.error_code // "unknown"' /tmp/vets_partner_response.json)
case "$ERR_CODE" in idempotency_replay) echo "safe_noop_fetch_latest_state";; replay_detected) echo "retry_once_fresh_nonce";; rate_limited|db_unavailable|server_error) echo "retryable_with_backoff";; unauthorized|forbidden) echo "fail_fast_auth_scope";; *) echo "manual_triage";; esac
Security Notes
- Never embed partner secrets in frontend code.
- Prefer server-to-server calls and strict IP allowlists where possible.
- Use idempotency keys when retrying POST requests.
Rate Limit Defaults
Current runtime defaults for common integration paths.
| Flow | Limit | Config Key |
|---|---|---|
| Public API endpoints | 90 / minute | RATE_LIMIT_PUBLIC_API_PER_MIN |
| Partner webhook endpoints | 120 / minute | RATE_LIMIT_WEBHOOK_PER_MIN |
| Sandbox partner keys | 30 / minute | PARTNER_SANDBOX_RATE_LIMIT_PER_MINUTE |
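The table gives per-minute limits but not how the server counts its window, so a conservative option is to throttle on the client with a 60-second sliding window set at or below the relevant limit (for example 30 for sandbox keys). A minimal sketch; the class name is illustrative and the injectable clock exists only to make it testable.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Allow at most `limit_per_minute` calls in any rolling 60-second window."""

    def __init__(self, limit_per_minute: int, clock: Callable[[], float] = time.monotonic):
        self.limit = limit_per_minute
        self.window = 60.0
        self.clock = clock
        self.sent: Deque[float] = deque()

    def try_acquire(self) -> bool:
        """Record a call and return True if it fits in the window, else return False."""
        now = self.clock()
        # Drop timestamps that have aged out of the 60-second window.
        while self.sent and now - self.sent[0] >= self.window:
            self.sent.popleft()
        if len(self.sent) < self.limit:
            self.sent.append(now)
            return True
        return False
```

When `try_acquire()` returns False, wait instead of sending; that keeps bursts from turning into 429 responses in the first place.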
Scope-to-Endpoint Matrix
Use this map when issuing partner keys and least-privilege scopes.
| Scope | Typical Endpoints | Notes |
|---|---|---|
| read | GET /api/salutes/balance, GET /api/partner/user-lookup, GET /api/partner/wallet-info/<wallet>, GET /api/partner/users/<id> | Default read-only partner data access. |
| credit | POST /api/salutes/credit | Mutation scope; requires nonce + idempotency key. |
| debit | POST /api/salutes/debit | Mutation scope; requires nonce + idempotency key. |
| ledger | GET /api/salutes/ledger | Read-only transaction and audit history access. |
| donation | POST /api/partner/donation-claim | Donation claim trigger workflows. |
| users | POST /api/partner/users, PATCH /api/partner/users/<id>, POST /api/partner/users/<id>/wallets | Partner user lifecycle and wallet linking actions. |
| webhooks | GET/POST /api/partner/webhooks, DELETE /api/partner/webhooks/<id>, POST /api/partner/webhooks/<id>/test | Webhook endpoint management and test dispatch. |
| public | GET /api/public-stats, GET /api/public/system-status, GET /api/transactions/latest | No partner auth required. |
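When issuing keys or wiring a client, the matrix can be turned into a pre-flight check so a missing scope is caught before the server returns 403. A minimal sketch transcribed from the table; the server stays the authority, and the rule list will drift if the matrix changes. Unknown routes are denied (fail closed) and public routes need no scope.

```python
import re
from typing import Optional, Set

# (HTTP method, full-path regex, required scope), transcribed from the matrix.
# "<id>" and "<wallet>" placeholders become a single path segment.
SCOPE_RULES = [
    ("GET", r"/api/salutes/balance", "read"),
    ("GET", r"/api/partner/user-lookup", "read"),
    ("GET", r"/api/partner/wallet-info/[^/]+", "read"),
    ("GET", r"/api/partner/users/[^/]+", "read"),
    ("POST", r"/api/salutes/credit", "credit"),
    ("POST", r"/api/salutes/debit", "debit"),
    ("GET", r"/api/salutes/ledger", "ledger"),
    ("POST", r"/api/partner/donation-claim", "donation"),
    ("POST", r"/api/partner/users", "users"),
    ("PATCH", r"/api/partner/users/[^/]+", "users"),
    ("POST", r"/api/partner/users/[^/]+/wallets", "users"),
    ("GET", r"/api/partner/webhooks", "webhooks"),
    ("POST", r"/api/partner/webhooks", "webhooks"),
    ("DELETE", r"/api/partner/webhooks/[^/]+", "webhooks"),
    ("POST", r"/api/partner/webhooks/[^/]+/test", "webhooks"),
    ("GET", r"/api/public-stats", "public"),
    ("GET", r"/api/public/system-status", "public"),
    ("GET", r"/api/transactions/latest", "public"),
]


def required_scope(method: str, path: str) -> Optional[str]:
    """Return the scope a route needs, or None if the route is not in the matrix."""
    for rule_method, pattern, scope in SCOPE_RULES:
        if method.upper() == rule_method and re.fullmatch(pattern, path):
            return scope
    return None


def key_can_call(scopes: Set[str], method: str, path: str) -> bool:
    """Fail closed: unknown routes are denied; public routes need no scope."""
    scope = required_scope(method, path)
    if scope is None:
        return False
    return scope == "public" or scope in scopes
```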
Common 4xx/5xx Responses
Fast triage guide for partner integrations and automation hooks.
| Status | Typical Cause | What To Do |
|---|---|---|
| 400 | Invalid payload or missing required fields. | Validate request body/query values and resend. |
| 401 | Missing/invalid partner auth signature or timestamp. | Re-sign request with current timestamp and correct secret. |
| 403 | Forbidden scope, disabled key, or admin-only route. | Verify key status/scopes and endpoint access policy. |
| 409 | Idempotency replay or business-state conflict. | Use a fresh idempotency key and re-check current state. |
| 413 | Request body exceeds API payload guardrails. | Reduce payload size or split into smaller requests. |
| 429 | Rate limit exceeded. | Back off with retry/jitter and reduce burst concurrency. |
| 503 | Dependency unavailable (DB/RPC) or temporary safe-mode gates. | Retry with backoff; monitor `/status` and alert endpoints. |
| 500 | Unexpected server error. | Capture request ID + payload hash and report for investigation. |
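For the 500 row, a stable payload hash lets support match your report to server logs without you sending the raw body. A minimal sketch: the JSON is canonicalized (sorted keys, no whitespace) before hashing so key order does not change the fingerprint. The response header name `X-Request-ID` is a hypothetical placeholder; use whatever request ID the API actually returns.

```python
import hashlib
import json
from typing import Mapping, Optional


def payload_fingerprint(body: dict) -> str:
    """SHA-256 of canonical JSON: key order and whitespace do not affect the hash."""
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def incident_report(status: int, headers: Mapping[str, str], body: dict) -> dict:
    """Collect status, request ID, and payload hash for an error report."""
    # ASSUMPTION: "X-Request-ID" is a hypothetical header name, matched case-insensitively.
    request_id: Optional[str] = None
    for name, value in headers.items():
        if name.lower() == "x-request-id":
            request_id = value
            break
    return {"status": status, "request_id": request_id, "payload_sha256": payload_fingerprint(body)}
```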
Auth Error JSON Examples
Use these to build deterministic client error handling paths.
401 unauthorized: {"success":false,"error":"unauthorized"}
403 forbidden scope: {"success":false,"error":"forbidden"}
409 replay/idempotency: {"success":false,"error":"replay_detected"} or {"success":false,"error":"idempotency_replay"}
429 rate-limited: {"success":false,"error":"rate_limited"}
Retry/Backoff Strategy
Recommended behavior for resilient clients (especially around 429 and 503).
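Retry only on 429, 503, or network timeouts; use exponential backoff with jitter (1s, 2s, 4s, 8s, capped at 30s).
For read-only GET probes, curl's built-in --retry covers timeouts and transient HTTP statuses such as 429 and 503, doubling the wait after each attempt as long as no fixed --retry-delay is set:
curl -sS --retry 5 --retry-max-time 60 "https://vets-coin.com/api/public/system-status"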
Always send a fresh Idempotency-Key on mutation retries; never reuse a key for a different payload.
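A minimal sketch of the retry loop in application code. It uses "full jitter" (a uniform random delay between 0 and the exponential ceiling of 1s, 2s, 4s, 8s, ... capped at 30s), which is one common way to add jitter, not necessarily the exact variant VETS Coin uses. `send` is a caller-supplied function (hypothetical here) that builds a fresh nonce and Idempotency-Key on every call and returns the HTTP status.

```python
import random
import time
from typing import Callable, Optional

RETRYABLE_STATUSES = {429, 503}


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0,
                  rng: Optional[random.Random] = None) -> float:
    """Full-jitter delay for a 0-based retry attempt: uniform(0, min(cap, base * 2**attempt))."""
    ceiling = min(cap, base * (2 ** attempt))
    return (rng or random).uniform(0.0, ceiling)


def call_with_retries(send: Callable[[], int], max_attempts: int = 5,
                      sleep: Callable[[float], None] = time.sleep) -> int:
    """Call send() until it returns a non-retryable status or attempts run out."""
    status: Optional[int] = None
    for attempt in range(max_attempts):
        try:
            status = send()
        except TimeoutError:
            status = None  # treat a timeout as retryable
        if status is not None and status not in RETRYABLE_STATUSES:
            return status
        if attempt < max_attempts - 1:
            sleep(backoff_delay(attempt))
    if status is None:
        raise TimeoutError("all attempts timed out")
    return status  # last retryable status (429/503) after exhausting attempts
```

Jitter spreads retries from many clients over time, so they do not all hit the API again at the same instant after a shared 429 or 503.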
Transparency API Snippets
Copy/paste ready examples for anomaly JSON endpoints.
curl -sS "https://vets-coin.com/transparency/audit-anomalies/summary.json?run=latest"
curl -sS "https://vets-coin.com/transparency/audit-anomalies/runs.json"
curl -sS "https://vets-coin.com/transparency/audit-anomalies/trend.json?metric=rows&limit=200"
curl -sS "https://vets-coin.com/transparency/audit-anomalies/alerts.json?sigs_increase_threshold_pct=50&rows_increase_threshold_pct=50"
curl -sS "https://vets-coin.com/transparency/audit-anomalies/alerts.json?sigs_increase_threshold_pct=25&rows_increase_threshold_pct=25"
curl -sS "https://vets-coin.com/transparency/audit-anomalies/alerts.json?sigs_increase_threshold_pct=100&rows_increase_threshold_pct=100"
curl -sS "https://vets-coin.com/transparency/audit-anomalies/alerts.csv" -o audit_anomaly_alerts.csv
curl -sS "https://vets-coin.com/transparency/audit-anomalies/schema-registry.json"
curl -sS "https://vets-coin.com/transparency/audit-anomalies/diff/compare.json?run_a=latest&run_b=latest&include_rows=true&row_limit=25"
curl -sS "https://vets-coin.com/transparency/audit-anomalies/diff/compare.csv?run_a=latest&run_b=latest" -o audit_anomaly_compare.csv
curl -sS "https://vets-coin.com/status.json"
curl -sS "https://vets-coin.com/api/public/system-status"
curl -sS "https://vets-coin.com/api/public/system-status/trend?limit=288"
curl -sS "https://vets-coin.com/api/public/system-status/uptime?hours=168"
curl -sS "https://vets-coin.com/api/public/system-status/incidents?hours=168&limit=30"
curl -sS "https://vets-coin.com/api/public/deprecations/header-simulator?id=sample_deprecation&endpoint=/api/public/system-status"
curl -sS "https://vets-coin.com/developers/migration-status.json?id=sample_deprecation"
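The threshold variants of alerts.json differ only in their query string, so building the URL with proper encoding avoids hand-editing. A minimal standard-library sketch; the payload shapes are not documented on this page, so `fetch_json` returns the parsed JSON without assuming any fields. Both function names are illustrative.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

BASE_URL = "https://vets-coin.com"


def alerts_url(sigs_pct: int, rows_pct: int, base_url: str = BASE_URL) -> str:
    """Build the alerts.json URL for the given run-over-run increase thresholds (%)."""
    query = urlencode({
        "sigs_increase_threshold_pct": sigs_pct,
        "rows_increase_threshold_pct": rows_pct,
    })
    return f"{base_url}/transparency/audit-anomalies/alerts.json?{query}"


def fetch_json(url: str, timeout: float = 10.0) -> object:
    """GET a public JSON endpoint and return the parsed body."""
    req = Request(url, headers={"Accept": "application/json"})
    with urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `fetch_json(alerts_url(25, 25))` requests the same snapshot as the 25% curl line above.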
Transparency Endpoint Map
Use this quick table to choose the right endpoint for your integration task.
| Endpoint | Best For | Output |
|---|---|---|
| /transparency/audit-anomalies/summary.json | Dashboard headers, run health, latest compare deltas | JSON |
| /transparency/audit-anomalies/runs.json | Run selectors, sync loops, available-history discovery | JSON |
| /transparency/audit-anomalies/trend.json | Charts, run-over-run monitoring, alert trend baselines | JSON |
| /transparency/audit-anomalies/alerts.json | Threshold-based alert snapshots for automation and paging | JSON |
| /transparency/audit-anomalies/alerts.csv | Spreadsheet-friendly current alert posture and threshold context | CSV |
| /transparency/audit-anomalies/schema-registry.json | Versioned field definitions and deprecation timelines for anomaly JSON payloads | JSON |
| /developers/deprecations.json | Endpoint deprecation calendar with announce/sunset windows and migration pointers | JSON |
| /developers/deprecations.rss | RSS feed of API deprecation windows for subscriber-based reminder workflows | RSS |
| /developers/deprecations-playbook.md | Auto-generated migration playbook with per-endpoint operational checklists | Markdown |
| /developers/deprecations-playbook.json | Tooling-friendly migration playbook companion with structured checklist steps | JSON |
| /developers/api-errors.json | Error catalog with remediation notes and retry guidance by `error_code` | JSON |
| /status.json | Alias for latest system status payload (same shape as `/api/public/system-status`) | JSON |
| /api/public/system-status | Partner-facing uptime, monitor freshness, and cron automation status | JSON |
| /api/public/system-status/trend | Lightweight rolling history for uptime charts and alert trend baselines | JSON |
| /api/public/system-status/uptime | Windowed availability percentages and per-check degraded rates | JSON |
| /api/public/system-status/incidents | Resolved/active incident windows with duration and affected checks | JSON |
| /transparency/audit-anomalies/diff/export.json | Selected-vs-latest anomaly type analysis | JSON |
| /transparency/audit-anomalies/diff/compare.json | Arbitrary run-to-run reconciliation and drift checks | JSON |
| /transparency/audit-anomalies/diff/compare.csv | Spreadsheet workflows and manual audit packs | CSV |
Integration Checklist
- Poll runs.json every 5-15 minutes to detect newly available complete runs.
- Use summary.json for top-level status and run-to-latest deltas.
- Use trend.json for charting and threshold alerts (recommended: alert whenever the bad count increases run over run).
- Use alerts.json as the paging signal endpoint when your thresholds are crossed.
- Use schema-registry.json to pin field compatibility checks before parser updates.
- Use /api/public/system-status as a lightweight heartbeat for partner automation and uptime probes.
- Use /api/public/system-status/trend for simple uptime trend charts and incident postmortems.
- Use /api/public/system-status/uptime for SLO-style percentages over 24h/7d/30d windows.
- Use /api/public/system-status/incidents for machine-readable outage windows and postmortem timelines.
- Use diff/compare.json for machine checks and diff/compare.csv for manual reconciliation packets.
- Cache responses for at least 60 seconds; these are audit snapshots, not per-transaction streaming endpoints.
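A minimal sketch of the polling side of this checklist: a cache that never holds snapshots for less than 60 seconds, plus new-run detection between polls. The runs.json response format is not documented on this page, so the sketch assumes the caller has already extracted a list of run identifiers; `SnapshotCache` and `new_runs` are illustrative names, and `fetch` can be any function that takes a URL and returns parsed JSON.

```python
import time
from typing import Callable, Dict, Iterable, Set, Tuple

MIN_TTL_SECONDS = 60.0


class SnapshotCache:
    """Cache fetched snapshots per URL; the TTL is never allowed below 60 seconds."""

    def __init__(self, fetch: Callable[[str], object], ttl: float = 300.0,
                 clock: Callable[[], float] = time.monotonic):
        self.fetch = fetch
        self.ttl = max(ttl, MIN_TTL_SECONDS)
        self.clock = clock
        self._entries: Dict[str, Tuple[float, object]] = {}

    def get(self, url: str) -> object:
        """Return the cached value if still fresh, otherwise fetch and store it."""
        now = self.clock()
        hit = self._entries.get(url)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]
        value = self.fetch(url)
        self._entries[url] = (now, value)
        return value


def new_runs(previous: Set[str], current: Iterable[str]) -> Set[str]:
    """Run IDs present in this poll that were not seen in the previous one."""
    return set(current) - previous
```

Polling runs.json every 5 to 15 minutes through a cache like this stays well above the 60-second floor while still picking up newly completed runs promptly.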