Chapter 15: Risk Management & Incident Response
An exchange is a high-value target that operates under regulatory and market pressure at the same time. Incidents are a question of when, not if. This chapter helps you prepare the playbooks before you need them.
15.1 Risk map
flowchart TD
Root[Exchange risk] --> Sec[Security<br/>theft/key leak/insider]
Root --> Ops[Operational<br/>outage/matching fault/data loss]
Root --> Liq[Liquidity/market<br/>bank run/inventory loss/depeg]
Root --> Reg[Compliance/regulatory<br/>penalty/deregistration/probe]
Root --> Bank[Counterparty<br/>bank freeze/custodian/MM default]
Root --> Legal[Legal/reputational<br/>litigation/negative press]
15.2 Risk register
For each risk class, keep a register that records at least: description, likelihood, impact, existing controls, residual risk, and owner. The table below lists the key mitigations for the most common risks.
| Risk | Key mitigation |
|---|---|
| Hot-wallet theft | Hot/cold split, hot limits, multisig/MPC, real-time monitoring, insurance |
| Key leak / insider | Split custody, least privilege, operation audit, background checks |
| Matching/system outage | HA architecture, BCP/DR drills, explicit RTO/RPO |
| Bank run | Ample reserves, Proof of Reserves, liquidity management, withdrawal risk control |
| Bank account frozen | Multiple banking relationships, explainable compliance, client-fund segregation |
| MM/custodian default | Diversify counterparties, contractual protection, exposure limits |
| Regulatory penalty/probe | Strong compliance, proactive communication, complete records |
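The register fields listed above can be kept in a machine-readable form so that residual risk can be sorted and reviewed. The following is a minimal sketch, not a prescribed format: the 1 to 5 scales, the likelihood × impact score, the field names, the owner titles, and every number in the sample entries are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class RiskEntry:
    """One row of the risk register (field names are illustrative)."""
    description: str
    risk_class: str
    likelihood: int           # assumed scale: 1 (rare) to 5 (almost certain)
    impact: int               # assumed scale: 1 (minor) to 5 (severe)
    residual_likelihood: int  # re-scored after existing controls
    residual_impact: int      # re-scored after existing controls
    owner: str
    controls: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Reject scores outside the assumed 1-5 scale.
        for name in ("likelihood", "impact",
                     "residual_likelihood", "residual_impact"):
            value = getattr(self, name)
            if not 1 <= value <= 5:
                raise ValueError(f"{name} must be between 1 and 5, got {value}")

    @property
    def inherent_score(self) -> int:
        return self.likelihood * self.impact

    @property
    def residual_score(self) -> int:
        return self.residual_likelihood * self.residual_impact


def top_residual_risks(register: list[RiskEntry], limit: int = 3) -> list[RiskEntry]:
    """Return the entries with the highest residual score, highest first."""
    return sorted(register, key=lambda r: r.residual_score, reverse=True)[:limit]


# Sample entries: scores and owners are made up for the example.
register = [
    RiskEntry("Hot-wallet theft", "Security", 3, 5, 2, 4, "CISO",
              ["hot/cold split", "hot limits", "multisig/MPC"]),
    RiskEntry("Matching/system outage", "Operational", 3, 4, 2, 2, "CTO",
              ["HA architecture", "BCP/DR drills"]),
    RiskEntry("Bank account frozen", "Counterparty", 2, 5, 2, 3, "CFO",
              ["multiple banking relationships"]),
]
```

Calling `top_residual_risks(register)` orders the entries by what remains after controls, which is usually the list the owner review should start from.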
15.3 Incident Response Plan (IRP)
For theft, outages, and data breaches, define the response in advance; do not improvise during a crisis.
flowchart LR
D[Detect] --> C[Contain]
C --> E[Eradicate]
E --> R[Recover]
R --> P[Post-mortem]
C -.parallel.-> Comm[Notify: regulator/users/parties]
| Stage | Key actions |
|---|---|
| Detect | Monitoring alerts, user reports, anomaly detection |
| Contain | Pause affected functions/withdrawals, isolate systems, freeze suspicious accounts |
| Eradicate | Find root cause, patch, rotate keys |
| Recover | Restore services after verifying safety, reconcile assets |
| Notify | Report to the SC within the required timeframe; notify affected users as required by law or contract; involve the police if needed |
| Post-mortem | Blameless review, produce and implement improvements |
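The stage order in the flowchart and table above can be enforced in tooling so that no stage is skipped under pressure. The sketch below is one possible shape, with hypothetical class and method names. The rule that recovery is blocked until at least one notification has been logged is an assumed policy for this example, not a requirement stated in this chapter.

```python
from enum import Enum


class Stage(Enum):
    DETECT = 1
    CONTAIN = 2
    ERADICATE = 3
    RECOVER = 4
    POST_MORTEM = 5


class Incident:
    """Tracks one incident through the stages in order (illustrative only)."""

    def __init__(self, title: str) -> None:
        self.title = title
        self.stage = Stage.DETECT
        self.notified = False
        self.log: list[str] = [f"{Stage.DETECT.name}: {title}"]

    def advance(self, note: str) -> Stage:
        """Move to the next stage; stages cannot be skipped."""
        if self.stage is Stage.POST_MORTEM:
            raise RuntimeError("incident is already in post-mortem")
        # Assumed policy: do not start recovery before notifications went out.
        if self.stage is Stage.ERADICATE and not self.notified:
            raise RuntimeError("notify regulator/users before recovering")
        self.stage = Stage(self.stage.value + 1)
        self.log.append(f"{self.stage.name}: {note}")
        return self.stage

    def notify(self, audience: str) -> None:
        """Notification runs in parallel, starting once containment has begun."""
        if self.stage.value < Stage.CONTAIN.value:
            raise RuntimeError("contain first, then notify")
        self.notified = True
        self.log.append(f"NOTIFY: {audience}")
```

The `log` list doubles as a timeline for the blameless review, since every transition and notification is recorded in the order it happened.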
🚨 Critical: major incidents have regulatory reporting deadlines. The plan must state who reports to the SC or other relevant authorities, within how long, and via what channel.
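The "who, within how long, via what channel" requirement maps naturally onto a small table of obligations that can be checked against the clock during an incident. The sketch below assumes hypothetical roles, channels, and deadlines; the 24-hour and 72-hour values are placeholders, not actual regulatory requirements, and must be replaced with those in your own licence conditions and applicable law.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class ReportingObligation:
    recipient: str       # who must be told
    reporter_role: str   # who inside the firm sends the report
    deadline: timedelta  # how long after detection
    channel: str         # how the report is delivered


# PLACEHOLDER values for illustration only; not actual requirements.
OBLIGATIONS = [
    ReportingObligation("Regulator", "Head of Compliance",
                        timedelta(hours=24), "regulator portal"),
    ReportingObligation("Affected users", "Head of Support",
                        timedelta(hours=72), "email"),
]


def due_times(detected_at: datetime) -> list[tuple[str, datetime]]:
    """Return (recipient, due time) pairs, earliest deadline first."""
    pairs = [(o.recipient, detected_at + o.deadline) for o in OBLIGATIONS]
    return sorted(pairs, key=lambda p: p[1])


def overdue(detected_at: datetime, now: datetime, sent: set[str]) -> list[str]:
    """Recipients whose deadline has passed without a report being sent."""
    return [r for r, due in due_times(detected_at) if now > due and r not in sent]


detected = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)
for recipient, due in due_times(detected):
    print(f"{recipient}: report due by {due.isoformat()}")
```

Keeping the obligations as data rather than prose makes it easy to review them with compliance and to wire the `overdue` check into alerting.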
15.4 Crisis communications
- Prepare templates in advance for internal, user, regulator, and media audiences.
- Designate a single external spokesperson to avoid mixed messages.
- Transparent but careful: no concealment, no exaggeration, no exploitable technical detail.
- Publish the post-mortem and remediation afterward to rebuild trust.
15.5 Business continuity & worst case
- BCP/DR: drill failover and data recovery regularly (echoes Ch.6).
- Cash buffer: keep enough operating cash to survive a revenue slump (echoes Ch.9).
- Orderly wind-down plan: decide in advance how you would protect client assets, return them in an orderly way, and notify the regulator if you must cease operations. This is also a concern of the SC.
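A BCP/DR drill is only useful if its result is compared against the stated targets. The sketch below scores one drill against an RTO (recovery time objective: maximum acceptable downtime) and an RPO (recovery point objective: maximum acceptable window of lost data). The record layout, function name, and the sample targets are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class DrillResult:
    outage_start: datetime      # when the simulated failure began
    service_restored: datetime  # when the service was usable again
    last_good_backup: datetime  # newest data point that survived the failure


def evaluate_drill(result: DrillResult, rto: timedelta, rpo: timedelta) -> dict[str, bool]:
    """Compare measured downtime and data-loss window against the targets."""
    downtime = result.service_restored - result.outage_start
    data_loss_window = result.outage_start - result.last_good_backup
    return {"rto_met": downtime <= rto, "rpo_met": data_loss_window <= rpo}


# Sample drill: 45 minutes of downtime, 10 minutes of data at risk.
drill = DrillResult(
    outage_start=datetime(2024, 3, 1, 2, 0),
    service_restored=datetime(2024, 3, 1, 2, 45),
    last_good_backup=datetime(2024, 3, 1, 1, 50),
)

# Hypothetical targets: RTO of 1 hour, RPO of 5 minutes.
outcome = evaluate_drill(drill, rto=timedelta(hours=1), rpo=timedelta(minutes=5))
```

In this sample the service came back within the assumed RTO, but the backup was older than the assumed RPO allows, which is the kind of gap a drill is meant to surface before a real outage does.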
Summary / action items
- Build a risk register across the six risk classes and update it regularly
- Write the incident-response plan (incl. regulator reporting deadline and channel)
- Prepare crisis-comms templates and a single spokesperson
- Drill BCP/DR and incident response regularly (tabletop + live)
- Design an orderly wind-down / client-asset-return plan
- Maintain cyber/crime insurance and review coverage regularly
➡️ Next: FAQ & Glossary