Scenarios live in separate repos and are cloned into
~/.local/share/replaybook/scenarios/ via
replaybook add.
Official pack: ducks/replaybook-scenarios
| ID | Title | Difficulty |
|---|---|---|
| 001-nginx-502 | 502 Bad Gateway | 1 |
| 002-postgres-wont-start | Postgres Won't Start | 1 |
| 003-missing-env-var | App Crashing on Boot | 1 |
| 004-disk-full | Health Checks Failing | 2 |
| 005-oom-kill | Container Keeps Restarting | 2 |
| 006-sidekiq-cant-connect | Jobs Not Processing | 2 |
| 007-packet-loss | Intermittent Request Failures | 3 |
Each scenario is a directory with:
my-scenario/
  meta.json           # id, title, page text, difficulty, hints, success condition
  docker-compose.yml  # the environment
  break.sh            # runs after compose up to inject the fault (or use break: [...] below)
  check.sh            # polled every 2s to detect resolution (or use http_200)
meta.json format:
{
"id": "my-scenario",
"title": "Something Is Broken",
"page": "alert text shown to the player",
"difficulty": 2,
"hints": [
"First hint revealed on first get-hint",
"Second hint revealed on second get-hint"
],
"success_condition": "http_200",
"success_target": "http://localhost:8080/health",
"shell_service": "app"
}
shell_service is the compose service the player is
dropped into. If unset, defaults to the first service defined in
docker-compose.yml. See any scenario in
ducks/replaybook-scenarios
for a working example.
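
The two success conditions can be pictured as one polling loop. The following is a minimal sketch of that idea, not replaybook's actual implementation: the function names `is_resolved` and `wait_until_resolved` are hypothetical, and running `check.sh` via `sh` from the scenario directory is an assumption. Only the 2-second interval and the `http_200` / `exit_zero` conditions come from this README.

```python
import http.client
import subprocess
import time
import urllib.request


def is_resolved(meta: dict, scenario_dir: str) -> bool:
    """Return True once the scenario's success condition holds (hypothetical helper)."""
    if meta["success_condition"] == "http_200":
        try:
            with urllib.request.urlopen(meta["success_target"], timeout=2) as resp:
                return resp.status == 200
        except (OSError, http.client.HTTPException):
            # Connection refused, timeouts, and non-2xx responses all mean "not fixed yet".
            return False
    # exit_zero: check.sh signals resolution by exiting with status 0.
    # Assumption: the script is run with sh from the scenario directory.
    result = subprocess.run(["sh", "check.sh"], cwd=scenario_dir)
    return result.returncode == 0


def wait_until_resolved(meta, scenario_dir, interval=2.0,
                        probe=is_resolved, sleep=time.sleep):
    """Poll the success condition every `interval` seconds until it passes."""
    while not probe(meta, scenario_dir):
        sleep(interval)
```

The `probe` and `sleep` parameters exist only so the loop can be exercised without a running environment.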
Most faults are just "copy a file in" and/or "run a command in a
container." Instead of writing break.sh, add a
break array to meta.json:
"break": [
{ "cp": { "service": "nginx", "src": "nginx-broken.conf", "dest": "/etc/nginx/nginx.conf" } },
{ "exec": { "service": "nginx", "cmd": ["nginx", "-s", "reload"] } }
]
Steps run in order. Three kinds:
- cp: copy src (a file in the scenario directory) to dest inside service's container
- exec: run cmd inside service's container
- restart: restart service's container

"break": [
{ "exec": { "service": "db", "cmd": ["chown", "-R", "root:root", "/var/lib/postgresql/data"] } },
{ "restart": { "service": "db" } }
]
If break is present, it's used instead of
break.sh. If a fault needs real script logic (loops,
conditionals, piping between commands), write break.sh
instead — it still works exactly as before.
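
To make the three step kinds concrete, here is a rough sketch of how each could map onto a `docker compose` command line. How replaybook actually executes steps is not stated here, so treat the mapping as an assumption; `break_step_to_argv` is a hypothetical name, and `src` is assumed to be resolved relative to the scenario directory.

```python
def break_step_to_argv(step: dict) -> list[str]:
    """Translate one break step into a docker compose argv list (illustrative only)."""
    if len(step) != 1:
        raise ValueError("a break step must have exactly one key: cp, exec, or restart")
    kind, args = next(iter(step.items()))
    service = args["service"]
    if kind == "cp":
        # Copy a file from the scenario directory into the service's container.
        return ["docker", "compose", "cp", args["src"], f"{service}:{args['dest']}"]
    if kind == "exec":
        # -T disables pseudo-TTY allocation, which suits non-interactive commands.
        return ["docker", "compose", "exec", "-T", service, *args["cmd"]]
    if kind == "restart":
        return ["docker", "compose", "restart", service]
    raise ValueError(f"unknown break step kind: {kind!r}")


steps = [
    {"exec": {"service": "db", "cmd": ["chown", "-R", "root:root", "/var/lib/postgresql/data"]}},
    {"restart": {"service": "db"}},
]
for step in steps:
    print(" ".join(break_step_to_argv(step)))
```

Running the steps strictly in list order matters here: the restart only exposes the fault because the ownership change has already happened.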
replaybook add and replaybook run validate
each scenario before anything runs:
- shell_service and any break step's service match a real compose service
- break.sh or a break array is present
- check.sh is present when success_condition is exit_zero
- success_target is a valid http(s) URL when success_condition is http_200
replaybook add reports problems for every scenario in a
pack without blocking the rest of it. replaybook run
re-checks the single scenario it's about to launch and refuses to
start if it fails.
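
The checks listed above can be approximated in a few lines, which may help when preparing a scenario before running replaybook add. This is a sketch under assumptions, not the tool's real validator: `validate_scenario` is a hypothetical name, the messages are invented, and the compose service names are passed in as a list because reading them from docker-compose.yml would need a YAML parser outside the standard library.

```python
from pathlib import Path
from urllib.parse import urlparse


def validate_scenario(scenario_dir, meta: dict, services) -> list[str]:
    """Return a list of problems; an empty list means the scenario passes."""
    root = Path(scenario_dir)
    names = set(services)
    problems = []

    # shell_service and every break step's service must be real compose services.
    shell = meta.get("shell_service")
    if shell is not None and shell not in names:
        problems.append(f"shell_service {shell!r} is not a compose service")
    for step in meta.get("break", []):
        for kind, args in step.items():
            if args.get("service") not in names:
                problems.append(f"{kind} step targets unknown service {args.get('service')!r}")

    # Either break.sh or a break array must be present.
    if "break" not in meta and not (root / "break.sh").is_file():
        problems.append("neither break.sh nor a break array is present")

    # Each success condition has its own requirement.
    condition = meta.get("success_condition")
    if condition == "exit_zero" and not (root / "check.sh").is_file():
        problems.append("check.sh is required when success_condition is exit_zero")
    if condition == "http_200":
        url = urlparse(str(meta.get("success_target") or ""))
        if url.scheme not in ("http", "https") or not url.netloc:
            problems.append("success_target must be a valid http(s) URL")

    return problems
```

Collecting problems in a list rather than raising on the first one mirrors the reporting behaviour described for replaybook add, where every issue in a pack is surfaced at once.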