feat: replace PM2 with systemd --user services for production

Runs tssbot-web, tssbot-webhook, and tssbot-backend as systemd --user
units instead of PM2 processes. tssbot-web moves from a 2-worker PM2
cluster to a single instance, so deploys now restart it directly
instead of doing a zero-downtime cluster reload.

webhook.cjs now shells out to `systemctl --user restart` instead of
`pm2 reload`, and PM2_RESTART_TARGETS/WEBHOOK_PM2_NAME are renamed to
RESTART_TARGETS/WEBHOOK_SERVICE_NAME. scripts/install-systemd-services.sh
symlinks the new unit files into ~/.config/systemd/user and enables them.

Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
2026-07-01 22:58:15 +00:00
parent 1fee214785
commit 341dae1913
10 changed files with 172 additions and 205 deletions
+53 -37
@@ -6,7 +6,7 @@ The repo is split into:
- `frontend/` - React + Vite + Tailwind v4 web shell
- `backend/` - backend API service scaffold, ready for database-backed routes
- root process files - production frontend server, deploy webhook, PM2 config, and shared repo scripts
- root process files - production frontend server, deploy webhook, systemd unit files, and shared repo scripts
Routes:
@@ -42,7 +42,7 @@ The backend listens on <http://127.0.0.1:6000> by default and reads the SQLite
databases configured by `TSS_BATTLES_DB` and `TSS_TEAMS_DB`. Keep it bound to
`127.0.0.1` in production and let `tssbot-web` proxy public API requests.
## Production with PM2
## Production with systemd
On a fresh headless Ubuntu server, install the native build tools Rust crates
need before the first backend build:
@@ -58,13 +58,34 @@ curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
npm install
npm run build
npm run build:backend
pm2 start ecosystem.config.cjs
scripts/install-systemd-services.sh
```
The production server runs on <http://localhost:3010>. PM2 starts the web app in
cluster mode with two workers by default, waits for each worker to signal that it
is ready, and then reloads workers one at a time during deploys. Override the
worker count with `WEB_INSTANCES`.
`scripts/install-systemd-services.sh` symlinks the unit files under `systemd/`
into `~/.config/systemd/user/`, then runs `systemctl --user daemon-reload` and
`systemctl --user enable --now` for all three services. These are user-level
(`systemctl --user`) units that run as the deploy user, not system-wide units,
so no root is needed to manage them day to day. Because user services normally
stop when the user's last session ends and do not start at boot, enable
lingering once so they keep running:
```sh
sudo loginctl enable-linger <deploy-user>
```
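For orientation, the install steps described above could be sketched roughly as
follows. This is not the contents of `scripts/install-systemd-services.sh`: the
function names `link_units` and `enable_units` and the `*.service` glob are
assumptions made for illustration.

```sh
# Hypothetical sketch: symlink every *.service file in SRC_DIR into DEST_DIR.
link_units() {
  local src_dir dest_dir unit
  src_dir=$(cd "$1" && pwd)   # absolute path so the symlinks resolve from anywhere
  dest_dir="$2"
  mkdir -p "$dest_dir"
  for unit in "$src_dir"/*.service; do
    [ -e "$unit" ] || continue   # the glob matched nothing
    ln -sfn "$unit" "$dest_dir/$(basename "$unit")"
  done
}

# Hypothetical sketch: reload the user manager, then enable and start the services.
enable_units() {
  systemctl --user daemon-reload
  systemctl --user enable --now tssbot-web tssbot-webhook tssbot-backend
}
```

From the repo root this would be invoked as
`link_units systemd "$HOME/.config/systemd/user" && enable_units`. Symlinking
rather than copying means edits to the unit files in the repo take effect after
the next `daemon-reload`.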
The production web server (`tssbot-web`) runs on <http://localhost:3010>,
`tssbot-backend` on <http://127.0.0.1:6000> (see `BACKEND_PORT`), and the webhook
listener on `WEBHOOK_PORT`. Each runs as a single instance, so a deploy restart
is a plain `systemctl --user restart`. Expect a brief (roughly 1-2 second)
connection drop while `tssbot-web` restarts, rather than a PM2-style
zero-downtime cluster reload.
Useful commands:
```sh
systemctl --user status tssbot-web tssbot-webhook tssbot-backend
journalctl --user -u tssbot-web -f
systemctl --user restart tssbot-web
```
The server serves `/health`
locally and only proxies the API routes used by the app:
@@ -94,18 +115,15 @@ ship `X-Content-Type-Options`, `X-Frame-Options: DENY`, `Referrer-Policy`,
HSTS (over HTTPS), and HTML responses include a Content Security Policy that
allows only Cloudflare Turnstile and the CARTO basemap tiles.
Override the API target before starting PM2 if needed:
Override the API target by setting `API_UPSTREAM` in `.env` and restarting `tssbot-web`:
```sh
API_UPSTREAM=http://127.0.0.1:8080 pm2 start ecosystem.config.cjs
echo 'API_UPSTREAM=http://127.0.0.1:8080' >> .env
systemctl --user restart tssbot-web
```
Set `PUBLIC_ORIGIN` to the public site origin in production, especially behind a
reverse proxy:
```sh
PUBLIC_ORIGIN=https://your-domain.example pm2 start ecosystem.config.cjs
```
reverse proxy, using the same `.env` and restart pattern as above.
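Appending with `echo ... >> .env` adds a second line if the key is already
present. A minimal sketch of a helper that replaces an existing entry instead is
shown here; `set_env_var` is a hypothetical name, not something the repo
provides, and it assumes simple `KEY=value` lines without backslashes in the
value.

```sh
# Hypothetical helper: set KEY=VALUE in FILE (default .env).
# Replaces an existing KEY= line, otherwise appends a new one.
set_env_var() {
  local key="$1" value="$2" file="${3:-.env}" tmp
  touch "$file"
  if grep -q "^${key}=" "$file"; then
    tmp=$(mktemp)
    # Rewrite only the line that starts with KEY=; copy every other line as-is.
    awk -v k="$key" -v v="$value" \
      'index($0, k "=") == 1 { print k "=" v; next } { print }' "$file" > "$tmp"
    mv "$tmp" "$file"
  else
    printf '%s=%s\n' "$key" "$value" >> "$file"
  fi
}
```

For example, `set_env_var PUBLIC_ORIGIN https://your-domain.example` followed by
`systemctl --user restart tssbot-web`. Note that `mv` leaves the file with the
restrictive permissions `mktemp` assigns.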
Optional API protection tuning:
@@ -127,14 +145,12 @@ SITE_SESSION_TTL_SECONDS=43200
Successful Turnstile verification sets signed, HttpOnly Turnstile and site-session
cookies. `/api/*` and `/data/*` requests must present those cookies plus
same-origin browser request metadata, so the data is served to verified active
site sessions instead of as an open public API. All PM2 web instances must share
the same `SITE_SESSION_SECRET`.
site sessions instead of as an open public API.
On startup, the web server preloads the critical public snapshots before
signalling PM2 `ready`: team leaderboard, player leaderboard, home teams, and
recent games. `/health` includes a `public_data` block with the latest preload
status. A same-origin `POST /api/cache/prewarm` refreshes those snapshots on
demand.
On startup, the web server preloads the critical public snapshots: team
leaderboard, player leaderboard, home teams, and recent games. `/health`
includes a `public_data` block with the latest preload status. A same-origin
`POST /api/cache/prewarm` refreshes those snapshots on demand.
## Reverse proxy / Cloudflare
@@ -205,16 +221,15 @@ The webhook process listens on port `3011` at `/github`. Configure GitHub to sen
push events there.
A webhook secret is required — without `GITHUB_WEBHOOK_SECRET`, the webhook
rejects every request:
rejects every request. Put it in `.env` in the project root, then restart the
webhook. With a real secret, edit `.env` in an editor instead of running the
`echo` line below, because a secret typed into a shell command is saved to shell
history:
```sh
GITHUB_WEBHOOK_SECRET=your-secret pm2 start ecosystem.config.cjs
echo 'GITHUB_WEBHOOK_SECRET=your-secret' >> .env
systemctl --user restart tssbot-webhook
```
On PowerShell, set `$env:GITHUB_WEBHOOK_SECRET = "your-secret"` before starting
PM2, or put the value in a `.env` file in the project root (recommended over
inlining the secret in a shell command, which writes it to shell history).
The webhook only deploys pushes whose `ref` is in `GITHUB_WEBHOOK_REFS`
(default `refs/heads/main`). Optionally pin the repository:
@@ -239,21 +254,22 @@ npm ci --include=dev --include=optional
npm run build -- --outDir ../dist-next
cargo build --manifest-path backend/Cargo.toml --release
# the webhook promotes dist-next to dist after carrying over old hashed assets
pm2 reload tssbot-web --update-env
pm2 reload tssbot-backend --update-env
systemctl --user restart tssbot-web tssbot-backend
```
Only processes listed in `PM2_RESTART_TARGETS` are reloaded. The default is
`tssbot-web,tssbot-backend`, so unrelated PM2 processes are left alone. The web server handles
`SIGINT` and `SIGTERM` by closing its listener and SQLite handles before exit,
which lets PM2 finish reloads without dropping active requests. The webhook
exits after 24 hours so PM2 restarts it cleanly.
Only services listed in `RESTART_TARGETS` are restarted. The default is
`tssbot-web,tssbot-backend`, so unrelated systemd units are left alone. The web
server handles `SIGINT` and `SIGTERM` by closing its listener and SQLite
handles before exit, giving systemd a clean shutdown within `TimeoutStopSec`
instead of a hard kill. The webhook exits cleanly every 24 hours; its unit uses
`Restart=always` so systemd relaunches it right away.
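The target selection described above can be sketched in shell as follows. The
real logic lives in `webhook.cjs`, which this diff does not show, so the
function names `parse_targets` and `restart_targets` and the trimming behavior
are assumptions.

```sh
# Hypothetical sketch: split a comma-separated list into one trimmed name per
# line, skipping empty entries.
parse_targets() {
  printf '%s\n' "$1" | tr ',' '\n' \
    | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' -e '/^$/d'
}

# Hypothetical sketch: restart only the listed units; others are left alone.
restart_targets() {
  local targets
  targets=$(parse_targets "${RESTART_TARGETS:-tssbot-web,tssbot-backend}")
  # Intentionally unquoted so each unit name becomes its own argument.
  systemctl --user restart $targets
}
```

With `RESTART_TARGETS` unset, `restart_targets` runs
`systemctl --user restart tssbot-web tssbot-backend`, matching the documented
default.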
When webhook code changes are deployed, restart the webhook process once so PM2
loads the updated listener:
When a deploy changes webhook code, the webhook restarts itself once so it loads
the updated listener. The restart is delayed so that its own deploy response and
notifications land first:
```sh
pm2 reload tssbot-webhook --update-env
systemctl --user restart tssbot-webhook
```
The webhook listener reads `.env` on startup. To send Discord notifications for