feat: replace PM2 with systemd --user services for production

Runs tssbot-web, tssbot-webhook, and tssbot-backend as systemd --user units
instead of PM2 processes. tssbot-web moves from a 2-worker PM2 cluster to a
single instance, so deploys now restart it directly instead of doing a
zero-downtime cluster reload.

webhook.cjs now shells out to `systemctl --user restart` instead of
`pm2 reload`, and PM2_RESTART_TARGETS/WEBHOOK_PM2_NAME are renamed to
RESTART_TARGETS/WEBHOOK_SERVICE_NAME. scripts/install-systemd-services.sh
symlinks the new unit files into ~/.config/systemd/user and enables them.

Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
@@ -6,7 +6,7 @@ The repo is split into:

 - `frontend/` - React + Vite + Tailwind v4 web shell
 - `backend/` - backend API service scaffold, ready for database-backed routes
-- root process files - production frontend server, deploy webhook, PM2 config, and shared repo scripts
+- root process files - production frontend server, deploy webhook, systemd unit files, and shared repo scripts

 Routes:

@@ -42,7 +42,7 @@ The backend listens on <http://127.0.0.1:6000> by default and reads the SQLite
 databases configured by `TSS_BATTLES_DB` and `TSS_TEAMS_DB`. Keep it bound to
 `127.0.0.1` in production and let `tssbot-web` proxy public API requests.

-## Production with PM2
+## Production with systemd

 On a fresh headless Ubuntu server, install the native build tools Rust crates
 need before the first backend build:
@@ -58,13 +58,34 @@ curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
 npm install
 npm run build
 npm run build:backend
-pm2 start ecosystem.config.cjs
+scripts/install-systemd-services.sh
 ```

-The production server runs on <http://localhost:3010>. PM2 starts the web app in
-cluster mode with two workers by default, waits for each worker to signal that it
-is ready, and then reloads workers one at a time during deploys. Override the
-worker count with `WEB_INSTANCES`.
+`scripts/install-systemd-services.sh` symlinks the unit files under `systemd/`
+into `~/.config/systemd/user/`, then runs `systemctl --user daemon-reload` and
+`enable --now` for all three services. These are user-level (`systemctl --user`)
+units running as the deploy user, not system-wide units — no root needed to
+manage them day to day. Because user services normally stop when the user logs
+out, enable lingering once so they keep running:
+
+```sh
+sudo loginctl enable-linger <deploy-user>
+```
+
+The production server runs on <http://localhost:3010>, `tssbot-backend` on
+<http://127.0.0.1:6000> (see `BACKEND_PORT`), and the webhook listener on
+`WEBHOOK_PORT`. Each runs as a single instance — a deploy restart is a plain
+`systemctl --user restart`, so expect a brief (roughly 1-2 second) connection
+drop while `tssbot-web` restarts, rather than PM2-style zero-downtime cluster
+reloads.
+
+Useful commands:
+
+```sh
+systemctl --user status tssbot-web tssbot-webhook tssbot-backend
+journalctl --user -u tssbot-web -f
+systemctl --user restart tssbot-web
+```

 The server serves `/health`
 locally and only proxies the API routes used by the app:
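Note on the hunk above: the install steps it describes (symlink the unit files, reload the user manager, enable and start the services) could look roughly like the sketch below. This is not the contents of `scripts/install-systemd-services.sh`, which this diff does not show. The function names `link_units` and `install_units`, and the assumption that every unit file under `systemd/` ends in `.service`, are hypothetical.

```sh
#!/usr/bin/env bash
# Hypothetical sketch of the documented install steps; not the repo's script.

# link_units SRC_DIR DEST_DIR
# Symlink every *.service file in SRC_DIR into DEST_DIR and print each unit name.
link_units() {
  local src_dir dest_dir unit
  src_dir=$(cd "$1" && pwd) || return 1   # absolute path so links resolve from anywhere
  dest_dir=$2
  mkdir -p "$dest_dir"
  for unit in "$src_dir"/*.service; do
    [ -e "$unit" ] || continue            # the glob matched nothing
    # -s: symbolic, -f: replace an existing link, -n: do not follow an existing link
    ln -sfn "$unit" "$dest_dir/${unit##*/}"
    printf '%s\n' "${unit##*/}"
  done
}

# install_units [SRC_DIR] [DEST_DIR]
# Link the units, reload the user manager, then enable and start each unit.
install_units() {
  local src_dir=${1:-systemd} dest_dir=${2:-$HOME/.config/systemd/user} units
  units=$(link_units "$src_dir" "$dest_dir") || return 1
  [ -n "$units" ] || { echo "no unit files found in $src_dir" >&2; return 1; }
  systemctl --user daemon-reload
  # Word splitting is intended here: one argument per unit name.
  systemctl --user enable --now $units
}
```

Calling `install_units` from the repository root would cover the three documented steps. To confirm that lingering took effect afterwards, `loginctl show-user "$USER" --property=Linger` should print `Linger=yes`.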
@@ -94,18 +115,15 @@ ship `X-Content-Type-Options`, `X-Frame-Options: DENY`, `Referrer-Policy`,
 HSTS (over HTTPS), and HTML responses include a Content Security Policy that
 allows only Cloudflare Turnstile and the CARTO basemap tiles.

-Override the API target before starting PM2 if needed:
+Override the API target by setting it in `.env` and restarting:

 ```sh
-API_UPSTREAM=http://127.0.0.1:8080 pm2 start ecosystem.config.cjs
+echo 'API_UPSTREAM=http://127.0.0.1:8080' >> .env
+systemctl --user restart tssbot-web
 ```

 Set `PUBLIC_ORIGIN` to the public site origin in production, especially behind a
-reverse proxy:
-
-```sh
-PUBLIC_ORIGIN=https://your-domain.example pm2 start ecosystem.config.cjs
-```
+reverse proxy (same `.env` + restart pattern as above).

 Optional API protection tuning:

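Note on the hunk above: `echo ... >> .env` appends a new line every time it runs, so changing a value twice leaves two assignments for the same key. How the server resolves duplicate keys is not stated in this diff. A minimal sketch of a helper that replaces an existing assignment instead is shown below; `set_env_var` is a hypothetical name, not something the repository provides.

```sh
#!/usr/bin/env bash
# Hypothetical helper; not part of the repository.

# set_env_var KEY VALUE [FILE]
# Set KEY=VALUE in FILE (default .env), dropping any earlier KEY= line first.
set_env_var() {
  local key=$1 value=$2 file=${3:-.env} tmp
  touch "$file"
  tmp=$(mktemp) || return 1
  # Keep every line that does not assign KEY; grep exits 1 when nothing is left.
  grep -v "^${key}=" "$file" > "$tmp" || true
  printf '%s=%s\n' "$key" "$value" >> "$tmp"
  # Write through the existing file so its permissions are preserved.
  cat "$tmp" > "$file" && rm -f "$tmp"
}
```

For example, `set_env_var API_UPSTREAM http://127.0.0.1:8080` followed by `systemctl --user restart tssbot-web` mirrors the documented pattern. The updated key moves to the end of the file, which is harmless for simple `KEY=value` files.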
@@ -127,14 +145,12 @@ SITE_SESSION_TTL_SECONDS=43200
 Successful Turnstile verification sets signed, HttpOnly Turnstile and site-session
 cookies. `/api/*` and `/data/*` requests must present those cookies plus
 same-origin browser request metadata, so the data is served to verified active
-site sessions instead of as an open public API. All PM2 web instances must share
-the same `SITE_SESSION_SECRET`.
+site sessions instead of as an open public API.

-On startup, the web server preloads the critical public snapshots before
-signalling PM2 `ready`: team leaderboard, player leaderboard, home teams, and
-recent games. `/health` includes a `public_data` block with the latest preload
-status. A same-origin `POST /api/cache/prewarm` refreshes those snapshots on
-demand.
+On startup, the web server preloads the critical public snapshots: team
+leaderboard, player leaderboard, home teams, and recent games. `/health`
+includes a `public_data` block with the latest preload status. A same-origin
+`POST /api/cache/prewarm` refreshes those snapshots on demand.

 ## Reverse proxy / Cloudflare

@@ -205,16 +221,15 @@ The webhook process listens on port `3011` at `/github`. Configure GitHub to sen
 push events there.

 A webhook secret is required — without `GITHUB_WEBHOOK_SECRET`, the webhook
-rejects every request:
+rejects every request. Put it in `.env` in the project root (recommended over
+inlining the secret in a shell command, which writes it to shell history), then
+restart:

 ```sh
-GITHUB_WEBHOOK_SECRET=your-secret pm2 start ecosystem.config.cjs
+echo 'GITHUB_WEBHOOK_SECRET=your-secret' >> .env
+systemctl --user restart tssbot-webhook
 ```

-On PowerShell, set `$env:GITHUB_WEBHOOK_SECRET = "your-secret"` before starting
-PM2, or put the value in a `.env` file in the project root (recommended over
-inlining the secret in a shell command, which writes it to shell history).
-
 The webhook only deploys pushes whose `ref` is in `GITHUB_WEBHOOK_REFS`
 (default `refs/heads/main`). Optionally pin the repository:

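Note on the hunk above: GitHub signs each delivery with HMAC-SHA256 over the raw request body and sends the result in the `X-Hub-Signature-256` header as `sha256=<hex digest>`. The sketch below computes that header value locally, which can help when checking that a configured secret is accepted. It assumes `openssl` is installed and that `webhook.cjs` validates this header, which this diff does not show. `github_signature` is a hypothetical helper name.

```sh
#!/usr/bin/env bash
# Hypothetical helper for local testing; not part of the repository.

# github_signature SECRET < payload
# Print the X-Hub-Signature-256 value GitHub would send for the payload on stdin.
github_signature() {
  local secret=$1 digest
  # openssl prints "<label>= <hex>"; the hex digest is always the last field.
  digest=$(openssl dgst -sha256 -hmac "$secret" | awk '{print $NF}') || return 1
  [ -n "$digest" ] || return 1
  printf 'sha256=%s\n' "$digest"
}
```

With a payload saved in a file, `github_signature "$GITHUB_WEBHOOK_SECRET" < payload.json` yields the value to pass as `X-Hub-Signature-256` when posting that same file to the listener on port `3011` at `/github`. Whether the listener accepts a hand-built payload depends on checks that are not visible here.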
@@ -239,21 +254,22 @@ npm ci --include=dev --include=optional
 npm run build -- --outDir ../dist-next
 cargo build --manifest-path backend/Cargo.toml --release
 # the webhook promotes dist-next to dist after carrying over old hashed assets
-pm2 reload tssbot-web --update-env
-pm2 reload tssbot-backend --update-env
+systemctl --user restart tssbot-web tssbot-backend
 ```

-Only processes listed in `PM2_RESTART_TARGETS` are reloaded. The default is
-`tssbot-web,tssbot-backend`, so unrelated PM2 processes are left alone. The web server handles
-`SIGINT` and `SIGTERM` by closing its listener and SQLite handles before exit,
-which lets PM2 finish reloads without dropping active requests. The webhook
-exits after 24 hours so PM2 restarts it cleanly.
+Only services listed in `RESTART_TARGETS` are restarted. The default is
+`tssbot-web,tssbot-backend`, so unrelated systemd units are left alone. The web
+server handles `SIGINT` and `SIGTERM` by closing its listener and SQLite
+handles before exit, giving systemd a clean shutdown within `TimeoutStopSec`
+instead of a hard kill. The webhook exits cleanly every 24 hours; its unit uses
+`Restart=always` so systemd relaunches it right away.

-When webhook code changes are deployed, restart the webhook process once so PM2
-loads the updated listener:
+When webhook code changes are deployed, the webhook restarts itself once
+(delayed so its own deploy response/notifications land first) so it loads the
+updated listener:

 ```sh
-pm2 reload tssbot-webhook --update-env
+systemctl --user restart tssbot-webhook
 ```

 The webhook listener reads `.env` on startup. To send Discord notifications for
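Note on the hunk above: `RESTART_TARGETS` is described as a comma-separated list that defaults to `tssbot-web,tssbot-backend`. The actual parsing lives in `webhook.cjs` and is not part of this diff, so the sketch below only illustrates in shell how such a list could map to a single `systemctl --user restart` call. `restart_command` is a hypothetical name, and trimming whitespace and skipping empty entries are assumptions.

```sh
#!/usr/bin/env bash
# Hypothetical illustration; the real logic is in webhook.cjs.

# restart_command [TARGETS]
# Print the restart command for a comma-separated list of unit names.
restart_command() {
  local targets=${1:-tssbot-web,tssbot-backend} unit units=""
  local IFS=,                       # split the list on commas only
  for unit in $targets; do
    # Trim surrounding whitespace so "a, b" behaves like "a,b".
    unit=$(printf '%s' "$unit" | sed 's/^[[:space:]]*//; s/[[:space:]]*$//')
    [ -n "$unit" ] || continue      # ignore empty entries such as a trailing comma
    units="$units $unit"
  done
  [ -n "$units" ] || return 1       # nothing to restart
  printf 'systemctl --user restart%s\n' "$units"
}
```

Printing the command rather than running it keeps the sketch safe to try on a machine without the units installed. Because only the listed names are passed to `systemctl`, other user units stay untouched, which matches the behaviour the README describes.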