Compare commits

1 Commit

Author SHA1 Message Date
Liam 341dae1913 feat: replace PM2 with systemd --user services for production
Runs tssbot-web, tssbot-webhook, and tssbot-backend as systemd --user
units instead of PM2 processes. tssbot-web moves from a 2-worker PM2
cluster to a single instance, so deploys now restart it directly
instead of doing a zero-downtime cluster reload.

webhook.cjs now shells out to `systemctl --user restart` instead of
`pm2 reload`, and PM2_RESTART_TARGETS/WEBHOOK_PM2_NAME are renamed to
RESTART_TARGETS/WEBHOOK_SERVICE_NAME. scripts/install-systemd-services.sh
symlinks the new unit files into ~/.config/systemd/user and enables them.

Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
2026-07-01 22:58:15 +00:00
10 changed files with 172 additions and 205 deletions
+53 -37
@@ -6,7 +6,7 @@ The repo is split into:
- `frontend/` - React + Vite + Tailwind v4 web shell - `frontend/` - React + Vite + Tailwind v4 web shell
- `backend/` - backend API service scaffold, ready for database-backed routes - `backend/` - backend API service scaffold, ready for database-backed routes
- root process files - production frontend server, deploy webhook, PM2 config, and shared repo scripts - root process files - production frontend server, deploy webhook, systemd unit files, and shared repo scripts
Routes: Routes:
@@ -42,7 +42,7 @@ The backend listens on <http://127.0.0.1:6000> by default and reads the SQLite
databases configured by `TSS_BATTLES_DB` and `TSS_TEAMS_DB`. Keep it bound to databases configured by `TSS_BATTLES_DB` and `TSS_TEAMS_DB`. Keep it bound to
`127.0.0.1` in production and let `tssbot-web` proxy public API requests. `127.0.0.1` in production and let `tssbot-web` proxy public API requests.
## Production with PM2 ## Production with systemd
On a fresh headless Ubuntu server, install the native build tools Rust crates On a fresh headless Ubuntu server, install the native build tools Rust crates
need before the first backend build: need before the first backend build:
@@ -58,13 +58,34 @@ curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
npm install npm install
npm run build npm run build
npm run build:backend npm run build:backend
pm2 start ecosystem.config.cjs scripts/install-systemd-services.sh
``` ```
The production server runs on <http://localhost:3010>. PM2 starts the web app in `scripts/install-systemd-services.sh` symlinks the unit files under `systemd/`
cluster mode with two workers by default, waits for each worker to signal that it into `~/.config/systemd/user/`, then runs `systemctl --user daemon-reload` and
is ready, and then reloads workers one at a time during deploys. Override the `enable --now` for all three services. These are user-level (`systemctl --user`)
worker count with `WEB_INSTANCES`. units running as the deploy user, not system-wide units — no root needed to
manage them day to day. Because user services normally stop when the user logs
out, enable lingering once so they keep running:
```sh
sudo loginctl enable-linger <deploy-user>
```
The production server runs on <http://localhost:3010>, `tssbot-backend` on
<http://127.0.0.1:6000> (see `BACKEND_PORT`), and the webhook listener on port
`3011` (see `WEBHOOK_PORT`). Each runs as a single instance, so a deploy restart
is a plain `systemctl --user restart`: expect a brief (roughly 1-2 second)
connection drop while `tssbot-web` restarts, rather than PM2-style
zero-downtime cluster reloads.
Useful commands:
```sh
systemctl --user status tssbot-web tssbot-webhook tssbot-backend
journalctl --user -u tssbot-web -f
systemctl --user restart tssbot-web
```
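Because a restart now drops connections briefly, a deploy script or monitor may want to wait until `tssbot-web` answers again before continuing. A minimal sketch, assuming Node 18+ (global `fetch`) and that `/health` returns a 2xx status once the server is up. `waitForHealthy`, `checkWeb`, and the retry defaults are hypothetical names and values chosen for illustration; none of them are part of this repository:

```javascript
// Hypothetical helper: poll a health check until it passes or attempts run out.
const defaultSleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms))

async function waitForHealthy(check, { attempts = 10, delayMs = 500, sleep = defaultSleep } = {}) {
  for (let attempt = 1; attempt <= attempts; attempt += 1) {
    try {
      // `check` resolves truthy when the service is healthy.
      if (await check()) return attempt
    } catch {
      // Connection refused/reset while the unit restarts: treat as "not ready yet".
    }
    if (attempt < attempts) await sleep(delayMs)
  }
  throw new Error(`service not healthy after ${attempts} attempts`)
}

// Assumed check against the documented local port and /health route.
const checkWeb = async () => (await fetch('http://localhost:3010/health')).ok
```

Calling `waitForHealthy(checkWeb)` resolves with the number of attempts it took, or rejects once the attempts are exhausted. Injecting `check` and `sleep` keeps the retry loop testable without a running server.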
The server serves `/health` The server serves `/health`
locally and only proxies the API routes used by the app: locally and only proxies the API routes used by the app:
@@ -94,18 +115,15 @@ ship `X-Content-Type-Options`, `X-Frame-Options: DENY`, `Referrer-Policy`,
HSTS (over HTTPS), and HTML responses include a Content Security Policy that HSTS (over HTTPS), and HTML responses include a Content Security Policy that
allows only Cloudflare Turnstile and the CARTO basemap tiles. allows only Cloudflare Turnstile and the CARTO basemap tiles.
Override the API target before starting PM2 if needed: Override the API target by setting it in `.env` and restarting:
```sh ```sh
API_UPSTREAM=http://127.0.0.1:8080 pm2 start ecosystem.config.cjs echo 'API_UPSTREAM=http://127.0.0.1:8080' >> .env
systemctl --user restart tssbot-web
``` ```
Set `PUBLIC_ORIGIN` to the public site origin in production, especially behind a Set `PUBLIC_ORIGIN` to the public site origin in production, especially behind a
reverse proxy: reverse proxy (same `.env` + restart pattern as above).
```sh
PUBLIC_ORIGIN=https://your-domain.example pm2 start ecosystem.config.cjs
```
Optional API protection tuning: Optional API protection tuning:
@@ -127,14 +145,12 @@ SITE_SESSION_TTL_SECONDS=43200
Successful Turnstile verification sets signed, HttpOnly Turnstile and site-session Successful Turnstile verification sets signed, HttpOnly Turnstile and site-session
cookies. `/api/*` and `/data/*` requests must present those cookies plus cookies. `/api/*` and `/data/*` requests must present those cookies plus
same-origin browser request metadata, so the data is served to verified active same-origin browser request metadata, so the data is served to verified active
site sessions instead of as an open public API. All PM2 web instances must share site sessions instead of as an open public API.
the same `SITE_SESSION_SECRET`.
On startup, the web server preloads the critical public snapshots before On startup, the web server preloads the critical public snapshots: team
signalling PM2 `ready`: team leaderboard, player leaderboard, home teams, and leaderboard, player leaderboard, home teams, and recent games. `/health`
recent games. `/health` includes a `public_data` block with the latest preload includes a `public_data` block with the latest preload status. A same-origin
status. A same-origin `POST /api/cache/prewarm` refreshes those snapshots on `POST /api/cache/prewarm` refreshes those snapshots on demand.
demand.
## Reverse proxy / Cloudflare ## Reverse proxy / Cloudflare
@@ -205,16 +221,15 @@ The webhook process listens on port `3011` at `/github`. Configure GitHub to sen
push events there. push events there.
A webhook secret is required — without `GITHUB_WEBHOOK_SECRET`, the webhook A webhook secret is required — without `GITHUB_WEBHOOK_SECRET`, the webhook
rejects every request: rejects every request. Put it in `.env` in the project root (recommended over
inlining the secret in a shell command, which writes it to shell history), then
restart:
```sh ```sh
GITHUB_WEBHOOK_SECRET=your-secret pm2 start ecosystem.config.cjs echo 'GITHUB_WEBHOOK_SECRET=your-secret' >> .env
systemctl --user restart tssbot-webhook
``` ```
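The diff does not show how `webhook.cjs` checks the secret. GitHub signs each delivery with HMAC-SHA256 over the raw request body and sends the result in the `X-Hub-Signature-256` header as `sha256=<hex>`. A minimal sketch of that check, assuming the raw body is available as a `Buffer`; `verifyGithubSignature` is a hypothetical name, not a function from this repository:

```javascript
const crypto = require('node:crypto')

// Hypothetical helper: true only when `signatureHeader` matches
// HMAC-SHA256(secret, rawBody) in GitHub's `sha256=<hex>` format.
function verifyGithubSignature(secret, rawBody, signatureHeader) {
  // Mirror the README: with no secret configured, reject every request.
  if (!secret || typeof signatureHeader !== 'string') return false
  const digest = crypto.createHmac('sha256', secret).update(rawBody).digest('hex')
  const expected = Buffer.from(`sha256=${digest}`)
  const received = Buffer.from(signatureHeader)
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return expected.length === received.length && crypto.timingSafeEqual(expected, received)
}
```

The constant-time comparison avoids leaking how many leading characters of a forged signature were correct.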
On PowerShell, set `$env:GITHUB_WEBHOOK_SECRET = "your-secret"` before starting
PM2, or put the value in a `.env` file in the project root (recommended over
inlining the secret in a shell command, which writes it to shell history).
The webhook only deploys pushes whose `ref` is in `GITHUB_WEBHOOK_REFS` The webhook only deploys pushes whose `ref` is in `GITHUB_WEBHOOK_REFS`
(default `refs/heads/main`). Optionally pin the repository: (default `refs/heads/main`). Optionally pin the repository:
@@ -239,21 +254,22 @@ npm ci --include=dev --include=optional
npm run build -- --outDir ../dist-next npm run build -- --outDir ../dist-next
cargo build --manifest-path backend/Cargo.toml --release cargo build --manifest-path backend/Cargo.toml --release
# the webhook promotes dist-next to dist after carrying over old hashed assets # the webhook promotes dist-next to dist after carrying over old hashed assets
pm2 reload tssbot-web --update-env systemctl --user restart tssbot-web tssbot-backend
pm2 reload tssbot-backend --update-env
``` ```
Only processes listed in `PM2_RESTART_TARGETS` are reloaded. The default is Only services listed in `RESTART_TARGETS` are restarted. The default is
`tssbot-web,tssbot-backend`, so unrelated PM2 processes are left alone. The web server handles `tssbot-web,tssbot-backend`, so unrelated systemd units are left alone. The web
`SIGINT` and `SIGTERM` by closing its listener and SQLite handles before exit, server handles `SIGINT` and `SIGTERM` by closing its listener and SQLite
which lets PM2 finish reloads without dropping active requests. The webhook handles before exit, giving systemd a clean shutdown within `TimeoutStopSec`
exits after 24 hours so PM2 restarts it cleanly. instead of a hard kill. The webhook exits cleanly every 24 hours; its unit uses
`Restart=always` so systemd relaunches it right away.
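The `RESTART_TARGETS` handling described above can be sketched as two small functions. The filter mirrors the one in the `webhook.cjs` hunk of this diff; the function names `parseRestartTargets` and `restartArgs` are hypothetical and exist only for this illustration:

```javascript
// Split the comma-separated value, trim each entry, and keep only names made
// of safe unit-name characters (same pattern as webhook.cjs).
function parseRestartTargets(raw) {
  return String(raw || 'tssbot-web,tssbot-backend')
    .split(',')
    .map((target) => target.trim())
    .filter((target) => /^[A-Za-z0-9_.:-]{1,80}$/.test(target))
}

// Build the argument list for a single `systemctl --user restart` call.
function restartArgs(targets) {
  return ['--user', 'restart', ...targets.map((target) => `${target}.service`)]
}
```

Because the arguments are passed as an array rather than through a shell, an entry such as `bad name; rm -rf` is dropped by the filter and can never reach a command line.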
When webhook code changes are deployed, restart the webhook process once so PM2 When webhook code changes are deployed, the webhook restarts itself once
loads the updated listener: (delayed so its own deploy response/notifications land first) so it loads the
updated listener. To trigger the same restart manually:
```sh ```sh
pm2 reload tssbot-webhook --update-env systemctl --user restart tssbot-webhook
``` ```
The webhook listener reads `.env` on startup. To send Discord notifications for The webhook listener reads `.env` on startup. To send Discord notifications for
+1 -1
@@ -13,7 +13,7 @@ before falling back to the current working directory.
- `BACKEND_HOST` bind host, default `127.0.0.1` - `BACKEND_HOST` bind host, default `127.0.0.1`
- `BACKEND_ALLOWED_ORIGINS` comma-separated browser origins allowed by CORS - `BACKEND_ALLOWED_ORIGINS` comma-separated browser origins allowed by CORS
Both paths can be absolute or relative to the repo root when run through the root scripts/PM2. Both paths can be absolute or relative to the repo root when run through the root scripts/systemd units.
## Vehicle translation + icons ## Vehicle translation + icons
-114
@@ -1,114 +0,0 @@
const fs = require('node:fs')
const path = require('node:path')
function loadEnvFile() {
const envPath = path.join(__dirname, '.env')
if (!fs.existsSync(envPath)) return
const lines = fs.readFileSync(envPath, 'utf8').split(/\r?\n/)
for (const line of lines) {
const trimmed = line.trim()
if (!trimmed || trimmed.startsWith('#')) continue
const separatorIndex = trimmed.indexOf('=')
if (separatorIndex === -1) continue
const key = trimmed.slice(0, separatorIndex).trim()
let value = trimmed.slice(separatorIndex + 1).trim()
if (
(value.startsWith('"') && value.endsWith('"')) ||
(value.startsWith("'") && value.endsWith("'"))
) {
value = value.slice(1, -1)
}
if (key && (!process.env[key] || process.env[key] === '')) {
process.env[key] = value
}
}
}
loadEnvFile()
// Crash-loop governor: after max_restarts attempts that each fail to stay up
// min_uptime ms, PM2 marks the app `errored` and stops relaunching it, instead
// of restarting forever and pegging the CPU.
const RESTART_POLICY = {
max_restarts: 10,
min_uptime: 10000,
exp_backoff_restart_delay: 200,
}
module.exports = {
apps: [
{
name: 'tssbot-web',
...RESTART_POLICY,
script: 'server.cjs',
cwd: __dirname,
exec_mode: 'cluster',
instances: process.env.WEB_INSTANCES || 2,
wait_ready: true,
listen_timeout: 10000,
kill_timeout: 10000,
env: {
NODE_ENV: 'production',
PORT: process.env.PORT || 3010,
API_UPSTREAM: process.env.API_UPSTREAM || 'http://127.0.0.1:6000',
PUBLIC_ORIGIN: process.env.PUBLIC_ORIGIN || '',
UPTIME_STORAGE_DIR: process.env.UPTIME_STORAGE_DIR || '~/tsswebstorage',
UPTIME_DATABASE_FILE: process.env.UPTIME_DATABASE_FILE || 'uptime.sqlite',
UPTIME_SAMPLE_INTERVAL_MS: process.env.UPTIME_SAMPLE_INTERVAL_MS || 1800000,
UPTIME_HISTORY_LIMIT: process.env.UPTIME_HISTORY_LIMIT || 336,
API_CACHE_TTL_MS: process.env.API_CACHE_TTL_MS || 15000,
API_RATE_LIMIT_WINDOW_MS: process.env.API_RATE_LIMIT_WINDOW_MS || 60000,
API_RATE_LIMIT_MAX: process.env.API_RATE_LIMIT_MAX || 120,
TRUST_PROXY: process.env.TRUST_PROXY || 'cloudflare',
TRUSTED_UPSTREAM_IPS: process.env.TRUSTED_UPSTREAM_IPS || '127.0.0.1,::1,::ffff:127.0.0.1',
SITE_SESSION_SECRET: process.env.SITE_SESSION_SECRET || process.env.API_SESSION_SECRET || process.env.TURNSTILE_SECRET_KEY || '',
SITE_SESSION_TTL_SECONDS: process.env.SITE_SESSION_TTL_SECONDS || 43200,
TURNSTILE_SECRET_KEY: process.env.TURNSTILE_SECRET_KEY || '',
},
},
{
name: 'tssbot-webhook',
...RESTART_POLICY,
script: 'webhook.cjs',
cwd: __dirname,
autorestart: true,
env: {
NODE_ENV: 'production',
WEBHOOK_PORT: process.env.WEBHOOK_PORT || 3011,
WEBHOOK_PM2_NAME: process.env.WEBHOOK_PM2_NAME || 'tssbot-webhook',
GITHUB_WEBHOOK_SECRET: process.env.GITHUB_WEBHOOK_SECRET || '',
GITHUB_WEBHOOK_REFS: process.env.GITHUB_WEBHOOK_REFS || 'refs/heads/main',
GITHUB_WEBHOOK_REPOSITORY: process.env.GITHUB_WEBHOOK_REPOSITORY || '',
PM2_RESTART_TARGETS: process.env.PM2_RESTART_TARGETS || 'tssbot-web,tssbot-backend',
DISCORD_WEBHOOK_URL: process.env.DISCORD_WEBHOOK_URL || '',
DISCORD_INCLUDE_PATCH: process.env.DISCORD_INCLUDE_PATCH || 'false',
},
},
{
name: 'tssbot-backend',
...RESTART_POLICY,
script: process.platform === 'win32'
? 'backend/target/release/tssbot-backend.exe'
: 'backend/target/release/tssbot-backend',
cwd: __dirname,
autorestart: true,
env: {
NODE_ENV: 'production',
BACKEND_PORT: process.env.BACKEND_PORT || 6000,
BACKEND_HOST: process.env.BACKEND_HOST || '127.0.0.1',
BACKEND_ALLOWED_ORIGINS: process.env.BACKEND_ALLOWED_ORIGINS || process.env.PUBLIC_ORIGIN || '',
TSS_BATTLES_DB: process.env.TSS_BATTLES_DB || 'tss_battles.db',
TSS_TEAMS_DB: process.env.TSS_TEAMS_DB || 'tss_teams.db',
// Vehicle name + icon caches (built by the bots in the shared STORAGE volume).
VEHICLE_TRANSLATIONS_JSON: process.env.VEHICLE_TRANSLATIONS_JSON
|| '/mnt/HC_Volume_105581488/STORAGE/CACHE/vehicle_translations.json',
VEHICLE_DATA_CACHE_JSON: process.env.VEHICLE_DATA_CACHE_JSON
|| '/mnt/HC_Volume_105581488/STORAGE/CACHE/vehicle_data_cache.json',
},
},
],
}
+3 -1
@@ -55,7 +55,9 @@ GITHUB_WEBHOOK_SECRET=change-me
GITHUB_WEBHOOK_REFS=refs/heads/main GITHUB_WEBHOOK_REFS=refs/heads/main
# Optional: refuse pushes whose repository.full_name does not match (e.g. "owner/repo"). # Optional: refuse pushes whose repository.full_name does not match (e.g. "owner/repo").
GITHUB_WEBHOOK_REPOSITORY= GITHUB_WEBHOOK_REPOSITORY=
PM2_RESTART_TARGETS=tssbot-web,tssbot-backend # Comma-separated systemd --user unit names (without .service) restarted after a
# successful deploy build.
RESTART_TARGETS=tssbot-web,tssbot-backend
DISCORD_WEBHOOK_URL=https://discord.com/api/webhooks/... DISCORD_WEBHOOK_URL=https://discord.com/api/webhooks/...
# Set to "true" only if the Discord channel is private. Default omits the patch preview # Set to "true" only if the Discord channel is private. Default omits the patch preview
DISCORD_INCLUDE_PATCH=true DISCORD_INCLUDE_PATCH=true
+22
@@ -0,0 +1,22 @@
#!/usr/bin/env bash
# Installs/updates the tssbot-web systemd --user units and (re)starts them.
# Run this after cloning, and again any time a unit file under systemd/ changes.
set -euo pipefail
repo_dir="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)"
unit_dir="${HOME}/.config/systemd/user"
mkdir -p "${unit_dir}"
for unit in tssbot-web tssbot-webhook tssbot-backend; do
ln -sf "${repo_dir}/systemd/${unit}.service" "${unit_dir}/${unit}.service"
done
systemctl --user daemon-reload
systemctl --user enable --now tssbot-web.service tssbot-webhook.service tssbot-backend.service
if [ "$(loginctl show-user "$(whoami)" -p Linger --value 2>/dev/null)" != "yes" ]; then
echo "Linger is not enabled for $(whoami) — user services will stop when you log out."
echo "Enable it with: sudo loginctl enable-linger $(whoami)"
fi
systemctl --user status --no-pager tssbot-web.service tssbot-webhook.service tssbot-backend.service
+4 -4
@@ -3510,10 +3510,10 @@ function shutdown() {
// server.close() only fires its callback once every socket is gone, and idle // server.close() only fires its callback once every socket is gone, and idle
// HTTP keep-alive sockets (held open by nginx/Cloudflare) never close on // HTTP keep-alive sockets (held open by nginx/Cloudflare) never close on
// their own — so without this the worker hangs the full kill_timeout on every // their own — so without this the process hangs the full TimeoutStopSec on
// stop/reload, which is what wedges the PM2 cluster daemon. Close idle sockets // every stop/restart. Close idle sockets immediately, let in-flight requests
// immediately, let in-flight requests finish for a short grace period, then // finish for a short grace period, then force the rest so shutdown completes
// force the rest so shutdown completes well inside kill_timeout. // well inside TimeoutStopSec.
server.closeIdleConnections() server.closeIdleConnections()
setTimeout(() => server.closeAllConnections(), 3000).unref() setTimeout(() => server.closeAllConnections(), 3000).unref()
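The shutdown sequence in the hunk above can be wrapped so it runs at most once per process. A minimal sketch, assuming Node 18.2+ (where `closeIdleConnections()` and `closeAllConnections()` exist on `http.Server`); `createShutdown`, `graceMs`, and `onClosed` are hypothetical names, and the real `shutdown()` in `server.cjs` also closes SQLite handles, which is omitted here:

```javascript
// Hypothetical wrapper around the close-idle / grace / force-close pattern.
// Works with any object exposing the three http.Server methods used below.
function createShutdown(server, { graceMs = 3000, onClosed = () => {} } = {}) {
  let shuttingDown = false
  return function shutdown() {
    if (shuttingDown) return false // ignore a repeated SIGINT/SIGTERM
    shuttingDown = true
    // Stop accepting connections; the callback fires once every socket is gone.
    server.close(onClosed)
    // Drop idle keep-alive sockets now so close() is not held open by them.
    server.closeIdleConnections()
    // After the grace period, force-close whatever is still in flight.
    setTimeout(() => server.closeAllConnections(), graceMs).unref()
    return true
  }
}
```

A caller would register the returned function for both signals, for example `process.on('SIGTERM', shutdown)`, and keep `graceMs` well below the unit's `TimeoutStopSec=10` so systemd never has to escalate to a hard kill.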
+19
@@ -0,0 +1,19 @@
[Unit]
Description=tssbot backend API service
After=network-online.target
Wants=network-online.target
StartLimitIntervalSec=100
StartLimitBurst=10
[Service]
Type=simple
WorkingDirectory=%h/tssbot.web
ExecStart=%h/tssbot.web/backend/target/release/tssbot-backend
Restart=on-failure
RestartSec=200ms
RestartSteps=10
RestartMaxDelaySec=10s
TimeoutStopSec=10
[Install]
WantedBy=default.target
+19
@@ -0,0 +1,19 @@
[Unit]
Description=tssbot-web production server
After=network-online.target
Wants=network-online.target
StartLimitIntervalSec=100
StartLimitBurst=10
[Service]
Type=simple
WorkingDirectory=%h/tssbot.web
ExecStart=/usr/bin/node server.cjs
Restart=on-failure
RestartSec=200ms
RestartSteps=10
RestartMaxDelaySec=10s
TimeoutStopSec=10
[Install]
WantedBy=default.target
+21
@@ -0,0 +1,21 @@
[Unit]
Description=tssbot-web GitHub deploy webhook
After=network-online.target
Wants=network-online.target
StartLimitIntervalSec=100
StartLimitBurst=10
[Service]
Type=simple
WorkingDirectory=%h/tssbot.web
ExecStart=/usr/bin/node webhook.cjs
# Always (not on-failure): the process deliberately exits 0 every 24h so
# systemd relaunches it with a clean listener.
Restart=always
RestartSec=200ms
RestartSteps=10
RestartMaxDelaySec=10s
TimeoutStopSec=10
[Install]
WantedBy=default.target
+26 -44
@@ -40,14 +40,14 @@ const PORT = Number(process.env.WEBHOOK_PORT || 3011)
const SECRET = process.env.GITHUB_WEBHOOK_SECRET || '' const SECRET = process.env.GITHUB_WEBHOOK_SECRET || ''
const DISCORD_WEBHOOK_URL = process.env.DISCORD_WEBHOOK_URL || '' const DISCORD_WEBHOOK_URL = process.env.DISCORD_WEBHOOK_URL || ''
const DISCORD_INCLUDE_PATCH = /^(1|true|yes)$/i.test(String(process.env.DISCORD_INCLUDE_PATCH || '')) const DISCORD_INCLUDE_PATCH = /^(1|true|yes)$/i.test(String(process.env.DISCORD_INCLUDE_PATCH || ''))
const RESTART_TARGETS = (process.env.PM2_RESTART_TARGETS || 'tssbot-web,tssbot-backend') const RESTART_TARGETS = (process.env.RESTART_TARGETS || 'tssbot-web,tssbot-backend')
.split(',') .split(',')
.map((target) => target.trim()) .map((target) => target.trim())
.filter((target) => /^[A-Za-z0-9_.:-]{1,80}$/.test(target)) .filter((target) => /^[A-Za-z0-9_.:-]{1,80}$/.test(target))
.filter(Boolean) .filter(Boolean)
// This webhook's own PM2 process name — never reload it during its own deploy. // This webhook's own systemd unit name — never restart it inline during its own deploy.
const SELF_PM2_NAME = process.env.WEBHOOK_PM2_NAME || 'tssbot-webhook' const SELF_SERVICE_NAME = process.env.WEBHOOK_SERVICE_NAME || 'tssbot-webhook'
const DIST_DIR = path.join(__dirname, 'dist') const DIST_DIR = path.join(__dirname, 'dist')
const NEXT_DIST_DIR = path.join(__dirname, 'dist-next') const NEXT_DIST_DIR = path.join(__dirname, 'dist-next')
const PREVIOUS_DIST_DIR = path.join(__dirname, 'dist-previous') const PREVIOUS_DIST_DIR = path.join(__dirname, 'dist-previous')
@@ -57,7 +57,7 @@ const WEBHOOK_HEADERS_TIMEOUT_MS = Number(process.env.WEBHOOK_HEADERS_TIMEOUT_MS
// No deploy step may hang forever. A stalled `npm ci` (a native postinstall that // No deploy step may hang forever. A stalled `npm ci` (a native postinstall that
// never returns) would otherwise block for hours with node_modules already // never returns) would otherwise block for hours with node_modules already
// deleted — which is exactly what took the site down. These cap each step so a // deleted — which is exactly what took the site down. These cap each step so a
// hang fails fast and aborts the deploy before any pm2 reload. // hang fails fast and aborts the deploy before any systemctl restart.
const DEPLOY_STEP_TIMEOUT_MS = Number(process.env.DEPLOY_STEP_TIMEOUT_MS || 15 * 60 * 1000) const DEPLOY_STEP_TIMEOUT_MS = Number(process.env.DEPLOY_STEP_TIMEOUT_MS || 15 * 60 * 1000)
const DEPLOY_INSTALL_TIMEOUT_MS = Number(process.env.DEPLOY_INSTALL_TIMEOUT_MS || 8 * 60 * 1000) const DEPLOY_INSTALL_TIMEOUT_MS = Number(process.env.DEPLOY_INSTALL_TIMEOUT_MS || 8 * 60 * 1000)
const ALLOWED_REFS = new Set( const ALLOWED_REFS = new Set(
@@ -200,7 +200,6 @@ function commandFor(command) {
} }
if (process.platform !== 'win32') return command if (process.platform !== 'win32') return command
if (command === 'npm') return 'npm.cmd' if (command === 'npm') return 'npm.cmd'
if (command === 'pm2') return 'pm2.cmd'
return command return command
} }
@@ -218,7 +217,7 @@ function restartTargetsInclude(target) {
} }
function pushTouchesWebhookRuntime(push) { function pushTouchesWebhookRuntime(push) {
const runtimeFiles = new Set(['webhook.cjs', 'ecosystem.config.cjs']) const runtimeFiles = new Set(['webhook.cjs', 'systemd/tssbot-webhook.service'])
const commits = Array.isArray(push?.commits) ? push.commits : [] const commits = Array.isArray(push?.commits) ? push.commits : []
return commits.some((commit) => { return commits.some((commit) => {
const changed = [ const changed = [
@@ -230,28 +229,18 @@ function pushTouchesWebhookRuntime(push) {
}) })
} }
function scheduleSelfReload(reason) { function scheduleSelfRestart(reason) {
let resolvedCommand console.log(`scheduling ${SELF_SERVICE_NAME} restart: ${reason}`)
try { // Delayed + detached: `systemctl --user restart` sends SIGTERM to this very
resolvedCommand = commandFor('pm2') // process as soon as it runs, so launch it from an unref'd timer as a detached child and let the
} catch (error) { // deploy's response/notifications land first.
console.error(`could not schedule ${SELF_PM2_NAME} reload:`, error.message)
return
}
console.log(`scheduling ${SELF_PM2_NAME} reload: ${reason}`)
setTimeout(() => { setTimeout(() => {
const child = spawn( const child = spawn('systemctl', ['--user', 'restart', `${SELF_SERVICE_NAME}.service`], {
resolvedCommand,
['reload', 'ecosystem.config.cjs', '--only', SELF_PM2_NAME, '--update-env'],
{
cwd: __dirname, cwd: __dirname,
env: process.env, env: process.env,
detached: true, detached: true,
stdio: 'ignore', stdio: 'ignore',
shell: process.platform === 'win32', })
},
)
child.unref() child.unref()
}, 1000).unref() }, 1000).unref()
} }
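The `pushTouchesWebhookRuntime` hunk above shows only the start and end of the function. The general idea of such a check can be sketched as follows, assuming the standard GitHub push payload in which each commit carries `added`, `modified`, and `removed` path arrays. `pushTouchesRuntime` and `RUNTIME_FILES` are hypothetical names, and the elided middle of the real function may differ:

```javascript
// Files whose change means the webhook itself must be restarted (from the diff).
const RUNTIME_FILES = new Set(['webhook.cjs', 'systemd/tssbot-webhook.service'])

// Hypothetical sketch: true when any commit in the push touches a runtime file.
function pushTouchesRuntime(push, runtimeFiles = RUNTIME_FILES) {
  const commits = Array.isArray(push?.commits) ? push.commits : []
  return commits.some((commit) => {
    // Collect every path the commit added, modified, or removed.
    const changed = [
      ...(Array.isArray(commit?.added) ? commit.added : []),
      ...(Array.isArray(commit?.modified) ? commit.modified : []),
      ...(Array.isArray(commit?.removed) ? commit.removed : []),
    ]
    return changed.some((file) => runtimeFiles.has(file))
  })
}
```

The defensive `Array.isArray` checks mean a malformed or empty payload yields `false` instead of throwing mid-deploy.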
@@ -275,7 +264,7 @@ function run(command, args, options = {}) {
stdio: 'inherit', stdio: 'inherit',
}) })
// Kill the step if it hangs so deploy() aborts before any pm2 reload instead // Kill the step if it hangs so deploy() aborts before any systemctl restart instead
// of wedging here indefinitely (see DEPLOY_STEP_TIMEOUT_MS above). // of wedging here indefinitely (see DEPLOY_STEP_TIMEOUT_MS above).
const timeoutMs = Number(options.timeoutMs) > 0 ? Number(options.timeoutMs) : DEPLOY_STEP_TIMEOUT_MS const timeoutMs = Number(options.timeoutMs) > 0 ? Number(options.timeoutMs) : DEPLOY_STEP_TIMEOUT_MS
let timedOut = false let timedOut = false
@@ -441,7 +430,7 @@ async function ensureBuildDependencies(previousHead) {
} }
// Hard gate: better-sqlite3 must actually load after the install, or abort the // Hard gate: better-sqlite3 must actually load after the install, or abort the
// deploy here — before promoteBuiltDist()/pm2 reload — so a broken native build // deploy here — before promoteBuiltDist()/systemctl restart — so a broken native build
// can never be promoted to the running workers (which still hold a good binary). // can never be promoted to the running workers (which still hold a good binary).
if (!(await betterSqliteLoads())) { if (!(await betterSqliteLoads())) {
throw new Error( throw new Error(
@@ -708,27 +697,20 @@ async function deploy(push) {
promoteBuiltDist() promoteBuiltDist()
syncVehicleIcons() syncVehicleIcons()
// Reload via the ecosystem file (not by bare name) with --only so each deploy // Each restarted service re-reads .env itself on startup, so a plain
// re-reads the committed env blocks (e.g. VEHICLE_* paths). `pm2 reload <name> // `systemctl restart` always picks up the committed env changes.
// --update-env` would only merge the CLI's process.env and ignore the file. // Exclude this webhook process from the awaited restart: killing the process
// Exclude this webhook process from the awaited reload: killing the process // running this deploy mid-command can interrupt the remaining restarts.
// running this deploy mid-command can interrupt the remaining reloads. const restartTargets = RESTART_TARGETS.filter((t) => t !== SELF_SERVICE_NAME)
const reloadTargets = RESTART_TARGETS.filter((t) => t !== SELF_PM2_NAME) if (restartTargets.length) {
if (reloadTargets.length) { await run('systemctl', ['--user', 'restart', ...restartTargets.map((t) => `${t}.service`)])
await run('pm2', [
'reload',
'ecosystem.config.cjs',
'--only',
reloadTargets.join(','),
'--update-env',
])
} }
await notifyDeployCompleted(push, diff) await notifyDeployCompleted(push, diff)
if (restartTargetsInclude(SELF_PM2_NAME) || pushTouchesWebhookRuntime(push)) { if (restartTargetsInclude(SELF_SERVICE_NAME) || pushTouchesWebhookRuntime(push)) {
scheduleSelfReload( scheduleSelfRestart(
restartTargetsInclude(SELF_PM2_NAME) restartTargetsInclude(SELF_SERVICE_NAME)
? `${SELF_PM2_NAME} is listed in PM2_RESTART_TARGETS` ? `${SELF_SERVICE_NAME} is listed in RESTART_TARGETS`
: 'webhook runtime files changed', : 'webhook runtime files changed',
) )
} }
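The restart decision in the hunk above splits targets into those restarted inline and the webhook's own deferred restart. A minimal sketch of that decision as a pure function; `planRestarts` is a hypothetical name, and it assumes `restartTargetsInclude` is a plain membership test on `RESTART_TARGETS`, which the diff does not show:

```javascript
// Hypothetical helper: decide what to restart now and whether to restart self later.
function planRestarts(targets, selfName, touchesRuntime) {
  // Never restart the webhook inline: it is the process running the deploy.
  const inline = targets.filter((target) => target !== selfName)
  const selfListed = targets.includes(selfName)
  let reason = null
  if (selfListed) reason = `${selfName} is listed in RESTART_TARGETS`
  else if (touchesRuntime) reason = 'webhook runtime files changed'
  return { inline, restartSelf: selfListed || Boolean(touchesRuntime), reason }
}
```

Keeping the decision separate from the `systemctl` calls makes the self-exclusion rule easy to check without spawning any process.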
@@ -860,6 +842,6 @@ webhookServer.listen(PORT, '0.0.0.0', () => {
}) })
setTimeout(() => { setTimeout(() => {
console.log('24 hour webhook refresh reached; exiting for PM2 restart') console.log('24 hour webhook refresh reached; exiting for systemd restart')
process.exit(0) process.exit(0)
}, RESTART_AFTER_MS).unref() }, RESTART_AFTER_MS).unref()