fix(web): make deploys safe for the native better-sqlite3 dependency
Restore the dependency-change guard that was overwritten on main, and harden the deploy and worker shutdown so that a flaky better-sqlite3 rebuild can no longer take the site down.

Root cause of the recurring outages: tssbot-web is the only stack with a native module (better-sqlite3), which must be downloaded or compiled on every `npm ci`. The deploy ran `npm ci` unconditionally (the skip guard had been reverted) and without a timeout, and `npm ci` deletes node_modules first. A single hung or failed native rebuild therefore left the site unstartable, and a PM2 cluster restart on top of that wedged the daemon.

webhook.cjs:
- Restore the npm-ci skip guard: only reinstall when package.json or package-lock.json actually changed relative to previousHead (captured before the pull), so code-only pushes never rebuild better-sqlite3. Any uncertainty defaults to installing, and the guard still installs if node_modules is incomplete.
- Add per-step timeouts to run() (DEPLOY_STEP_TIMEOUT_MS, plus a tighter DEPLOY_INSTALL_TIMEOUT_MS for `npm ci`) so a stalled step is killed instead of hanging for hours with node_modules already deleted.
- Gate the deploy on better-sqlite3 actually loading (checked by loading it in a child process, not just with require.resolve): force a reinstall when its native binary is missing, and abort before `pm2 reload` if it is still broken after the install.

server.cjs:
- On shutdown, call closeIdleConnections() immediately and closeAllConnections() after a short delay, so a worker stop/reload can't hang for the full kill_timeout on idle keep-alive sockets or a stuck upstream request.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
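The webhook.cjs side of the change is not part of the hunk shown below. As an illustration only, here is a minimal sketch of how the three webhook pieces could fit together, assuming Node.js 16+ and npm 7+. Only `run()`, previousHead, and the two environment variable names come from the commit; `shouldInstall`, `sqliteLoads`, `ensureDependencies`, the fallback timeout values, and the hidden-lockfile completeness check are hypothetical stand-ins, not the actual webhook code.

```javascript
'use strict'
const { spawn, spawnSync, execFileSync } = require('node:child_process')
const fs = require('node:fs')
const path = require('node:path')

// Hypothetical fallbacks: the commit names DEPLOY_STEP_TIMEOUT_MS and
// DEPLOY_INSTALL_TIMEOUT_MS, but not their default values.
const STEP_TIMEOUT_MS = Number(process.env.DEPLOY_STEP_TIMEOUT_MS) || 10 * 60_000
const INSTALL_TIMEOUT_MS = Number(process.env.DEPLOY_INSTALL_TIMEOUT_MS) || 5 * 60_000

// Run one deploy step; SIGKILL it and reject if it outlives timeoutMs.
function run(cmd, args, { cwd, timeoutMs = STEP_TIMEOUT_MS } = {}) {
  return new Promise((resolve, reject) => {
    const child = spawn(cmd, args, { cwd, stdio: 'inherit' })
    let timedOut = false
    const timer = setTimeout(() => {
      timedOut = true
      child.kill('SIGKILL')
    }, timeoutMs)
    child.on('error', (err) => {
      clearTimeout(timer)
      reject(err)
    })
    child.on('close', (code, signal) => {
      clearTimeout(timer)
      if (timedOut) reject(new Error(`${cmd} timed out after ${timeoutMs} ms`))
      else if (code !== 0) reject(new Error(`${cmd} failed (${signal ?? `exit ${code}`})`))
      else resolve()
    })
  })
}

// Skip guard: reinstall only when the manifest or lockfile changed since
// previousHead (which must be read before `git pull`). Any doubt => install.
function shouldInstall({ cwd, previousHead }) {
  if (!previousHead) return true
  // Assumption: npm >= 7 writes node_modules/.package-lock.json, so its absence
  // is used here as a cheap "node_modules is missing or incomplete" signal.
  if (!fs.existsSync(path.join(cwd, 'node_modules', '.package-lock.json'))) return true
  try {
    const changed = execFileSync(
      'git',
      ['diff', '--name-only', previousHead, 'HEAD', '--', 'package.json', 'package-lock.json'],
      { cwd, encoding: 'utf8', stdio: ['ignore', 'pipe', 'ignore'] },
    )
    return changed.trim() !== ''
  } catch {
    return true // git missing, unknown ref, not a repository, ...
  }
}

// Load better-sqlite3 in a throwaway child process and open an in-memory DB.
// This fails if the native .node binary is missing or unusable, whereas
// require.resolve() would only prove that the JS entry point exists.
const SQLITE_PROBE =
  "const Database = require('better-sqlite3');" +
  "new Database(':memory:').prepare('select 1').get();"

function sqliteLoads(cwd, probe = SQLITE_PROBE) {
  const res = spawnSync(process.execPath, ['-e', probe], {
    cwd,
    stdio: 'ignore',
    timeout: 30_000, // a hung probe counts as a failure (status is null)
  })
  return res.status === 0
}

// Install step: reinstall if the guard says so or the binary is broken, then
// throw (so the caller skips `pm2 reload`) if it still does not load.
async function ensureDependencies({ cwd, previousHead }) {
  if (shouldInstall({ cwd, previousHead }) || !sqliteLoads(cwd)) {
    await run('npm', ['ci'], { cwd, timeoutMs: INSTALL_TIMEOUT_MS })
  }
  if (!sqliteLoads(cwd)) {
    throw new Error('better-sqlite3 still fails to load; aborting before pm2 reload')
  }
}
```

One caveat on the design: SIGKILL on `npm` does not necessarily stop the compiler or node-gyp processes it spawned. A stricter variant could spawn each step with `detached: true` and signal the whole process group with `process.kill(-child.pid, 'SIGKILL')`.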
@@ -3508,6 +3508,15 @@ function shutdown() {
     process.exit(0)
   })
 
+  // server.close() only fires its callback once every socket is gone, and idle
+  // HTTP keep-alive sockets (held open by nginx/Cloudflare) never close on
+  // their own, so without this the worker hangs the full kill_timeout on every
+  // stop/reload, which is what wedges the PM2 cluster daemon. Close idle sockets
+  // immediately, let in-flight requests finish for a short grace period, then
+  // force the rest so shutdown completes well inside kill_timeout.
+  server.closeIdleConnections()
+  setTimeout(() => server.closeAllConnections(), 3000).unref()
+
   setTimeout(() => {
     console.error('Graceful shutdown timed out')
     process.exit(1)
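For reference, here is a self-contained sketch of the same shutdown sequence, assuming Node.js 18.2+ (where `server.closeIdleConnections()` and `server.closeAllConnections()` were added). `gracefulShutdown`, its option names, and the 8 s hard-exit default are hypothetical; the hunk does not show the real hard-exit delay. `exit` is injectable only so that the sequence can be exercised without terminating the process.

```javascript
'use strict'
const http = require('node:http')

// Stop accepting connections, drop idle keep-alive sockets now, force-close
// whatever is left after forceAfterMs, and hard-exit as a last resort.
function gracefulShutdown(
  server,
  { forceAfterMs = 3000, hardExitAfterMs = 8000, exit = (code) => process.exit(code) } = {},
) {
  // The callback fires only once every connection on the server has closed.
  server.close(() => exit(0))
  // Sockets with no request in flight would otherwise linger until they time out.
  server.closeIdleConnections()
  // Requests still running after the grace period (e.g. a hung upstream) are cut.
  // unref() so this timer alone never keeps the process alive.
  setTimeout(() => server.closeAllConnections(), forceAfterMs).unref()
  // Last resort if server.close() still has not completed.
  setTimeout(() => {
    console.error('Graceful shutdown timed out')
    exit(1)
  }, hardExitAfterMs).unref()
}
```

Design note: `closeIdleConnections()` only drops sockets that are not mid-request, so in-flight requests still get the grace window, and `closeAllConnections()` is the backstop for anything stuck after that. Per the Node.js docs, from v19 `server.close()` already closes idle connections on its own, so the explicit call matters most on Node 18. Either way, `hardExitAfterMs` should stay below PM2's `kill_timeout` so the worker exits on its own terms.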