OrbStack Bug Report: Silent VM Death with Prolonged Downtime Before Auto-Restart #2314

@mneves75

Describe the bug

The OrbStack Docker VM was killed with SIGKILL after 1h 20m of runtime. The GUI process stayed alive but showed no notification, and the VM remained down for 2h 18m before it auto-restarted. During that time the GUI kept logging "docker changed" polling events (211 in gui.log). At no point did the user get an actionable indication that the Docker environment was non-functional.


  Environment
  ┌────────────┬───────────────────────────────────────────────────────────┐
  │   Field    │                           Value                           │
  ├────────────┼───────────────────────────────────────────────────────────┤
  │ OrbStack   │ v2.0.5 (commit: cfe47627f138ffd822c958553b0a93eaf2692c71) │
  ├────────────┼───────────────────────────────────────────────────────────┤
  │ macOS      │ 26.3 (25D5101c)                                           │
  ├────────────┼───────────────────────────────────────────────────────────┤
  │ Hardware   │ Mac16,8 (Apple M4 Pro, 14 cores, 48GB RAM)                │
  ├────────────┼───────────────────────────────────────────────────────────┤
  │ Machine    │ docker (built-in, arm64, Docker default)                  │
  ├────────────┼───────────────────────────────────────────────────────────┤
  │ Diagnostic │ orbstack-diagreport_2026-01-21T14-28-12.135481Z.zip       │
  └────────────┴───────────────────────────────────────────────────────────┘
  ---

  Timeline (UTC)

  01:09:28.891  synthetic state -> spawning     [VM initialization starts]
  01:09:29.338  synthetic state -> starting
  01:09:36.536  synthetic state -> running      [VM fully operational]
  02:29:54.157  Daemon exited: killed (SIGKILL) [CRASH - 1h 20m runtime]
  02:29:54.161  synthetic state -> stopped
  02:29:54.169  Error: vmgrExit(reason: killed (SIGKILL), ...)
                                                          [~2h 18m DOWNTIME]
  04:48:10.574  synthetic state -> spawning     [auto-restart triggered]
  04:48:11.065  synthetic state -> starting
  04:48:24.874  synthetic state -> running      [VM operational again]
  11:19:08.602  synthetic state -> stopped      [user shutdown?]
  11:27:26.677  synthetic state -> spawning     [manual restart]
  11:27:26.805  synthetic state -> starting
  11:27:40.927  synthetic state -> running
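
The runtime and downtime figures quoted in this report can be re-derived from the timeline. A minimal Python sketch, with the three timestamps copied from the timeline above and the date taken from the log excerpts; the helper name `elapsed` is illustrative only:

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"

def elapsed(start: str, end: str) -> str:
    """Return the gap between two gui.log timestamps as 'Hh MMm SSs'."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    total = int(delta.total_seconds())  # drop the fractional second
    hours, rem = divmod(total, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{hours}h {minutes:02d}m {seconds:02d}s"

running   = "2026-01-21 01:09:36.536"  # synthetic state -> running
killed    = "2026-01-21 02:29:54.157"  # Daemon exited: killed (SIGKILL)
respawned = "2026-01-21 04:48:10.574"  # synthetic state -> spawning

print("runtime: ", elapsed(running, killed))     # runtime:  1h 20m 17s
print("downtime:", elapsed(killed, respawned))   # downtime: 2h 18m 16s
```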

To Reproduce

The Bug: Two Distinct Issues

Issue 1: VM Killed with No User Notification

Evidence:
2026-01-21 02:29:54.157 OrbStack[64212:30822462] Daemon exited: killed (SIGKILL)
2026-01-21 02:29:54.169 OrbStack[64212:30822110] Error: vmgrExit(reason: killed (SIGKILL), ...)

Problem: The VM was killed but the GUI process (PID 64212; the second number in the log prefix is a thread ID) survived. Users received no:

  • System notification
  • Menu bar alert
  • Modal error dialog
  • Toast/message indication

Impact: Users may not realize Docker is unavailable until docker ps fails.

Issue 2: Zombie Polling for 2+ Hours

Evidence:
$ grep -c "docker changed" vmgr_logs/gui.log
211

$ awk '/02:29:54/ {start=$1 " " $2} /04:48:10/ {end=$1 " " $2; print "Downtime: " start " to " end}' vmgr_logs/gui.log
Downtime: 2026-01-21 02:29:54.169 to 2026-01-21 04:48:10.574

Problem: Between crash (02:29:54) and auto-restart (04:48:10):

  • 2 hours, 18 minutes of downtime
  • 211 "docker changed" events in gui.log (total from the grep above), including polling against the dead VM
  • Zero indication to user that recovery was in progress or needed

Impact:

  • Wasted CPU cycles polling dead state
  • Delayed user awareness
  • Unclear if/when auto-restart would occur
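
Note that `grep -c` counts every "docker changed" line in the file, including any logged while the VM was healthy. To count only the events inside the downtime window, something like the following Python sketch could be used. It assumes every relevant gui.log line starts with a `YYYY-MM-DD HH:MM:SS.mmm` timestamp, as in the excerpts in this report; `count_in_window` is a hypothetical helper, not an OrbStack tool.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"
TS_LEN = len("2026-01-21 02:29:54.157")  # width of the leading timestamp

def count_in_window(lines, start, end, needle="docker changed"):
    """Count lines containing `needle` whose timestamp lies in [start, end]."""
    lo, hi = datetime.strptime(start, FMT), datetime.strptime(end, FMT)
    hits = 0
    for line in lines:
        if needle not in line:
            continue
        try:
            ts = datetime.strptime(line[:TS_LEN], FMT)
        except ValueError:
            continue  # no leading timestamp on this line; skip it
        if lo <= ts <= hi:
            hits += 1
    return hits

# The two "docker changed" lines quoted in this report, used as sample input.
sample = [
    "2026-01-21 02:30:58.630 OrbStack[64212:30822110] docker changed",
    "2026-01-21 04:48:37.393 OrbStack[64212:30822110] docker changed",
]
crash, restart = "2026-01-21 02:29:54.169", "2026-01-21 04:48:10.574"
print(count_in_window(sample, crash, restart))  # 1
```

Only the first sample line falls inside the window; the 04:48:37 entry was logged after the auto-restart. To run this against the real log, pass an open file object for `vmgr_logs/gui.log` instead of `sample`.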

Root Cause Hypothesis

The SIGKILL source is unidentified. Possible causes:

  ┌─────────────────────┬───────────────────────────────────────┬────────────┐
  │     Hypothesis      │               Evidence                │ Likelihood │
  ├─────────────────────┼───────────────────────────────────────┼────────────┤
  │ macOS OOM Killer    │ No explicit OOM indicators in logs    │ Medium     │
  ├─────────────────────┼───────────────────────────────────────┼────────────┤
  │ OrbStack Watchdog   │ No watchdog timeout logged            │ Low        │
  ├─────────────────────┼───────────────────────────────────────┼────────────┤
  │ External Kill       │ com.apple.launchd or security product │ Medium     │
  ├─────────────────────┼───────────────────────────────────────┼────────────┤
  │ Resource Exhaustion │ 52GB disk allocated, 44GB free        │ Low        │
  └─────────────────────┴───────────────────────────────────────┴────────────┘

Recommended Investigation:

  • Add SIGKILL source detection (e.g. once a candidate sender PID is known, resolve its executable path with proc_pidinfo(pid, PROC_PIDPATHINFO, ...))
  • Log memory/cpu metrics before kill
  • Add watchdog heartbeat with stack traces

Expected behavior

  CRASH at 02:29:54
  │
  ├─ Immediate: Display error in menu bar (⚠️ Docker: Error)
  │
  ├─ Within 5s: System notification "OrbStack VM was killed"
  │
  ├─ Within 30s: Attempt auto-restart with progress indicator
  │
  └─ If restart fails: Persistent error state, manual restart button
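
One way to read the sequence above is as a small crash handler: flag the error, notify, then retry the restart a bounded number of times before settling into a persistent error state. The Python sketch below is purely illustrative; `on_vm_exit`, the callbacks, the status strings, and the retry limit of 3 are assumptions made for this example and do not reflect OrbStack internals.

```python
from typing import Callable, List

def on_vm_exit(
    reason: str,
    set_status: Callable[[str], None],
    notify: Callable[[str], None],
    try_restart: Callable[[], bool],
    max_attempts: int = 3,  # assumed limit, not an OrbStack setting
) -> bool:
    """Handle an unexpected VM exit; return True if the VM came back."""
    set_status("Docker: Error")  # immediate menu bar state
    notify(f"OrbStack VM exited unexpectedly: {reason}")
    for attempt in range(1, max_attempts + 1):
        set_status(f"Docker: Restarting (attempt {attempt}/{max_attempts})")
        if try_restart():
            set_status("Docker: Running")
            return True
    # All attempts failed: stay in a visible error state until the user acts.
    set_status("Docker: Error (manual restart required)")
    return False

# Dry run with fake callbacks: the first restart fails, the second succeeds.
events: List[str] = []
outcomes = iter([False, True])
recovered = on_vm_exit(
    "killed (SIGKILL)",
    set_status=events.append,
    notify=events.append,
    try_restart=lambda: next(outcomes),
)
print(recovered)   # True
print(events[-1])  # Docker: Running
```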

  ---
  Actual Behavior

  CRASH at 02:29:54
  │
  ├─ GUI continues running silently
  │
  ├─ 211 polling events logged to gui.log (no user visibility)
  │
  ├─ No indicator in menu bar (may show "Docker: Running")
  │
  └─ Auto-restart at 04:48:10 (2h 18m later) - silent recovery

  ---
  Evidence Files
  ┌───────────────────────────────────┬───────────────────────────────────────────┐
  │               File                │                 Relevance                 │
  ├───────────────────────────────────┼───────────────────────────────────────────┤
  │ vmgr_logs/gui.log                 │ State transitions, errors, polling events │
  ├───────────────────────────────────┼───────────────────────────────────────────┤
  │ vmgr_logs/vmgr.log                │ VM manager logs                           │
  ├───────────────────────────────────┼───────────────────────────────────────────┤
  │ machine_logs/docker.*.console.log │ Guest VM console output                   │
  ├───────────────────────────────────┼───────────────────────────────────────────┤
  │ machine_logs/docker.*.runtime.log │ Docker daemon runtime                     │
  └───────────────────────────────────┴───────────────────────────────────────────┘

  Key Log Excerpts:
  # Crash event
  2026-01-21 02:29:54.157 OrbStack[64212:30822462] Daemon exited: killed (SIGKILL)
  2026-01-21 02:29:54.169 OrbStack[64212:30822110] Error: vmgrExit(reason: killed (SIGKILL)...

  # Polling during downtime (continues for 2+ hours; 211 "docker changed" lines in gui.log overall)
  2026-01-21 02:30:58.630 OrbStack[64212:30822110] docker changed
  ...

  # Auto-restart (silent, no user notification), followed by further polling
  2026-01-21 04:48:10.574 OrbStack[64212:30822110] synthetic state -> spawning
  2026-01-21 04:48:37.393 OrbStack[64212:30822110] docker changed

Reproduction Steps

  1. Start OrbStack Docker VM
  2. Use Docker normally for an hour or more (typical workload)
  3. Wait for SIGKILL (trigger unknown - may require specific conditions)
  4. Observe: No user notification of crash
  5. Observe: GUI continues polling dead VM
  6. Observe: Delay of 2+ hours before auto-restart

Note: Reproduction may require specific trigger conditions (memory pressure, specific workload, or external process).


Severity Assessment

  ┌─────────────┬─────────┬───────────────────────────────────────────┐
  │  Dimension  │ Rating  │                 Rationale                 │
  ├─────────────┼─────────┼───────────────────────────────────────────┤
  │ User Impact │ High    │ Docker unavailable for 2h+ without notice │
  ├─────────────┼─────────┼───────────────────────────────────────────┤
  │ Detection   │ High    │ No user-facing crash notification         │
  ├─────────────┼─────────┼───────────────────────────────────────────┤
  │ Workaround  │ Low     │ User must manually check docker ps        │
  ├─────────────┼─────────┼───────────────────────────────────────────┤
  │ Frequency   │ Unknown │ First occurrence                          │
  └─────────────┴─────────┴───────────────────────────────────────────┘
  Overall: High - The failure is silent, so users only find out when a Docker command fails.

Suggested Fixes (Priority Order)

  1. Immediate: Show error state in menu bar as soon as the VM dies
  2. Immediate: Send macOS notification on crash
  3. Short term: Attempt restart within 30 seconds, show progress
  4. Medium term: Log SIGKILL source for debugging
  5. Long term: Debounce polling when VM is known dead
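
For item 5, one possible approach is exponential backoff rather than debouncing in the strict sense: keep the normal poll interval while the VM is running and double the interval, up to a cap, for each consecutive poll that finds it dead. A minimal Python sketch; the function name and the 5 s base and 300 s cap are assumed values for illustration, not OrbStack's actual polling parameters.

```python
def next_poll_interval(consecutive_dead_polls: int,
                       base: float = 5.0,
                       cap: float = 300.0) -> float:
    """Seconds to wait before the next poll.

    consecutive_dead_polls is 0 while the VM is running, so polling
    stays at `base`; each dead poll doubles the wait, up to `cap`.
    """
    return min(cap, base * (2 ** consecutive_dead_polls))

# Intervals for a VM that stays dead across eight consecutive polls.
schedule = [next_poll_interval(n) for n in range(8)]
print(schedule)  # [5.0, 10.0, 20.0, 40.0, 80.0, 160.0, 300.0, 300.0]
```

Resetting the counter to 0 as soon as the VM reports running again restores the normal interval.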

Diagnostic report (REQUIRED)

OrbStack info:
  Version: 2.0.5
  Commit: cfe47627f138ffd822c958553b0a93eaf2692c71 (v2.0.5)

System info:
  macOS: 26.3 (25D5101c)
  CPU: arm64, 14 cores
  CPU model: Apple M4 Pro
  Model: Mac16,8
  Memory: 48 GiB

Full report: https://orbstack.dev/_admin/diag/orbstack-diagreport_2026-01-21T14-27-15.792156Z.zip

Screenshots and additional context (optional)

No response

Labels: t/bug (Something isn't working)
