
Add Prometheus metric for agent first connection duration #21282

@blinkagent

Feature Request

Problem

When calculating workspace build times from Prometheus metrics, there is a significant gap in observability. The existing metrics cover:

  • coderd_provisionerd_workspace_build_timings_seconds - measures provisioner job duration (init, plan, graph, apply stages)
  • coderd_agentstats_startup_script_seconds - measures startup script execution time
  • coderd_agents_connection_latencies_seconds - measures network latency to the DERP relay (not connection time)

However, none of these capture the agent first connection duration — the time from when a workspace agent is created until it first connects to coderd. This can be a significant portion of the overall build time (several minutes in some cases), especially when users mount persistent home directories or have complex infrastructure provisioning.

The build timeline UI shows this as the "connect" stage under the agent, and the data is available via the /api/v2/workspacebuilds/{id}/timings endpoint (as agent_connection_timings), but it's not exposed as a Prometheus metric.

Proposed Solution

Add a new Prometheus metric to capture agent first connection duration, for example:

coderd_agent_first_connection_seconds

This metric could be a histogram (similar to other timing metrics) with labels such as:

  • template_name
  • agent_name
  • workspace_name (optional, may be high cardinality)

The timing calculation already exists in the codebase:

StartedAt: agent.CreatedAt,
EndedAt:   agent.FirstConnectedAt.Time,

Use Cases

  1. Calculate accurate P90/P95 workspace build times that match what users actually experience
  2. Identify and alert on slow agent connection times
  3. Compare build performance across different templates or infrastructure configurations
  4. Debug workspace startup issues by correlating with other metrics

Additional Context

This timing data is already tracked in the database (workspace_agents.first_connected_at - workspace_agents.created_at) and exposed via the API, so this would primarily be about adding the Prometheus instrumentation.
