feat(metrics): expose Prometheus /metrics endpoint #37
edilsonoliveirama wants to merge 4 commits into EvolutionAPI:main
Conversation
The /manager dashboard previously showed only a static placeholder
("Dashboard content will be implemented here..."). This replaces it
with a standalone HTML page that fetches live data from the API and
displays real metrics:
- Total instances count
- Connected instances count and percentage
- Disconnected instances count
- Server health status (GET /server/ok)
- AlwaysOnline count
- Instance table with name, status badge, phone number, client and
AlwaysOnline indicator
- Auto-refresh every 30 seconds with manual refresh button
Implementation uses a standalone HTML file (Tailwind CDN + vanilla JS
fetch) served at GET /manager, keeping the existing compiled bundle
intact for all other routes (/manager/instances, /manager/login, etc.).
Changes:
- manager/dashboard/index.html: new self-contained dashboard page
- pkg/routes/routes.go: serve dashboard/index.html for GET /manager
(exact), keep dist/index.html for GET /manager/*any (wildcard)
- Dockerfile: copy manager/dashboard/ into the final image
- .gitignore: exclude manager build artifacts from version control
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Removes the '// TODO: not working' markers from the six chat endpoints (pin, unpin, archive, unarchive, mute, unmute). Investigation confirmed the implementation is correct: the endpoints work on fully-established sessions that have synced WhatsApp app state keys. The markers were likely added after testing on a fresh session where the keys had not yet been distributed by the WhatsApp server.

Also fixes the hardcoded 1-hour mute duration: the BodyStruct now accepts an optional `duration` field (seconds). Sending 0 or omitting the field mutes the chat indefinitely, matching WhatsApp's own behaviour.
Reject negative duration values with a 400-level validation error. Document that duration=0 maps to 'mute forever' (BuildMute treats 0 as a zero time.Duration, which causes BuildMuteAbs to set the WhatsApp sentinel timestamp of -1). Clamp duration to a maximum of 1 year (31536000 seconds) to avoid unreasonably large timestamps being sent to the WhatsApp API.
Adds GET /metrics serving the standard Prometheus text format. No authentication required — follows the Prometheus convention of protecting the endpoint at the network/ingress level.

Metrics exposed:
- evolution_instances_total: total registered instances (gauge)
- evolution_instances_connected: connected instances (gauge)
- evolution_instances_disconnected: disconnected instances (gauge)
- evolution_http_requests_total: HTTP requests by method/path/status (counter)
- evolution_http_request_duration_seconds: HTTP latency by method/path (histogram)
- evolution_build_info: always 1; the version label carries the value (gauge)
- evolution_uptime_seconds: seconds since server start (gauge)

Instance gauges use a custom Collector that queries the database on each scrape, so values are always current without event hooks. HTTP path labels use Gin registered route patterns (e.g. /instance/:instanceId) to keep cardinality bounded regardless of distinct IDs in the path.

New dependency: github.com/prometheus/client_golang v1.20.5
Reviewer's Guide

Adds a Prometheus-based metrics subsystem with an unauthenticated GET /metrics endpoint, integrates HTTP metrics middleware, exposes instance-level gauges via a custom collector, introduces a new HTML dashboard for real-time instance status, enhances chat mute behavior with configurable duration and validation, and slightly adjusts routing and dependencies to support these features.

Sequence diagram for Prometheus scraping the new /metrics endpoint:

sequenceDiagram
participant Prometheus as Prometheus
participant GinEngine as GinEngine
participant MetricsRegistry as MetricsRegistry
participant PrometheusRegistry as PrometheusRegistry
participant InstanceCollector as instanceCollector
participant InstanceRepository as InstanceRepository
participant Database as Database
Prometheus->>GinEngine: GET /metrics
GinEngine->>MetricsRegistry: Handler()
GinEngine->>PrometheusRegistry: ServeHTTP(response, request)
Note over PrometheusRegistry,InstanceCollector: PrometheusRegistry gathers all registered metrics
PrometheusRegistry->>InstanceCollector: Collect(ch)
InstanceCollector->>InstanceRepository: GetAllInstances()
InstanceRepository->>Database: SELECT * FROM instances
Database-->>InstanceRepository: instances rows
InstanceRepository-->>InstanceCollector: []*Instance
InstanceCollector-->>PrometheusRegistry: Gauge metrics (total, connected, disconnected)
PrometheusRegistry-->>GinEngine: Text exposition format
GinEngine-->>Prometheus: 200 OK
Sequence diagram for HTTP request metrics via Gin middleware:

sequenceDiagram
participant Client as HttpClient
participant GinEngine as GinEngine
participant GinContext as GinContext
participant MetricsRegistry as MetricsRegistry
participant Handler as RouteHandler
Client->>GinEngine: HTTP request
GinEngine->>GinContext: Create context
GinEngine->>MetricsRegistry: GinMiddleware()
MetricsRegistry->>GinContext: Wrap handler with timing
GinContext->>Handler: Invoke route handler
Handler-->>GinContext: Write response
GinContext-->>MetricsRegistry: Status, method, path, duration
MetricsRegistry-->>MetricsRegistry: httpRequests.WithLabelValues(...).Inc()
MetricsRegistry-->>MetricsRegistry: httpDuration.WithLabelValues(...).Observe()
GinEngine-->>Client: HTTP response
Updated class diagram for metrics registry, instance collector, and chat mute API:

classDiagram
class Registry {
-prometheus.Registry reg
-prometheus.CounterVec httpRequests
-prometheus.HistogramVec httpDuration
+New(version string, instanceRepo InstanceRepository) Registry
+Handler() http.Handler
+GinMiddleware() gin.HandlerFunc
}
class instanceCollector {
-InstanceRepository repo
-*prometheus.Desc descTotal
-*prometheus.Desc descConnected
-*prometheus.Desc descDisconnected
+Describe(ch chan<- *prometheus.Desc)
+Collect(ch chan<- prometheus.Metric)
+newInstanceCollector(repo InstanceRepository) prometheus.Collector
}
class InstanceRepository {
<<interface>>
+GetAllInstances() ([]*Instance, error)
+GetAllConnectedInstances() ([]*Instance, error)
+GetAllConnectedInstancesByClientName(clientName string) ([]*Instance, error)
+GetAll(clientName string) ([]*Instance, error)
+Delete(instanceId string) error
+GetAdvancedSettings(instanceId string) (*AdvancedSettings, error)
+UpdateAdvancedSettings(instanceId string, settings *AdvancedSettings) error
}
class BodyStruct {
+string Chat
+int64 Duration
}
class chatService {
+ChatMute(data *BodyStruct, instance *Instance) (string, error)
+ChatUnmute(data *BodyStruct, instance *Instance) (string, error)
}
class appstate {
+BuildMute(recipient JID, mute bool, duration time.Duration) AppState
}
class ChatHandler {
+ChatMute(ctx *gin.Context)
}
class MaxMuteNote {
<<note>>
Constant: maxMuteDurationSeconds = 365 * 24 * 3600 (1 year cap)
}
Registry --> instanceCollector : registers
Registry --> InstanceRepository : uses
instanceCollector --> InstanceRepository : queries
chatService --> BodyStruct : consumes
chatService --> appstate : calls BuildMute
ChatHandler --> chatService : calls ChatMute
chatService .. MaxMuteNote
Flow diagram for the metrics dashboard data loading:

flowchart TD
User["User opens /manager"] --> Browser["Browser loads dashboard index.html"]
Browser --> LoadDataFunc["loadData() JS function"]
LoadDataFunc --> FetchInstances["fetch /instance/all (with apikey)"]
LoadDataFunc --> FetchServerOk["fetch /server/ok (with apikey)"]
FetchInstances -->|success| UpdateInstanceMetrics["Update cards: total, connected, disconnected, AlwaysOnline"]
FetchInstances -->|success| RenderTable["Render instances table"]
FetchInstances -->|error| ShowInstanceError["Show error and hint about API key"]
FetchServerOk -->|status ok| UpdateServerOnline["Show server Online, green icon"]
FetchServerOk -->|error or !ok| UpdateServerError["Show server error status"]
UpdateInstanceMetrics --> Done["Dashboard visible"]
RenderTable --> Done
ShowInstanceError --> Done
UpdateServerOnline --> Done
UpdateServerError --> Done
Done --> Interval["setInterval(loadData, 30000)"]
User --> RefreshButton["Click Atualizar"]
RefreshButton --> LoadDataFunc
Interval --> LoadDataFunc
Hey - I've found 2 issues
Prompt for AI Agents
Please address the comments from this code review:
## Individual Comments
### Comment 1
<location path="pkg/metrics/metrics.go" line_range="137-140" />
<code_context>
+ ch <- c.descDisconnected
+}
+
+func (c *instanceCollector) Collect(ch chan<- prometheus.Metric) {
+ instances, err := c.repo.GetAllInstances()
+ if err != nil {
+ // Emit nothing on error rather than stale data.
+ return
+ }
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Silently returning on repository errors hides scrape issues and makes diagnostics harder.
If `GetAllInstances` fails, the scrape appears successful but all `evolution_instances_*` series vanish, making failures hard to notice or alert on. Please surface this error—e.g., via an explicit health/error metric (like a gauge `evolution_instance_metrics_up` set to 0 on error, 1 on success), `prometheus.NewInvalidMetric`, and/or logging the error—to improve observability and debugging.
Suggested implementation:
```golang
func (c *instanceCollector) Collect(ch chan<- prometheus.Metric) {
instances, err := c.repo.GetAllInstances()
if err != nil {
// Surface repository errors as an invalid metric so scrape issues are visible.
ch <- prometheus.NewInvalidMetric(c.descTotal, err)
return
}
```
If you prefer a dedicated health gauge instead of (or in addition to) `NewInvalidMetric`, you'll need to:
1. Add a new descriptor to `instanceCollector` (e.g., `descUp *prometheus.Desc`) and initialize it where the collector is constructed, with a name like `evolution_instance_metrics_up`.
2. Emit that metric in `Collect`, setting it to `0` on error and `1` on success, and include it in `Describe`.
</issue_to_address>
### Comment 2
<location path="pkg/metrics/metrics.go" line_range="52-58" />
<code_context>
+ return time.Since(startTime).Seconds()
+ })
+
+ reg.MustRegister(
+ httpRequests,
+ httpDuration,
+ buildInfo,
+ uptimeGauge,
+ newInstanceCollector(instanceRepo),
+ )
+
+ return &Registry{
</code_context>
<issue_to_address>
**suggestion:** The custom Prometheus registry omits Go and process collectors, reducing observability of runtime/resource behavior.
Because this uses a standalone `prometheus.Registry`, it won’t include the default `go_*` or `process_*` metrics from the global registry. To retain standard CPU/memory/goroutine and process visibility, also register `prometheus.NewGoCollector()` and `prometheus.NewProcessCollector(prometheus.ProcessCollectorOpts{})` on `reg` alongside your custom metrics.
```suggestion
reg.MustRegister(
httpRequests,
httpDuration,
buildInfo,
uptimeGauge,
newInstanceCollector(instanceRepo),
prometheus.NewGoCollector(),
prometheus.NewProcessCollector(prometheus.ProcessCollectorOpts{}),
)
```
</issue_to_address>
paluan-batista left a comment:

Good morning, friend, how are you? To help out with a review of your project (PR), I left a few specific comments as suggestions.
    r := gin.Default()

    metricsRegistry := metrics.New(version, instanceRepository)
    r.Use(metricsRegistry.GinMiddleware())

Suggestion: move this registration to after the CORS configuration. That prevents the metrics collector from processing and recording requests that would be blocked immediately by security policies.
    func (c *instanceCollector) Collect(ch chan<- prometheus.Metric) {
        instances, err := c.repo.GetAllInstances()

Suggestion: this query runs on every Prometheus scrape. With thousands of instances, that will cause load spikes on the database. Consider using a cache or incremental metrics instead of querying the database in real time.
    if data.Duration < 0 {
        return "", errors.New("duration must be >= 0 (0 = mute forever)")
    }
    if data.Duration > maxMuteDurationSeconds {

Suggestion: the maxMuteDurationSeconds check is correct. Just make sure the returned error is clear (e.g. "mute duration exceeds 1 year limit") so the user knows why the action failed.
What was added

New endpoint GET /metrics exposing metrics in the standard Prometheus text format. No authentication, following the Prometheus convention of protecting the endpoint at the network/ingress layer.

Exposed metrics:
- evolution_instances_total
- evolution_instances_connected
- evolution_instances_disconnected
- evolution_http_requests_total
- evolution_http_request_duration_seconds
- evolution_build_info (the version label carries the version)
- evolution_uptime_seconds

Technical details:
- Custom Collector that queries the database on each scrape: values are always current, with no need for event hooks
- HTTP path labels use the registered route patterns (e.g. /instance/:instanceId), keeping cardinality under control

New dependency: github.com/prometheus/client_golang v1.20.5

Example usage with Grafana: configure a Prometheus datasource pointing at http://<host>:<port>/metrics and import a standard Go dashboard, or build panels with the evolution_* metrics.

Summary by Sourcery
Expose a Prometheus-compatible /metrics endpoint and add a lightweight HTML dashboard for monitoring instance and server health, while extending chat mute functionality to support configurable durations and tightening related APIs.