Astarte is a complex, distributed system that may pose several challenges when deployed in production. Individual services report health and metrics to ensure production clusters can be properly monitored and proactive actions can be taken in case of faults or unexpected behavior.
Every Astarte service, whether it's an API service or not, exposes an HTTP endpoint
/health, without versioning, on its HTTP port. By default, services use port
/health is meant to be called frequently and reports the individual health state of a service. It will return
200 in case the service is healthy, or other errors in case the service is having issues. Among those issues, there might be failure in accessing RabbitMQ/RPC communications or failure in accessing the Database.
The aforementioned health checks are integrated in Kubernetes, when using Astarte Operator, as
ReadinessProbe. As such, health monitoring and forced restarts are automatically handled without the need for the administrator to integrate any additional logic.
/health, every service exposes a
/metrics endpoint. This endpoint exposes a series of metrics in Prometheus format, which can be easily integrated and queried from any Prometheus-compatible monitoring solution. Each service, besides exposing stats on its Erlang VM, resource consumption and HTTP stats (where applicable), also exposes a number of service-specific metric, which can be queried to obtain information about Astarte's usage and behavior.
/metrics, being Prometheus-compatible, does not implement any kind of authentication or access control. Ideally, only your scraper should have access to
/metrics, as it can leak sensitive information and should not be exposed to the outer world.
Astarte Operator, by default, forbids access to
/metrics through its ingress, as it assumes your scraper lives within the Kubernetes cluster or has means to access the cluster on its own. However, this behavior can be overridden through by setting
serveMetrics: true in the
api section. An additional parameter,
serveMetricsToSubnet, can be specified to restrict access to
/metrics only to source IPs in a specific subnet. It is strongly recommended to set this up in case an external scraper needs to have access to
/metrics, to ensure access is restricted.