View Source Monitoring
Astarte is a complex, distributed system that may pose several challenges when deployed in production. Individual services report health and metrics to ensure production clusters can be properly monitored and proactive actions can be taken in case of faults or unexpected behavior.
health-checks
Health checks
Every Astarte service, whether it's an API service or not, exposes an HTTP endpoint /health
, without versioning, on its HTTP port. By default, services use port 4000
.
/health
is meant to be called frequently and reports the individual health state of a service. It will return 200
in case the service is healthy, or other errors in case the service is having issues. Among those issues, there might be failure in accessing RabbitMQ/RPC communications or failure in accessing the Database.
health-checks-and-kubernetes
Health checks and Kubernetes
The aforementioned health checks are integrated in Kubernetes, when using Astarte Operator, as LivenessProbe
and ReadinessProbe
. As such, health monitoring and forced restarts are automatically handled without the need for the administrator to integrate any additional logic.
service-metrics
Service metrics
Just like /health
, every service exposes a /metrics
endpoint. This endpoint exposes a series of metrics in Prometheus format, which can be easily integrated and queried from any Prometheus-compatible monitoring solution. Each service, besides exposing stats on its Erlang VM, resource consumption and HTTP stats (where applicable), also exposes a number of service-specific metric, which can be queried to obtain information about Astarte's usage and behavior.
authentication-and-access-to-metrics
Authentication and access to metrics
/metrics
, being Prometheus-compatible, does not implement any kind of authentication or access control. Ideally, only your scraper should have access to /metrics
, as it can leak sensitive information and should not be exposed to the outer world.
Astarte Operator, by default, forbids access to /metrics
through its ingress, as it assumes your scraper lives within the Kubernetes cluster or has means to access the cluster on its own. However, this behavior can be overridden through by setting serveMetrics: true
in the api
section. An additional parameter, serveMetricsToSubnet
, can be specified to restrict access to /metrics
only to source IPs in a specific subnet. It is strongly recommended to set this up in case an external scraper needs to have access to /metrics
, to ensure access is restricted.