Export Metrics from Databricks Serving Endpoint to Datadog

Alexey Artemov
3 min read · Jan 28, 2024

Overview

If you use a Databricks serving endpoint and want to export its metrics to Datadog, you may run into problems with the Datadog documentation: parts of it are outdated, and there is no quick start guide. This article should help you work around them, at least far enough to build a PoC.

A Databricks serving endpoint is a service that lets you deploy a model and serve it. It is part of Databricks MLflow. You can find more information about it in the Databricks documentation.

Once you have created an endpoint, you can see some metrics in the Databricks UI out of the box. The metrics can also be retrieved via the REST API:

curl --request GET "https://${DATABRICKS_HOST}/api/2.0/serving-endpoints/${ENDPOINT_NAME}/metrics" --header "Authorization: Bearer ${DATABRICKS_TOKEN}"
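The response is plain text in the Prometheus/OpenMetrics exposition format, so it can be hard to scan by eye. As a minimal sketch, the helper below (the name list_metric_names is hypothetical, not part of any Databricks or Datadog tool) reads that text from stdin and prints the unique metric names it contains. It assumes one sample per line, with optional labels in curly braces.

```shell
# Print the unique metric names found in OpenMetrics/Prometheus text read from stdin.
# Comment lines (# HELP, # TYPE, # EOF) and blank lines are skipped;
# labels and sample values are dropped.
list_metric_names() {
  awk '!/^#/ && NF { sub(/[{ ].*/, "", $0); print }' | sort -u
}

# Hypothetical usage (assumes DATABRICKS_HOST, DATABRICKS_TOKEN and ENDPOINT_NAME are set):
#   curl --silent --request GET \
#     "https://${DATABRICKS_HOST}/api/2.0/serving-endpoints/${ENDPOINT_NAME}/metrics" \
#     --header "Authorization: Bearer ${DATABRICKS_TOKEN}" | list_metric_names
```

The printed names are the ones you would list in the agent's metrics configuration.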

But if you want to export them to Datadog, you need to install and configure the Datadog agent.

Pre-requisites

Install Datadog agent

Datadog offers several ways to install the agent, but I was not able to make some of them work. That is why I decided to use the simplest option: installing the agent in a Docker container.

Before installing the agent, create a datadog.yaml file, which will be placed at /etc/datadog-agent/datadog.yaml, with the following content:

api_key: "__DD_API_KEY__"
site: "datadoghq.com"
hostname: "__YOUR_HOST_NAME__"

Then create a metrics-config.yaml file, which will be placed at /etc/datadog-agent/conf.d/openmetrics.d/conf.yaml.default, with the following content:

instances:
  - openmetrics_endpoint: https://__DATABRICKS_HOST__/api/2.0/serving-endpoints/__ENDPOINT_NAME__/metrics

    metrics:
      - cpu_usage_percentage:
          name: cpu_usage_percentage
          type: gauge
      - mem_usage_percentage:
          name: mem_usage_percentage
          type: gauge
      - provisioned_concurrent_requests_total:
          name: provisioned_concurrent_requests_total
          type: gauge
      - request_4xx_count_total:
          name: request_4xx_count_total
          type: gauge
      - request_5xx_count_total:
          name: request_5xx_count_total
          type: gauge
      - request_count_total:
          name: request_count_total
          type: gauge
      - request_latency_ms:
          name: request_latency_ms
          type: histogram

    tag_by_endpoint: false

    send_distribution_buckets: true

    headers:
      Authorization: Bearer __DATABRICKS_TOKEN__
      Content-Type: application/openmetrics-text

The full list of options for conf.yaml.default can be found in the template file located in the container under /opt/datadog-agent/agent/conf.d/openmetrics.d/.
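A metric name that is misspelled in the configuration will simply never show up in Datadog, so it can be worth comparing the configured names with what the endpoint actually exposes. The sketch below is one way to do that. The function name missing_metrics and both file arguments are hypothetical: the first is the metrics configuration file, the second is the raw output of the metrics REST call saved to a file. Names are matched by prefix because a histogram is typically exposed through suffixed series such as _bucket, _sum and _count.

```shell
# Print every metric name configured under "name:" in the config file ($1)
# that does not appear at the start of any line in the saved endpoint output ($2).
missing_metrics() {
  local conf="$1" exposed="$2" name
  awk '$1 == "name:" { print $2 }' "$conf" | while read -r name; do
    # Prefix match so request_latency_ms also matches request_latency_ms_bucket, etc.
    grep -q "^${name}" "$exposed" || echo "$name"
  done
}

# Hypothetical usage:
#   missing_metrics metrics-config.yaml endpoint-metrics.txt
```

An empty output means every configured name was found in the endpoint output.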

I created the following Dockerfile because I had some difficulties with the one provided in the Datadog documentation:

FROM ubuntu:20.04

# Set environment variables
ENV DD_API_KEY=[SPECIFY_DD_API_KEY]
ENV DD_SITE=datadoghq.com
ENV DD_HOSTNAME=datadog-agent-ubuntu
ENV DATABRICKS_HOST=[SPECIFY_DATABRICKS_HOST]
ENV DATABRICKS_TOKEN=[SPECIFY_DATABRICKS_TOKEN]
ENV ENDPOINT_NAME=[SPECIFY_ENDPOINT_NAME]

# Install curl
RUN apt-get update && apt-get install -y curl

# Copy the datadog.yaml file into the Docker image
COPY datadog.yaml /etc/datadog-agent/datadog.yaml
# Fill in the API key and hostname placeholders
RUN sed -i -e "s@__DD_API_KEY__@$DD_API_KEY@g; s@__YOUR_HOST_NAME__@$DD_HOSTNAME@g" /etc/datadog-agent/datadog.yaml

# Install the Datadog agent
RUN bash -c "$(curl -L https://s3.amazonaws.com/dd-agent/scripts/install_script_agent7.sh)"

# Copy the metrics-config.yaml file into the Docker image
COPY metrics-config.yaml /etc/datadog-agent/conf.d/openmetrics.d/conf.yaml.default

RUN sed -i -e "s@__DATABRICKS_HOST__@$DATABRICKS_HOST@g; s@__ENDPOINT_NAME__@$ENDPOINT_NAME@g; s@__DATABRICKS_TOKEN__@$DATABRICKS_TOKEN@g; " /etc/datadog-agent/conf.d/openmetrics.d/conf.yaml.default

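# Note: a service started during the build does not keep running in the container,
# so the agent is restarted manually after the container starts (see Usage)
RUN service datadog-agent restart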

# Run a process that doesn't exit
CMD ["tail", "-f", "/dev/null"]
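The sed commands in the Dockerfile only replace the placeholders they name, so a placeholder that is forgotten or misspelled would stay in the configuration as a literal string. As a small sketch, the function below (check_placeholders is a hypothetical name) fails if a file still contains a token of the form __NAME__. It assumes placeholders always use uppercase letters and underscores, as they do in this article.

```shell
# Return non-zero and print the offending lines (with line numbers)
# if the given file still contains __PLACEHOLDER__ style tokens.
check_placeholders() {
  local file="$1"
  if grep -n -E '__[A-Z][A-Z_]*__' "$file"; then
    echo "unresolved placeholders in $file" >&2
    return 1
  fi
}

# Hypothetical usage inside the container:
#   check_placeholders /etc/datadog-agent/datadog.yaml
#   check_placeholders /etc/datadog-agent/conf.d/openmetrics.d/conf.yaml.default
```

Running it against both configuration files after the build is a cheap way to catch a missed substitution before the agent starts.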

Usage

  • Start the serving endpoint
  • Define the environment variables in the `Dockerfile`
  • Build the Docker image
  • Run the Docker container

# Step 1: Build the Docker image
# Navigate to the directory containing the Dockerfile
cd /path/to/directory/with/dockerfile

# Build the Docker image
# Replace 'datadog-agent-image' with the name you want to give to your Docker image
docker build -t datadog-agent-image .

# Step 2: Run a container from the built image
# Replace 'datadog-agent' with the name you want to give to your Docker container
docker run --name datadog-agent -d datadog-agent-image

# Step 3: Connect to the container and restart the agent
docker exec -it datadog-agent bash
service datadog-agent restart

Some useful commands for the Datadog agent:

datadog-agent status
datadog-agent health

service datadog-agent start
service datadog-agent restart
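After a restart the agent may need a few seconds before it reports as healthy. The sketch below is a generic retry helper (wait_until is a hypothetical name) that reruns a command until it succeeds or the attempts run out. It assumes that datadog-agent health exits with a non-zero status while the agent is not ready, which is worth confirming for your agent version.

```shell
# Run a command repeatedly until it succeeds.
# $1 = maximum number of attempts, $2 = delay in seconds between attempts,
# remaining arguments = the command to run.
wait_until() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# Hypothetical usage from the host, with the container name used in this article:
#   wait_until 10 3 docker exec datadog-agent datadog-agent health
```

Once the agent is healthy, datadog-agent status should list the openmetrics check among the running checks.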
