Hosted observability platforms are convenient until cost, data residency, or vendor lock-in become blockers. This article walks through a fully self-hosted monitoring stack deployed with Docker Compose on a single VPS. Every configuration file is reproduced below, along with architecture diagrams, operating tips, and lessons learned while migrating away from Grafana Cloud Agent.
Expect deep dives into each component - from Docker Compose wiring to Grafana provisioning - alongside commentary on why specific decisions serve a secure, host-terminated HTTPS setup. Bring a terminal and the willingness to inspect configs line by line. The narrative ties the moving pieces together so you can apply the pattern to your own workloads.
This article was produced 95% with Codex, guided and reviewed by the author, who steered the narrative and the design decisions. The full working setup took roughly 8 hours, pairing the author's Linux and observability experience with the Codex CLI, which generated every artifact below.
1. Architecture Overview
1.1 Topology
Level 1 – System Context
flowchart LR
  Admin["Person<br/>Browser / Admin"]
  Host["System<br/>Host / VM / Node"]
  Obs["System<br/>Observability Platform"]
  App["System<br/>Monitoring Demo App"]
  %% flows
  Admin -->|"HTTPS 443"| Host
  Host -->|"HTTP 127.0.0.1:3000"| Obs
  App -->|"metrics / logs / traces"| Obs
  Host -->|"log files"| Obs
Level 2 – Container View (inside Observability)
flowchart LR
  %% COLUMN 1: HOST (log sources & TLS)
  subgraph HostCol["Host"]
    direction TB
    HLogs["Host logs<br/>/var/log/*.log"]
    HDocker["Docker Engine<br/>/var/run/docker.sock"]
    HNginx["Nginx<br/>TLS termination"]
  end
  Browser["Browser"]
  Browser -->|"HTTPS 443"| HNginx
  %% COLUMN 2: APPS & AGENTS (produce telemetry)
  subgraph Apps["Workload / Exporters"]
    direction TB
    Demo["monitoring-demo-app<br/>127.0.0.1:7005"]
    NodeExp["Node Exporter<br/>host metrics"]
    CAdv["cAdvisor<br/>container metrics"]
  end
  %% COLUMN 3: INGESTION & UI
  subgraph Ingest["Ingestion / UI"]
    direction TB
    Grafana["Grafana<br/>127.0.0.1:3000"]
    OTEL["OTel Collector<br/>ingestion gateway"]
    Promtail["Promtail<br/>log shipper"]
  end
  %% COLUMN 4: BACKENDS / STORAGE
  subgraph Backends["Backends"]
    direction TB
    Prom["Prometheus<br/>metrics TSDB"]
    Loki["Loki<br/>log store"]
    Tempo["Tempo<br/>traces"]
    TMem["Tempo Memcached<br/>trace index cache"]
  end
  %% FLOWS
  HLogs -->|"static_configs"| Promtail
  HDocker -->|"docker_sd_configs"| Promtail
  HNginx -->|"HTTP 127.0.0.1:3000"| Grafana
  Demo -->|"OTLP traces"| OTEL
  Demo -->|"/metrics"| Prom
  Demo -->|"stdout / stderr"| Promtail
  NodeExp -->|"metrics"| Prom
  CAdv -->|"metrics"| Prom
  Promtail -->|"push logs"| Loki
  OTEL --> Tempo
  Tempo --> TMem
  Grafana -->|"Dashboards"| Prom
  Grafana -->|"Explore logs"| Loki
  Grafana -->|"Explore traces"| Tempo
1.2 Telemetry Flow
sequenceDiagram
  Script->>App: GET /hello
  Script->>App: GET /work
  Note right of Script: Load generator
  Prom->>App: GET /metrics
  Note over Prom,App: Pull-based scrape every 15s
  App->>Collector: OTLP /v1/traces
  Collector->>Tempo: Export spans (gRPC)
  App->>PTail: stdout / stderr to container logs
  PTail->>Loki: Push enriched logs
  Grafana->>Prom: PromQL queries
  Grafana->>Loki: LogQL queries (container_name="monitoring-demo-app")
  Grafana->>Tempo: TraceQL queries
Key principles
- Grafana (127.0.0.1:3000) and the demo app (127.0.0.1:7005) are the only ports published on the host, and both bind to localhost only. Host Nginx terminates TLS for https://monitoring.services.org.pl.
- Everything else communicates over the private Docker network obs.
- Persistent data lives under /opt/docker-volumes/<service>/... on the host.
2. Prerequisites
Before following the walkthrough, you should be comfortable with:
- Administering Docker and Docker Compose on a Linux server (SSH, system packages, file permissions).
- Core observability concepts (metrics, logs, traces) and how Grafana, Prometheus, Loki, and Tempo expose them.
- Basic networking and TLS offload patterns, so the host Nginx scenario feels familiar.
2.1 Install Docker & Compose (Ubuntu 22.04 example)
sudo apt-get update
sudo apt-get install -y ca-certificates curl gnupg lsb-release
sudo install -m 0755 -d /etc/apt/keyrings   # ensure the keyring directory exists
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | \
  sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] \
  https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" | \
  sudo tee /etc/apt/sources.list.d/docker.list
sudo apt-get update
sudo apt-get install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
sudo usermod -aG docker "$USER"   # log out/in afterwards
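Before moving on, a quick sanity check that the Engine and the Compose plugin are wired up (standard Docker commands; the hello-world run needs outbound network access):
docker --version
docker compose version
docker run --rm hello-world   # pulls a tiny test image and prints a confirmation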
2.2 Prepare Persistent Directories
sudo mkdir -p \
  /opt/docker-volumes/grafana/data \
  /opt/docker-volumes/prometheus/data \
  /opt/docker-volumes/loki/data \
  /opt/docker-volumes/tempo/data \
  /opt/docker-volumes/promtail/positions
sudo chown -R 472:472 /opt/docker-volumes/grafana   # 472 = UID/GID of the Grafana container user
sudo chmod 750 /opt/docker-volumes/grafana /opt/docker-volumes/grafana/data
2.3 Host Nginx & TLS
TLS termination happens on the host. A reference vhost:
server {
  listen 443 ssl http2;
  server_name monitoring.services.org.pl;
  ssl_certificate /etc/letsencrypt/live/monitoring.services.org.pl/fullchain.pem;
  ssl_certificate_key /etc/letsencrypt/live/monitoring.services.org.pl/privkey.pem;
  location / {
    proxy_pass http://127.0.0.1:3000;
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto https;
  }
}
Certbot (--nginx) keeps certificates valid; no Nginx container is used.
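If the certificate does not exist yet, certbot can issue it and patch the vhost in one step. A sketch using the distro packages (package names below are for Ubuntu; adjust for your distribution):
sudo apt-get install -y certbot python3-certbot-nginx
sudo certbot --nginx -d monitoring.services.org.pl
sudo certbot renew --dry-run   # confirm the renewal path works end to end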
3. Repository Layout
.
├── docker-compose.yml
├── grafana/
│   └── provisioning/
│       ├── dashboards/demo_app_overview.json
│       └── datasources/datasources.yml
├── prometheus/
│   ├── prometheus.yml
│   └── rules/alerts.yml
├── loki/config.yaml
├── promtail/config.yml
├── tempo/tempo.yaml
├── otel-collector/config.yaml
├── app/
│   ├── Dockerfile
│   ├── requirements.txt
│   ├── app.py
│   └── generate_demo_traffic.sh
├── storage_usage.sh
└── docker_memory_usage.sh
Copy each snippet below into the matching path if you are recreating the stack from scratch.
4. Docker Compose Stack
docker-compose.yml
# Generated by Codex Agent — do not edit manually
# Date: 2025-10-31
services:
  grafana:
    image: grafana/grafana:12.2.1
    container_name: grafana
    restart: unless-stopped
    environment:
      GF_SERVER_DOMAIN: monitoring.services.org.pl
      GF_SERVER_ROOT_URL: https://monitoring.services.org.pl/
    ports:
      - "127.0.0.1:3000:3000"
    volumes:
      - /opt/docker-volumes/grafana/data:/var/lib/grafana
      - ./grafana/provisioning:/etc/grafana/provisioning:ro
      - ./grafana/dashboards:/var/lib/grafana/dashboards:ro
    depends_on:
      - prometheus
      - loki
      - tempo
    networks:
      - obs
  prometheus:
    image: prom/prometheus:v3.7.3
    container_name: prometheus
    restart: unless-stopped
    user: "0"
    command:
      - --config.file=/etc/prometheus/prometheus.yml
      - --storage.tsdb.path=/prometheus
      - --web.enable-lifecycle
      - --web.enable-remote-write-receiver   # accept remote_write from Tempo's metrics-generator and the OTel Collector
    expose:
      - "9090"
    volumes:
      - /opt/docker-volumes/prometheus/data:/prometheus
      - ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml:ro
      - ./prometheus/rules:/etc/prometheus/rules:ro
    networks:
      - obs
  loki:
    image: grafana/loki:3.5.7
    container_name: loki
    restart: unless-stopped
    user: "0"
    command:
      - -config.file=/etc/loki/config.yaml
    expose:
      - "3100"
    volumes:
      - /opt/docker-volumes/loki/data:/loki
      - ./loki/config.yaml:/etc/loki/config.yaml:ro
    networks:
      - obs
  tempo:
    image: grafana/tempo:2.9.0
    container_name: tempo
    restart: unless-stopped
    user: "0"
    command:
      - -config.file=/etc/tempo/tempo.yaml
    depends_on:
      - memcached
    expose:
      - "3200"
      - "4317"
      - "4318"
    volumes:
      - /opt/docker-volumes/tempo/data:/var/tempo
      - ./tempo/tempo.yaml:/etc/tempo/tempo.yaml:ro
    networks:
      - obs
      - tempo-cache
  memcached:
    image: memcached:1.6.33-alpine
    container_name: tempo-memcached
    restart: unless-stopped
    command:
      - -m
      - "256"
      - -p
      - "11211"
    expose:
      - "11211"
    networks:
      - tempo-cache
  otelcol:
    image: otel/opentelemetry-collector-contrib:0.138.0
    container_name: otelcol
    restart: unless-stopped
    command:
      - --config=/etc/otelcol/config.yaml
    expose:
      - "4317"
      - "4318"
      - "8888"
      - "55679"
    volumes:
      - ./otel-collector/config.yaml:/etc/otelcol/config.yaml:ro
    depends_on:
      - tempo
      - loki
      - prometheus
    networks:
      - obs
  promtail:
    image: grafana/promtail:3.5.7
    container_name: promtail
    restart: unless-stopped
    command:
      - -config.file=/etc/promtail/config.yml
      - -config.expand-env=true   # expand ${HOSTNAME} references in config.yml
    volumes:
      - /opt/docker-volumes/promtail/positions:/var/lib/promtail
      - ./promtail/config.yml:/etc/promtail/config.yml:ro
      - /var/log:/var/log:ro
      - /var/lib/docker/containers:/var/lib/docker/containers:ro
      - /var/run/docker.sock:/var/run/docker.sock:ro
    depends_on:
      - loki
    networks:
      - obs
  node-exporter:
    image: prom/node-exporter:v1.10.2
    container_name: node-exporter
    restart: unless-stopped
    pid: host
    command:
      - --path.procfs=/host/proc
      - --path.sysfs=/host/sys
      - --path.rootfs=/rootfs
      - --collector.filesystem.mount-points-exclude=^/(proc|sys|dev|host|etc)($|/)
      - --collector.filesystem.fs-types-exclude=^(autofs|binfmt_misc|bpf|cgroup2?|configfs|debugfs|devpts|devtmpfs|fusectl|hugetlbfs|iso9660|mqueue|nsfs|overlay|proc|pstore|rpc_pipefs|securityfs|selinuxfs|squashfs|sysfs|tracefs)$
      - --no-collector.ipvs
      - --no-collector.btrfs
      - --no-collector.infiniband
      - --no-collector.xfs
      - --no-collector.zfs
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro
    networks:
      - obs
  cadvisor:
    image: gcr.io/cadvisor/cadvisor:v0.52.1
    container_name: cadvisor
    restart: unless-stopped
    privileged: true
    expose:
      - "8080"
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:ro
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
      - /dev/disk/:/dev/disk:ro
    networks:
      - obs
  monitoring-demo-app:
    build:
      context: ./app
    container_name: monitoring-demo-app
    restart: unless-stopped
    environment:
      OTEL_EXPORTER_OTLP_ENDPOINT: http://otelcol:4318
      OTEL_SERVICE_NAME: monitoring-demo-app
      OTEL_RESOURCE_ATTRIBUTES: deployment.environment=demo
    depends_on:
      - otelcol
    ports:
      - "127.0.0.1:7005:8000"
    networks:
      - obs
networks:
  obs:
    driver: bridge
  tempo-cache:
    driver: bridge
Highlights:
- Only Grafana and the demo app publish host ports, both bound to localhost; everything else uses expose.
- memcached accelerates Tempo search.
- monitoring-demo-app is a local test service exporting metrics, logs, and traces.
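Before the first start, you can validate the Compose file and then confirm that only the two localhost bindings exist on the host. A minimal check (ss ships with iproute2 on Ubuntu):
docker compose config --quiet        # exits non-zero on YAML or schema errors
docker compose up -d
ss -tlnp | grep -E ':3000|:7005'     # both sockets should show 127.0.0.1 only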
5. Prometheus Metrics
prometheus/prometheus.yml
# Generated by Codex Agent — do not edit manually
# Date: 2025-10-31
global:
  scrape_interval: 15s
  evaluation_interval: 15s
rule_files:
  - /etc/prometheus/rules/*.yml
scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets:
          - prometheus:9090
  - job_name: cadvisor
    static_configs:
      - targets:
          - cadvisor:8080
  - job_name: node-exporter
    static_configs:
      - targets:
          - node-exporter:9100
  - job_name: monitoring-demo-app
    metrics_path: /metrics
    static_configs:
      - targets:
          - monitoring-demo-app:8000
Alerting starter pack (prometheus/rules/alerts.yml):
# Generated by Codex Agent — do not edit manually
# Date: 2025-10-31
groups:
  - name: infrastructure-health
    rules:
      - alert: PrometheusTargetMissing
        expr: up == 0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: Target {{ $labels.job }} on {{ $labels.instance }} is down
          description: Prometheus has not scraped {{ $labels.job }} on {{ $labels.instance }} for over 5 minutes.
Retention: Prometheus keeps 15 days by default (no explicit --storage.tsdb.retention.time). Adjust the Compose command arguments if you need longer storage.
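Whenever prometheus.yml or the rule files change, validate them with promtool (bundled in the prom/prometheus image) before reloading. Because --web.enable-lifecycle is set, a SIGHUP reloads the config without a restart; the paths below match the Compose mounts:
docker compose exec prometheus promtool check config /etc/prometheus/prometheus.yml
docker compose exec prometheus promtool check rules /etc/prometheus/rules/alerts.yml
docker compose kill -s SIGHUP prometheus   # hot-reload the running server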
6. Logs with Promtail & Loki
promtail/config.yml
# Generated by Codex Agent — do not edit manually
# Date: 2025-10-31
server:
  http_listen_port: 9080
  grpc_listen_port: 0
positions:
  filename: /var/lib/promtail/positions.yaml
clients:
  - url: http://loki:3100/loki/api/v1/push
scrape_configs:
  - job_name: system-logs
    pipeline_stages:
      - drop:
          older_than: 24h
    static_configs:
      - targets:
          - localhost
        labels:
          job: varlogs
          host: ${HOSTNAME}
          __path__: /var/log/*.log
  - job_name: docker-containers
    pipeline_stages:
      - docker: {}
      - drop:
          older_than: 24h
    docker_sd_configs:
      - host: unix:///var/run/docker.sock
    relabel_configs:
      - source_labels: ['__meta_docker_container_id']
        target_label: '__path__'
        replacement: /var/lib/docker/containers/$1/$1-json.log
      - source_labels: ['__meta_docker_container_name']
        target_label: 'container_name'
        regex: '/(.*)'
        replacement: '$1'
      - source_labels: ['__meta_docker_container_id']
        target_label: 'container_id'
      - source_labels: ['__meta_docker_container_image']
        target_label: 'container_image'
      - source_labels: ['__meta_docker_container_label_com_docker_compose_service']
        target_label: 'service_name'
      - source_labels: ['__meta_docker_container_label_com_docker_compose_project']
        target_label: 'compose_project'
      - source_labels: ['__meta_docker_container_log_stream']
        target_label: 'stream'
      - target_label: 'job'
        replacement: 'containers'
      - target_label: 'host'
        replacement: ${HOSTNAME}
Notes:
- Promtail adds container_name and service_name labels, making it easy to filter by Compose service (e.g. {job="containers", container_name="monitoring-demo-app"}).
- Journald ingestion was removed because this host keeps logs in memory-only /run/systemd/journal; tailing /var/log/*.log covers the important services without additional setup.
- Loki stores data in /opt/docker-volumes/loki/data; retention is managed inside Loki’s config (loki/config.yaml) and defaults to compactor-managed chunk pruning.
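Loki is not published on the host, but you can query its HTTP API over the Docker network. A sketch using a throwaway curl container; the monitoring_obs network name assumes the Compose project directory is called monitoring (check docker network ls):
docker run --rm --network monitoring_obs curlimages/curl -sG \
  'http://loki:3100/loki/api/v1/query' \
  --data-urlencode 'query=count_over_time({container_name="monitoring-demo-app"}[5m])'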
7. Traces with Tempo & OpenTelemetry Collector
tempo/tempo.yaml
# Generated by Codex Agent — do not edit manually
# Date: 2025-10-31
server:
  http_listen_port: 3200
  log_level: info
cache:
  background:
    writeback_goroutines: 5
  caches:
    - roles:
        - frontend-search
      memcached:
        addresses: memcached:11211
query_frontend:
  metrics:
    max_duration: 200h
    query_backend_after: 5m
  search:
    duration_slo: 5s
    throughput_bytes_slo: 1073741824
  trace_by_id:
    duration_slo: 100ms
distributor:
  receivers:
    otlp:
      protocols:
        grpc:
          endpoint: 0.0.0.0:4317
        http:
          endpoint: 0.0.0.0:4318
compactor:
  compaction:
    block_retention: 720h
storage:
  trace:
    backend: local
    wal:
      path: /var/tempo/wal
    local:
      path: /var/tempo/blocks
metrics_generator:
  registry:
    external_labels:
      source: tempo
      cluster: docker-compose
  storage:
    path: /var/tempo/generator/wal
    remote_write:
      - url: http://prometheus:9090/api/v1/write
        send_exemplars: true
  traces_storage:
    path: /var/tempo/generator/traces
  processor:
    local_blocks:
      filter_server_spans: false
      flush_to_storage: true
overrides:
  defaults:
    metrics_generator:
      processors:
        - service-graphs
        - span-metrics
        - local-blocks
Tempo keeps 30 days (720h) of trace blocks locally. Memcached speeds up search queries; on a tiny VPS you can lower the -m 256 memory cap in the Compose memcached command.
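To confirm Tempo is up and searchable, probe its readiness and search endpoints the same way as Loki above (monitoring_obs is again the assumed network name):
docker run --rm --network monitoring_obs curlimages/curl -s http://tempo:3200/ready
docker run --rm --network monitoring_obs curlimages/curl -sG \
  'http://tempo:3200/api/search' --data-urlencode 'tags=service.name=monitoring-demo-app'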
OpenTelemetry Collector (otel-collector/config.yaml) receives OTLP traffic and fans out to Tempo plus Prometheus remote write:
# Generated by Codex Agent — do not edit manually
# Date: 2025-10-31
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
processors:
  memory_limiter:
    check_interval: 1s
    limit_mib: 400
    spike_limit_mib: 100
  batch:
    timeout: 5s
    send_batch_size: 8192
  resource:
    attributes:
      - action: upsert
        key: deployment.environment
        value: prod
exporters:
  otlp/tempo:
    endpoint: tempo:4317
    tls:
      insecure: true
  prometheusremotewrite:
    endpoint: http://prometheus:9090/api/v1/write
  debug:
    verbosity: basic
extensions:
  health_check:
    endpoint: 0.0.0.0:13133
service:
  extensions:
    - health_check
  telemetry:
    metrics:
      level: basic
  pipelines:
    traces:
      receivers:
        - otlp
      processors:
        - memory_limiter
        - resource
        - batch
      exporters:
        - otlp/tempo
    metrics:
      receivers:
        - otlp
      processors:
        - memory_limiter
        - resource
        - batch
      exporters:
        - prometheusremotewrite
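The health_check extension listens on 0.0.0.0:13133, so the collector's liveness can be probed from inside the network (same assumed monitoring_obs network name as earlier):
docker run --rm --network monitoring_obs curlimages/curl -s http://otelcol:13133/ ; echo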
8. Grafana Provisioning
grafana/provisioning/datasources/datasources.yml
# Generated by Codex Agent — do not edit manually
# Date: 2025-10-31
apiVersion: 1
datasources:
  - uid: prometheus
    name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true
    jsonData:
      httpMethod: GET
  - uid: loki
    name: Loki
    type: loki
    access: proxy
    url: http://loki:3100
    jsonData:
      maxLines: 5000
  - uid: tempo
    name: Tempo
    type: tempo
    access: proxy
    url: http://tempo:3200
    jsonData:
      httpMethod: GET
      serviceMapDatasourceUid: prometheus
      tracesToLogs:
        datasourceUid: loki
        mapTagNamesEnabled: true
        tags:
          - job
          - host
      tracesToMetrics:
        datasourceUid: prometheus
        tags:
          - service.name
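Once Grafana is up, the provisioned datasources can be listed through its HTTP API. The admin:admin credentials below are Grafana's out-of-the-box defaults and an assumption here; substitute your real admin password (jq is only used to extract the names):
curl -s -u admin:admin http://127.0.0.1:3000/api/datasources | jq -r '.[].name'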
grafana/provisioning/dashboards/demo_app_overview.json
{
  "panels": [
    {
      "datasource": {
        "type": "prometheus",
        "uid": "prometheus"
      },
      "fieldConfig": {
        "defaults": {
          "color": {
            "mode": "palette-classic"
          },
          "thresholds": {
            "mode": "absolute",
            "steps": [
              {
                "color": "green",
                "value": null
              },
              {
                "color": "red",
                "value": 0.5
              }
            ]
          },
          "unit": "s"
        },
        "overrides": []
      },
      "gridPos": {
        "h": 8,
        "w": 12,
        "x": 0,
        "y": 0
      },
      "id": 1,
      "options": {
        "legend": {
          "displayMode": "list",
          "placement": "bottom"
        }
      },
      "targets": [
        {
          "expr": "histogram_quantile(0.95, sum(rate(demo_app_request_duration_seconds_bucket[5m])) by (le))",
          "legendFormat": "p95 latency",
          "refId": "A"
        }
      ],
      "title": "Demo App Request Duration (p95)",
      "type": "timeseries"
    }
  ],
  "schemaVersion": 38,
  "style": "dark",
  "tags": [
    "demo"
  ],
  "title": "Demo App Overview",
  "uid": "demo-app-overview"
}
9. Demo Application (monitoring-demo-app)
9.1 Dockerfile
# Generated by Codex Agent — do not edit manually
# Date: 2025-10-31
FROM python:3.12-slim AS builder
WORKDIR /app
COPY requirements.txt .
RUN pip install --upgrade pip && pip wheel --no-cache-dir --wheel-dir /wheels -r requirements.txt
FROM python:3.12-slim
ENV PYTHONUNBUFFERED=1
WORKDIR /app
COPY --from=builder /wheels /wheels
COPY --from=builder /app/requirements.txt .
RUN pip install --no-cache-dir --find-links=/wheels -r requirements.txt
COPY app.py .
EXPOSE 8000
CMD ["python", "app.py"]
9.2 Dependencies
app/requirements.txt
# Generated by Codex Agent — do not edit manually
# Date: 2025-10-31
flask==3.0.3
prometheus-client==0.20.0
opentelemetry-api==1.27.0
opentelemetry-sdk==1.27.0
opentelemetry-exporter-otlp-proto-http==1.27.0
opentelemetry-instrumentation-flask==0.48b0
9.3 Application Code
app/app.py
# Generated by Codex Agent — do not edit manually
# Date: 2025-10-31
import logging
import os
import random
import time
from datetime import datetime
from flask import Flask, jsonify, request
from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.instrumentation.flask import FlaskInstrumentor
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from prometheus_client import Counter, Gauge, Histogram, generate_latest
def _setup_logging() -> None:
  logging.basicConfig(
      level=logging.INFO,
      format="%(asctime)s | %(levelname)s | %(name)s | %(message)s",
  )
def _setup_tracing() -> None:
  endpoint = os.getenv("OTEL_EXPORTER_OTLP_ENDPOINT", "http://otelcol:4318")
  service_name = os.getenv("OTEL_SERVICE_NAME", "monitoring-demo-app")
  resource = Resource.create({"service.name": service_name})
  provider = TracerProvider(resource=resource)
  processor = BatchSpanProcessor(OTLPSpanExporter(endpoint=f"{endpoint.rstrip('/')}/v1/traces"))
  provider.add_span_processor(processor)
  trace.set_tracer_provider(provider)
_setup_logging()
_setup_tracing()
app = Flask(__name__)
FlaskInstrumentor().instrument_app(app)
logging.getLogger("werkzeug").setLevel(logging.WARNING)
LOGGER = logging.getLogger("monitoring-demo-app")
TRACER = trace.get_tracer(__name__)
REQUEST_COUNTER = Counter(
    "demo_app_requests_total",
    "Total number of requests handled by the demo application",
    ["endpoint", "method"],
)
TEMPERATURE_GAUGE = Gauge(
    "demo_app_temperature_celsius",
    "Simulated temperature value",
)
RESPONSE_HISTOGRAM = Histogram(
    "demo_app_request_duration_seconds",
    "Histogram of request durations",
    buckets=(0.01, 0.05, 0.1, 0.25, 0.5, 1, 2, 5),
)
def _observe_temperature() -> None:
  TEMPERATURE_GAUGE.set(18 + random.random() * 10)
@app.before_request
def before_request():
  request.start_time = time.perf_counter()
@app.after_request
def after_request(response):
  elapsed = time.perf_counter() - getattr(request, "start_time", time.perf_counter())
  RESPONSE_HISTOGRAM.observe(elapsed)
  REQUEST_COUNTER.labels(request.path, request.method).inc()
  return response
@app.route("/")
def index():
  LOGGER.info("Root endpoint hit", extra={"client_ip": request.remote_addr})
  return jsonify(
      message="Demo Python service for metrics, logs, and traces.",
      timestamp=datetime.utcnow().isoformat() + "Z",
  )
@app.route("/hello")
def hello():
  name = request.args.get("name", "world")
  LOGGER.info("Saying hello", extra={"hello_name": name})
  with TRACER.start_as_current_span("say-hello") as span:
    span.set_attribute("demo.greeting.name", name)
    time.sleep(random.uniform(0.01, 0.2))
  return jsonify(greeting=f"Hello, {name}!")
@app.route("/work")
def work():
  iterations = int(request.args.get("iterations", 3))
  with TRACER.start_as_current_span("simulate-work") as span:
    span.set_attribute("demo.work.iterations", iterations)
    total = 0
    for i in range(iterations):
      with TRACER.start_as_current_span("work-loop") as loop_span:
        loop_span.set_attribute("demo.work.loop_index", i)
        value = random.randint(1, 100)
        total += value
        LOGGER.debug("Loop iteration", extra={"index": i, "value": value})
        time.sleep(0.05)
  LOGGER.info("Work completed", extra={"work_iterations": iterations, "work_result": total})
  return jsonify(result=total, iterations=iterations)
@app.route("/metrics")
def metrics():
  _observe_temperature()
  return generate_latest(), 200, {"Content-Type": "text/plain; version=0.0.4"}
if __name__ == "__main__":
  host = os.getenv("APP_HOST", "0.0.0.0")
  port = int(os.getenv("APP_PORT", "8000"))
  LOGGER.info("Starting demo application", extra={"host": host, "port": port})
  app.run(host=host, port=port)
9.4 Traffic Generator
app/generate_demo_traffic.sh
#!/usr/bin/env bash
# Generated by Codex Agent — do not edit manually
# Date: 2025-10-31
set -euo pipefail
BASE_URL=${BASE_URL:-http://127.0.0.1:7005}
ITERATIONS=${ITERATIONS:-5}
if ! command -v curl >/dev/null 2>&1; then
  echo "curl is required to run this script" >&2
  exit 1
fi
echo "Generating demo traffic against $BASE_URL"
for i in $(seq 1 "$ITERATIONS"); do
  name="Grafana-$i"
  echo "[$(date +'%H:%M:%S')] GET /hello?name=$name"
  curl -sf -G "$BASE_URL/hello" --data-urlencode "name=$name" >/dev/null
  loops=$(( (RANDOM % 5) + 1 ))
  echo "[$(date +'%H:%M:%S')] GET /work?iterations=$loops"
  curl -sf -G "$BASE_URL/work" --data-urlencode "iterations=$loops" >/dev/null
  sleep 1
done
echo "Demo traffic completed"
The app uses OTLP/HTTP for traces, Prometheus client for metrics, and standard logging for Loki ingestion. Werkzeug access logs are silenced, leaving the custom monitoring-demo-app logger in Grafana Explore.
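You can also poke each signal from the host directly; 127.0.0.1:7005 is the port published in docker-compose.yml:
curl -s 'http://127.0.0.1:7005/hello?name=smoke-test'
curl -s http://127.0.0.1:7005/metrics | grep demo_app_requests_total
docker compose logs --tail 5 monitoring-demo-app   # the same lines Promtail ships to Loki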
10. Operational Utilities
10.1 Storage Footprint
storage_usage.sh
#!/usr/bin/env bash
# Generated by Codex Agent — do not edit manually
# Date: 2025-10-31
set -euo pipefail
declare -A VOLUME_PATHS=(
  [grafana]="/opt/docker-volumes/grafana/data"
  [prometheus]="/opt/docker-volumes/prometheus/data"
  [loki]="/opt/docker-volumes/loki/data"
  [tempo]="/opt/docker-volumes/tempo/data"
  [promtail]="/opt/docker-volumes/promtail/positions"
)
printf "%-12s %-45s %12s\n" "Component" "Path" "Usage"
printf "%-12s %-45s %12s\n" "---------" "----" "-----"
total_bytes=0
for component in "${!VOLUME_PATHS[@]}"; do
  path="${VOLUME_PATHS[$component]}"
  if sudo test -d "$path"; then
    bytes=$(sudo du -sb -- "${path}" | cut -f1)
    human=$(numfmt --to=iec --suffix=B "$bytes")
  else
    bytes=0
    human="(missing)"
  fi
  total_bytes=$((total_bytes + bytes))
  printf "%-12s %-45s %12s\n" "$component" "$path" "$human"
done
printf "%-12s %-45s %12s\n" "---------" "----" "-----"
printf "%-12s %-45s %12s\n" "TOTAL" "-" "$(numfmt --to=iec --suffix=B "$total_bytes")"
10.2 Container Memory Reporter
docker_memory_usage.sh
#!/usr/bin/env bash
# Generated by Codex Agent — do not edit manually
# Date: 2025-10-31
set -euo pipefail
if ! command -v docker >/dev/null 2>&1; then
  echo "docker command not found."
  exit 1
fi
printf "%-25s %-40s %12s %12s\n" "CONTAINER ID" "NAME" "MEM USAGE" "LIMIT"
printf "%-25s %-40s %12s %12s\n" "------------" "----" "--------" "-----"
# MemUsage looks like "12.3MiB / 1.944GiB"; use an explicit separator so its
# internal spaces do not break field splitting.
docker stats --no-stream --format '{{.Container}}|{{.Name}}|{{.MemUsage}}' | while IFS='|' read -r id name mem; do
  usage=$(awk -F' / ' '{print $1}' <<<"$mem")
  limit=$(awk -F' / ' '{print $2}' <<<"$mem")
  printf "%-25s %-40s %12s %12s\n" "$id" "$name" "$usage" "$limit"
done
11. Bringing Everything Online
docker compose up -d          # build+start all services
docker compose ps             # confirm containers are healthy
./app/generate_demo_traffic.sh
Verification checklist:
- Prometheus → Status > Targets shows monitoring-demo-app, node-exporter, and cadvisor as UP.
- Grafana → Explore → Loki → {job="containers", container_name="monitoring-demo-app"} reveals enriched logs (with service_name and container_id labels).
- Grafana → Explore → Tempo → service.name="monitoring-demo-app" fetches recent traces.
- Grafana → Dashboards → Demo App Overview renders the p95 histogram panel using:
  histogram_quantile(0.95,
    sum(rate(demo_app_request_duration_seconds_bucket[5m])) by (le))
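Those checks translate into a small smoke-test script. A sketch: every endpoint below is defined earlier in this article, and /api/health is Grafana's unauthenticated health API:
#!/usr/bin/env bash
set -euo pipefail
curl -fsS https://monitoring.services.org.pl/api/health >/dev/null && echo "grafana: ok"
curl -fsS http://127.0.0.1:7005/ >/dev/null && echo "demo app: ok"
curl -fsS http://127.0.0.1:7005/metrics | grep -q demo_app_requests_total && echo "metrics: ok"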
If you add more services, point them at http://otelcol:4318 for OTLP and add Prometheus scrape jobs where needed (follow the pattern in prometheus.yml).
12. Data Retention & Storage
| Component | Path | Default Retention | How to Adjust |
|---|---|---|---|
| Grafana | /opt/docker-volumes/grafana/data | Until manual prune | Clean via Grafana UI or remove old dashboards/plugins |
| Prometheus | /opt/docker-volumes/prometheus/data | ~15 days | Add --storage.tsdb.retention.time=30d in Compose |
| Loki | /opt/docker-volumes/loki/data | Depends on Loki config (chunks & compaction) | Tune in loki/config.yaml |
| Tempo | /opt/docker-volumes/tempo/data | 720h (30 days) | Change block_retention in tempo.yaml |
| Promtail | /opt/docker-volumes/promtail/positions | Position offsets (logs kept in source files) | Adjust log retention at source |
Run ./storage_usage.sh periodically to check disk consumption; it uses sudo internally to access protected paths.
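To make that periodic check automatic, schedule the script from root's crontab; the /opt/monitoring checkout path below is an assumption, adjust it to wherever the repository lives:
# weekly on Monday 07:00; log output for later inspection
sudo crontab -l 2>/dev/null | { cat; echo "0 7 * * 1 /opt/monitoring/storage_usage.sh >> /var/log/storage_usage.log 2>&1"; } | sudo crontab -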
13. Tips & Discoveries
- Container labels in Loki: Docker SD plus relabeling exposes container_name, service_name, and compose_project, making Grafana Explore queries much friendlier than raw container IDs.
- Journald vs classic logs: Because /run/log/journal is ephemeral on this host, Promtail sticks to /var/log/*.log. If you persist journald, add a dedicated scrape job.
- Tempo "empty ring" errors: They vanish once Memcached is healthy and the collector sends batches regularly. The included OTel Collector config handles this.
- Grafana Cloud Agent migration: If you previously relied on Grafana Cloud Agent, stop the service, remove the package, clear repo keys, and revoke its API token so traffic flows exclusively through this self-hosted stack.
- Werkzeug noise reduction: Setting the access logger to WARNING keeps Loki focused on application logs while still showing Flask request traces.
14. Where to Go Next
- Add alerting channels (Slack, email) once you connect Grafana Alerting or Prometheus Alertmanager.
- Onboard real services: mount /opt/otel/opentelemetry-javaagent.jar, set OTEL_EXPORTER_OTLP_ENDPOINT=http://otelcol:4318, and add Prometheus scrape jobs where needed.
- Automate smoke tests (e.g. curl endpoints plus Grafana API checks) in CI before deploying Compose changes.
Happy monitoring!