Hosted observability platforms are convenient until cost, data residency, or vendor lock-in become blockers. This article walks through a fully self-hosted monitoring stack deployed with Docker Compose on a single VPS. The key configuration files are reproduced below, along with architecture diagrams, operating tips, and lessons learned while migrating away from Grafana Cloud Agent.
Expect deep dives into each component - from Docker Compose wiring to Grafana provisioning - alongside commentary on why specific decisions serve a secure, host-terminated HTTPS setup. Bring a terminal and the willingness to inspect configs line by line. The narrative ties the moving pieces together so you can apply the pattern to your own workloads.
This article was produced 95% with Codex, guided and reviewed by the author who steered the narrative and decisions. The full working setup took roughly 8 hours, combining the author's Linux and observability expertise with the Codex CLI's computing power to generate every artifact.
1. Architecture Overview
1.1 Topology
Level 1 – System Context
- The admin's browser reaches the host over HTTPS 443.
- The host (VM / node) proxies to the Observability Platform over HTTP on 127.0.0.1:3000.
- The Monitoring Demo App sends metrics, logs, and traces to the Observability Platform.
- The host also feeds its log files into the Observability Platform.
Level 2 – Container View (inside Observability)
Host layer:
- Nginx (TLS termination) receives browser traffic on HTTPS 443 and proxies to Grafana on HTTP 127.0.0.1:3000.
- Host log files (/var/log/*.log) reach Promtail via static_configs.
- The Docker Engine (/var/run/docker.sock) feeds Promtail via docker_sd_configs.
Workload / exporters:
- monitoring-demo-app (127.0.0.1:7005) sends OTLP traces to the OTel Collector, exposes /metrics to Prometheus, and writes stdout/stderr that Promtail collects.
- Node Exporter (host metrics) and cAdvisor (container metrics) are scraped by Prometheus.
Ingestion / UI:
- Promtail (log shipper) pushes logs to Loki.
- The OTel Collector (ingestion gateway) forwards spans to Tempo.
- Grafana (127.0.0.1:3000) reads dashboards from Prometheus, explores logs in Loki, and explores traces in Tempo.
Backends:
- Prometheus (metrics TSDB), Loki (log store), Tempo (traces), and Tempo Memcached (trace index cache used by Tempo).
1.2 Telemetry Flow
- A load-generator script calls the app's /hello and /work endpoints.
- Prometheus scrapes GET /metrics from the app (pull-based, every 15s).
- The app sends OTLP traces to the Collector via /v1/traces; the Collector exports spans to Tempo over gRPC.
- The app writes stdout/stderr to container logs; Promtail pushes the enriched logs to Loki.
- Grafana issues PromQL queries to Prometheus, LogQL queries to Loki (e.g. container_name="monitoring-demo-app"), and TraceQL queries to Tempo.
Key principles
- Only Grafana (127.0.0.1:3000) and the demo app (127.0.0.1:7005) are bound to localhost on the host. Host Nginx terminates TLS for https://monitoring.services.org.pl.
- Everything else communicates over the private Docker network obs.
- Persistent data lives under /opt/docker-volumes/<service>/... on the host.
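Once the stack is running (section 11), you can spot-check this exposure from the host; a minimal sketch, assuming ss is available:
sudo ss -tlnp | grep -E ':(3000|7005)'   # both sockets should be bound to 127.0.0.1
docker compose ps                        # PORTS column: only grafana and monitoring-demo-app publish ports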
2. Prerequisites
Before following the walkthrough, you should be comfortable with:
- Administering Docker and Docker Compose on a Linux server (SSH, system packages, file permissions).
- Core observability concepts—metrics, logs, traces—and how Grafana, Prometheus, Loki, and Tempo expose them.
- Basic networking and TLS offload patterns so the host Nginx scenario feels familiar.
2.1 Install Docker & Compose (Ubuntu 22.04 example)
sudo apt-get update
sudo apt-get install -y ca-certificates curl gnupg lsb-release
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | \
sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] \
https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" | \
sudo tee /etc/apt/sources.list.d/docker.list
sudo apt-get update
sudo apt-get install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
sudo usermod -aG docker "$USER" # log out/in afterwards
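After logging back in, a quick sanity check confirms the installation (the hello-world pull needs outbound network access):
docker --version
docker compose version
docker run --rm hello-world   # pulls and runs a tiny test container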
2.2 Prepare Persistent Directories
sudo mkdir -p \
/opt/docker-volumes/grafana/data \
/opt/docker-volumes/prometheus/data \
/opt/docker-volumes/loki/data \
/opt/docker-volumes/tempo/data \
/opt/docker-volumes/promtail/positions
sudo chown -R 472:472 /opt/docker-volumes/grafana
sudo chmod 750 /opt/docker-volumes/grafana /opt/docker-volumes/grafana/data
2.3 Host Nginx & TLS
TLS termination happens on the host. A reference vhost:
server {
listen 443 ssl http2;
server_name monitoring.services.org.pl;
ssl_certificate /etc/letsencrypt/live/monitoring.services.org.pl/fullchain.pem;
ssl_certificate_key /etc/letsencrypt/live/monitoring.services.org.pl/privkey.pem;
location / {
proxy_pass http://127.0.0.1:3000;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto https;
}
}
Certbot (--nginx) keeps certificates valid; no Nginx container is used.
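For reference, issuing the certificate and verifying the renewal path looks roughly like this (certbot must already be installed on the host):
sudo certbot --nginx -d monitoring.services.org.pl
sudo certbot renew --dry-run   # exercises the automatic renewal without touching the live cert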
3. Repository Layout
.
├── docker-compose.yml
├── grafana/
│ └── provisioning/
│ ├── dashboards/demo_app_overview.json
│ └── datasources/datasources.yml
├── prometheus/
│ ├── prometheus.yml
│ └── rules/alerts.yml
├── loki/config.yaml
├── promtail/config.yml
├── tempo/tempo.yaml
├── otel-collector/config.yaml
├── app/
│ ├── Dockerfile
│ ├── requirements.txt
│ ├── app.py
│ └── generate_demo_traffic.sh
├── storage_usage.sh
└── docker_memory_usage.sh
Copy each snippet below into the matching path if you are recreating the stack from scratch.
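If you are starting from an empty directory, something like this recreates the layout before pasting the snippets (plain bash, paths as listed above):
mkdir -p grafana/provisioning/{dashboards,datasources} prometheus/rules \
         promtail loki tempo otel-collector app
# after copying the scripts in, make them executable
chmod +x app/generate_demo_traffic.sh storage_usage.sh docker_memory_usage.sh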
4. Docker Compose Stack
docker-compose.yml
# Generated by Codex Agent — do not edit manually
# Date: 2025-10-31
services:
grafana:
image: grafana/grafana:12.2.1
container_name: grafana
restart: unless-stopped
environment:
GF_SERVER_DOMAIN: monitoring.services.org.pl
GF_SERVER_ROOT_URL: https://monitoring.services.org.pl/
ports:
- "127.0.0.1:3000:3000"
volumes:
- /opt/docker-volumes/grafana/data:/var/lib/grafana
- ./grafana/provisioning:/etc/grafana/provisioning:ro
- ./grafana/dashboards:/var/lib/grafana/dashboards:ro
depends_on:
- prometheus
- loki
- tempo
networks:
- obs
prometheus:
image: prom/prometheus:v3.7.3
container_name: prometheus
restart: unless-stopped
user: "0"
command:
- --config.file=/etc/prometheus/prometheus.yml
- --storage.tsdb.path=/prometheus
- --web.enable-lifecycle
- --web.enable-remote-write-receiver # lets the OTel Collector and Tempo push metrics via /api/v1/write
expose:
- "9090"
volumes:
- /opt/docker-volumes/prometheus/data:/prometheus
- ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml:ro
- ./prometheus/rules:/etc/prometheus/rules:ro
networks:
- obs
loki:
image: grafana/loki:3.5.7
container_name: loki
restart: unless-stopped
user: "0"
command:
- -config.file=/etc/loki/config.yaml
expose:
- "3100"
volumes:
- /opt/docker-volumes/loki/data:/loki
- ./loki/config.yaml:/etc/loki/config.yaml:ro
networks:
- obs
tempo:
image: grafana/tempo:2.9.0
container_name: tempo
restart: unless-stopped
user: "0"
command:
- -config.file=/etc/tempo/tempo.yaml
depends_on:
- memcached
expose:
- "3200"
- "4317"
- "4318"
volumes:
- /opt/docker-volumes/tempo/data:/var/tempo
- ./tempo/tempo.yaml:/etc/tempo/tempo.yaml:ro
networks:
- obs
- tempo-cache
memcached:
image: memcached:1.6.33-alpine
container_name: tempo-memcached
restart: unless-stopped
command:
- -m
- "256"
- -p
- "11211"
expose:
- "11211"
networks:
- tempo-cache
otelcol:
image: otel/opentelemetry-collector-contrib:0.138.0
container_name: otelcol
restart: unless-stopped
command:
- --config=/etc/otelcol/config.yaml
expose:
- "4317"
- "4318"
- "8888"
- "55679"
volumes:
- ./otel-collector/config.yaml:/etc/otelcol/config.yaml:ro
depends_on:
- tempo
- loki
- prometheus
networks:
- obs
promtail:
image: grafana/promtail:3.5.7
container_name: promtail
restart: unless-stopped
command:
- -config.file=/etc/promtail/config.yml
- -config.expand-env=true # expands ${HOSTNAME} references in the config
volumes:
- /opt/docker-volumes/promtail/positions:/var/lib/promtail
- ./promtail/config.yml:/etc/promtail/config.yml:ro
- /var/log:/var/log:ro
- /var/lib/docker/containers:/var/lib/docker/containers:ro
- /var/run/docker.sock:/var/run/docker.sock:ro
depends_on:
- loki
networks:
- obs
node-exporter:
image: prom/node-exporter:v1.10.2
container_name: node-exporter
restart: unless-stopped
pid: host
command:
- --path.procfs=/host/proc
- --path.sysfs=/host/sys
- --path.rootfs=/rootfs
- --collector.filesystem.ignored-mount-points=^/(proc|sys|dev|host|etc)($|/)
- --collector.filesystem.ignored-fs-types=^(autofs|binfmt_misc|bpf|cgroup2?|configfs|debugfs|devpts|devtmpfs|fusectl|hugetlbfs|iso9660|mqueue|nsfs|overlay|proc|pstore|rpc_pipefs|securityfs|selinuxfs|squashfs|sysfs|tracefs)$
- --no-collector.ipvs
- --no-collector.btrfs
- --no-collector.infiniband
- --no-collector.xfs
- --no-collector.zfs
volumes:
- /proc:/host/proc:ro
- /sys:/host/sys:ro
- /:/rootfs:ro
networks:
- obs
cadvisor:
image: gcr.io/cadvisor/cadvisor:v0.52.1
container_name: cadvisor
restart: unless-stopped
privileged: true
expose:
- "8080"
volumes:
- /:/rootfs:ro
- /var/run:/var/run:ro
- /sys:/sys:ro
- /var/lib/docker/:/var/lib/docker:ro
- /dev/disk/:/dev/disk:ro
networks:
- obs
monitoring-demo-app:
build:
context: ./app
container_name: monitoring-demo-app
restart: unless-stopped
environment:
OTEL_EXPORTER_OTLP_ENDPOINT: http://otelcol:4318
OTEL_SERVICE_NAME: monitoring-demo-app
OTEL_RESOURCE_ATTRIBUTES: deployment.environment=demo
depends_on:
- otelcol
ports:
- "127.0.0.1:7005:8000"
networks:
- obs
networks:
obs:
driver: bridge
tempo-cache:
driver: bridge
Highlights:
- Grafana and the demo app are the only services that publish ports to the host (localhost only); everything else relies on expose and the private network.
- memcached accelerates Tempo search.
- monitoring-demo-app is a local test service exporting metrics, logs, and traces.
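Before the first start, Compose can lint the file for you (Compose v2 syntax):
docker compose config --quiet && echo "docker-compose.yml is valid"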
5. Prometheus Metrics
prometheus/prometheus.yml
# Generated by Codex Agent — do not edit manually
# Date: 2025-10-31
global:
scrape_interval: 15s
evaluation_interval: 15s
rule_files:
- /etc/prometheus/rules/*.yml
scrape_configs:
- job_name: prometheus
static_configs:
- targets:
- prometheus:9090
- job_name: cadvisor
static_configs:
- targets:
- cadvisor:8080
- job_name: node-exporter
static_configs:
- targets:
- node-exporter:9100
- job_name: monitoring-demo-app
metrics_path: /metrics
static_configs:
- targets:
- monitoring-demo-app:8000
Alerting starter pack (prometheus/rules/alerts.yml):
# Generated by Codex Agent — do not edit manually
# Date: 2025-10-31
groups:
- name: infrastructure-health
rules:
- alert: PrometheusTargetMissing
expr: up == 0
for: 5m
labels:
severity: warning
annotations:
summary: Target {{ $labels.job }} on {{ $labels.instance }} is down
description: Prometheus has not scraped {{ $labels.job }} on {{ $labels.instance }} for over 5 minutes.
Retention: Prometheus keeps 15 days by default (no explicit --storage.tsdb.retention.time). Adjust the Compose command arguments if you need longer storage.
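Whenever you change prometheus.yml or the rules, you can lint them with promtool from the same image before reloading; a sketch, assuming you run it from the repository root:
docker run --rm -v "$(pwd)/prometheus:/etc/prometheus:ro" \
  --entrypoint promtool prom/prometheus:v3.7.3 \
  check config /etc/prometheus/prometheus.yml
A running Prometheus can then pick up the change without a restart, for example with docker compose kill -s SIGHUP prometheus.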
6. Logs with Promtail & Loki
promtail/config.yml
# Generated by Codex Agent — do not edit manually
# Date: 2025-10-31
server:
http_listen_port: 9080
grpc_listen_port: 0
positions:
filename: /var/lib/promtail/positions.yaml
clients:
- url: http://loki:3100/loki/api/v1/push
scrape_configs:
- job_name: system-logs
pipeline_stages:
- drop:
older_than: 24h
static_configs:
- targets:
- localhost
labels:
job: varlogs
host: ${HOSTNAME}
__path__: /var/log/*.log
- job_name: docker-containers
pipeline_stages:
- docker: {}
- drop:
older_than: 24h
docker_sd_configs:
- host: unix:///var/run/docker.sock
relabel_configs:
- source_labels: ['__meta_docker_container_id']
target_label: '__path__'
replacement: /var/lib/docker/containers/$1/$1-json.log
- source_labels: ['__meta_docker_container_name']
target_label: 'container_name'
regex: '/(.*)'
replacement: '$1'
- source_labels: ['__meta_docker_container_id']
target_label: 'container_id'
- source_labels: ['__meta_docker_container_image']
target_label: 'container_image'
- source_labels: ['__meta_docker_container_label_com_docker_compose_service']
target_label: 'service_name'
- source_labels: ['__meta_docker_container_label_com_docker_compose_project']
target_label: 'compose_project'
- source_labels: ['__meta_docker_container_log_stream']
target_label: 'stream'
- target_label: 'job'
replacement: 'containers'
- target_label: 'host'
replacement: ${HOSTNAME}
Notes:
- Promtail adds container_name and service_name labels, making it easy to filter by Compose service (e.g. {job="containers", container_name="monitoring-demo-app"}).
- Journald ingestion was removed because this host keeps logs in memory-only /run/systemd/journal; tailing /var/log/*.log covers the important services without additional setup.
- Loki stores data in /opt/docker-volumes/loki/data; retention is managed inside Loki’s config (loki/config.yaml): the compactor prunes old chunks once retention is enabled there, and Loki keeps data indefinitely otherwise (a sketch follows these notes).
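The full loki/config.yaml is not reproduced here. As a rough illustration of how retention is usually wired in Loki 3.x (values and paths are illustrative, not the exact file used by this stack), the relevant fragment looks like:
limits_config:
  retention_period: 720h          # drop log chunks older than 30 days
compactor:
  working_directory: /loki/compactor
  retention_enabled: true         # without this, Loki keeps data indefinitely
  delete_request_store: filesystem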
7. Traces with Tempo & OpenTelemetry Collector
tempo/tempo.yaml
# Generated by Codex Agent — do not edit manually
# Date: 2025-10-31
server:
http_listen_port: 3200
log_level: info
cache:
background:
writeback_goroutines: 5
caches:
- roles:
- frontend-search
memcached:
addresses: memcached:11211
query_frontend:
metrics:
max_duration: 200h
query_backend_after: 5m
search:
duration_slo: 5s
throughput_bytes_slo: 1073741824
trace_by_id:
duration_slo: 100ms
distributor:
receivers:
otlp:
protocols:
grpc:
endpoint: 0.0.0.0:4317
http:
endpoint: 0.0.0.0:4318
compactor:
compaction:
block_retention: 720h
storage:
trace:
backend: local
wal:
path: /var/tempo/wal
local:
path: /var/tempo/blocks
metrics_generator:
registry:
external_labels:
source: tempo
cluster: docker-compose
storage:
path: /var/tempo/generator/wal
remote_write:
- url: http://prometheus:9090/api/v1/write
send_exemplars: true
traces_storage:
path: /var/tempo/generator/traces
processor:
local_blocks:
filter_server_spans: false
flush_to_storage: true
overrides:
defaults:
metrics_generator:
processors:
- service-graphs
- span-metrics
- local-blocks
Tempo keeps 30 days (720h) of trace blocks locally. Memcached speeds up search queries; on a tiny VPS you can lower the -m 256 memory limit in the Compose file.
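Once spans land in Tempo, a TraceQL query in Grafana Explore narrows down slow demo requests; the service name matches what the demo app emits, and the 200ms threshold is just an example:
{ resource.service.name = "monitoring-demo-app" && duration > 200ms }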
OpenTelemetry Collector (otel-collector/config.yaml) receives OTLP traffic and fans out to Tempo plus Prometheus remote write:
# Generated by Codex Agent — do not edit manually
# Date: 2025-10-31
receivers:
otlp:
protocols:
grpc:
endpoint: 0.0.0.0:4317
http:
endpoint: 0.0.0.0:4318
processors:
memory_limiter:
check_interval: 1s
limit_mib: 400
spike_limit_mib: 100
batch:
timeout: 5s
send_batch_size: 8192
resource:
attributes:
- action: upsert
key: deployment.environment
value: prod
exporters:
otlp/tempo:
endpoint: tempo:4317
tls:
insecure: true
prometheusremotewrite:
endpoint: http://prometheus:9090/api/v1/write
debug:
verbosity: basic
extensions:
health_check:
endpoint: 0.0.0.0:13133
service:
extensions:
- health_check
telemetry:
metrics:
level: basic
pipelines:
traces:
receivers:
- otlp
processors:
- memory_limiter
- resource
- batch
exporters:
- otlp/tempo
metrics:
receivers:
- otlp
processors:
- memory_limiter
- resource
- batch
exporters:
- prometheusremotewrite
8. Grafana Provisioning
grafana/provisioning/datasources/datasources.yml
# Generated by Codex Agent — do not edit manually
# Date: 2025-10-31
apiVersion: 1
datasources:
- uid: prometheus
name: Prometheus
type: prometheus
access: proxy
url: http://prometheus:9090
isDefault: true
jsonData:
httpMethod: GET
- uid: loki
name: Loki
type: loki
access: proxy
url: http://loki:3100
jsonData:
maxLines: 5000
- uid: tempo
name: Tempo
type: tempo
access: proxy
url: http://tempo:3200
jsonData:
httpMethod: GET
serviceMapDatasourceUid: prometheus
tracesToLogs:
datasourceUid: loki
mapTagNamesEnabled: true
tags:
- job
- host
tracesToMetrics:
datasourceUid: prometheus
tags:
- service.name
grafana/provisioning/dashboards/demo_app_overview.json
{
"panels": [
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "red",
"value": 0.5
}
]
},
"unit": "s"
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 12,
"x": 0,
"y": 0
},
"id": 1,
"options": {
"legend": {
"displayMode": "list",
"placement": "bottom"
}
},
"targets": [
{
"expr": "histogram_quantile(0.95, sum(rate(demo_app_request_duration_seconds_bucket[5m])) by (le))",
"legendFormat": "p95 latency",
"refId": "A"
}
],
"title": "Demo App Request Duration (p95)",
"type": "timeseries"
}
],
"schemaVersion": 38,
"style": "dark",
"tags": [
"demo"
],
"title": "Demo App Overview",
"uid": "demo-app-overview"
}
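For the dashboard JSON to be loaded automatically, Grafana also needs a file provider under grafana/provisioning/dashboards/. A minimal sketch (the providers.yml file name is an assumption; point options.path at wherever the JSON is mounted, e.g. /var/lib/grafana/dashboards as in docker-compose.yml):
apiVersion: 1
providers:
  - name: default
    orgId: 1
    type: file
    disableDeletion: false
    updateIntervalSeconds: 30
    options:
      path: /var/lib/grafana/dashboards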
9. Demo Application (monitoring-demo-app)
9.1 Dockerfile
# Generated by Codex Agent — do not edit manually
# Date: 2025-10-31
FROM python:3.12-slim AS builder
WORKDIR /app
COPY requirements.txt .
RUN pip install --upgrade pip && pip wheel --no-cache-dir --wheel-dir /wheels -r requirements.txt
FROM python:3.12-slim
ENV PYTHONUNBUFFERED=1
WORKDIR /app
COPY --from=builder /wheels /wheels
COPY --from=builder /app/requirements.txt .
RUN pip install --no-cache-dir --find-links=/wheels -r requirements.txt
COPY app.py .
EXPOSE 8000
CMD ["python", "app.py"]
9.2 Dependencies
app/requirements.txt
# Generated by Codex Agent — do not edit manually
# Date: 2025-10-31
flask==3.0.3
prometheus-client==0.20.0
opentelemetry-api==1.27.0
opentelemetry-sdk==1.27.0
opentelemetry-exporter-otlp-proto-http==1.27.0
opentelemetry-instrumentation-flask==0.48b0
9.3 Application Code
app/app.py
# Generated by Codex Agent — do not edit manually
# Date: 2025-10-31
import logging
import os
import random
import time
from datetime import datetime
from flask import Flask, jsonify, request
from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.instrumentation.flask import FlaskInstrumentor
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from prometheus_client import Counter, Gauge, Histogram, generate_latest
def _setup_logging() -> None:
logging.basicConfig(
level=logging.INFO,
format="%(asctime)s | %(levelname)s | %(name)s | %(message)s",
)
def _setup_tracing() -> None:
endpoint = os.getenv("OTEL_EXPORTER_OTLP_ENDPOINT", "http://otelcol:4318")
service_name = os.getenv("OTEL_SERVICE_NAME", "monitoring-demo-app")
resource = Resource.create({"service.name": service_name})
provider = TracerProvider(resource=resource)
processor = BatchSpanProcessor(OTLPSpanExporter(endpoint=f"{endpoint.rstrip('/')}/v1/traces"))
provider.add_span_processor(processor)
trace.set_tracer_provider(provider)
_setup_logging()
_setup_tracing()
app = Flask(__name__)
FlaskInstrumentor().instrument_app(app)
logging.getLogger("werkzeug").setLevel(logging.WARNING)
LOGGER = logging.getLogger("monitoring-demo-app")
TRACER = trace.get_tracer(__name__)
REQUEST_COUNTER = Counter(
"demo_app_requests_total",
"Total number of requests handled by the demo application",
["endpoint", "method"],
)
TEMPERATURE_GAUGE = Gauge(
"demo_app_temperature_celsius",
"Simulated temperature value",
)
RESPONSE_HISTOGRAM = Histogram(
"demo_app_request_duration_seconds",
"Histogram of request durations",
buckets=(0.01, 0.05, 0.1, 0.25, 0.5, 1, 2, 5),
)
def _observe_temperature() -> None:
TEMPERATURE_GAUGE.set(18 + random.random() * 10)
@app.before_request
def before_request():
request.start_time = time.perf_counter()
@app.after_request
def after_request(response):
elapsed = time.perf_counter() - getattr(request, "start_time", time.perf_counter())
RESPONSE_HISTOGRAM.observe(elapsed)
REQUEST_COUNTER.labels(request.path, request.method).inc()
return response
@app.route("/")
def index():
LOGGER.info("Root endpoint hit", extra={"client_ip": request.remote_addr})
return jsonify(
message="Demo Python service for metrics, logs, and traces.",
timestamp=datetime.utcnow().isoformat() + "Z",
)
@app.route("/hello")
def hello():
name = request.args.get("name", "world")
LOGGER.info("Saying hello", extra={"hello_name": name})
with TRACER.start_as_current_span("say-hello") as span:
span.set_attribute("demo.greeting.name", name)
time.sleep(random.uniform(0.01, 0.2))
return jsonify(greeting=f"Hello, {name}!")
@app.route("/work")
def work():
iterations = int(request.args.get("iterations", 3))
with TRACER.start_as_current_span("simulate-work") as span:
span.set_attribute("demo.work.iterations", iterations)
total = 0
for i in range(iterations):
with TRACER.start_as_current_span("work-loop") as loop_span:
loop_span.set_attribute("demo.work.loop_index", i)
value = random.randint(1, 100)
total += value
LOGGER.debug("Loop iteration", extra={"index": i, "value": value})
time.sleep(0.05)
LOGGER.info("Work completed", extra={"work_iterations": iterations, "work_result": total})
return jsonify(result=total, iterations=iterations)
@app.route("/metrics")
def metrics():
_observe_temperature()
return generate_latest(), 200, {"Content-Type": "text/plain; version=0.0.4"}
if __name__ == "__main__":
host = os.getenv("APP_HOST", "0.0.0.0")
port = int(os.getenv("APP_PORT", "8000"))
LOGGER.info("Starting demo application", extra={"host": host, "port": port})
app.run(host=host, port=port)
9.4 Traffic Generator
app/generate_demo_traffic.sh
#!/usr/bin/env bash
# Generated by Codex Agent — do not edit manually
# Date: 2025-10-31
set -euo pipefail
BASE_URL=${BASE_URL:-http://127.0.0.1:7005}
ITERATIONS=${ITERATIONS:-5}
if ! command -v curl >/dev/null 2>&1; then
echo "curl is required to run this script" >&2
exit 1
fi
echo "Generating demo traffic against $BASE_URL"
for i in $(seq 1 "$ITERATIONS"); do
name="Grafana-$i"
echo "[$(date +'%H:%M:%S')] GET /hello?name=$name"
curl -sf -G "$BASE_URL/hello" --data-urlencode "name=$name" >/dev/null
loops=$(( (RANDOM % 5) + 1 ))
echo "[$(date +'%H:%M:%S')] GET /work?iterations=$loops"
curl -sf -G "$BASE_URL/work" --data-urlencode "iterations=$loops" >/dev/null
sleep 1
done
echo "Demo traffic completed"
The app uses OTLP/HTTP for traces, Prometheus client for metrics, and standard logging for Loki ingestion. Werkzeug access logs are silenced, leaving the custom monitoring-demo-app logger in Grafana Explore.
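With traffic flowing, a couple of PromQL expressions worth trying against these metrics (the second is the same query the provisioned dashboard uses):
sum by (endpoint) (rate(demo_app_requests_total[5m]))
histogram_quantile(0.95, sum by (le) (rate(demo_app_request_duration_seconds_bucket[5m])))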
10. Operational Utilities
10.1 Storage Footprint
storage_usage.sh
#!/usr/bin/env bash
# Generated by Codex Agent — do not edit manually
# Date: 2025-10-31
set -euo pipefail
declare -A VOLUME_PATHS=(
[grafana]="/opt/docker-volumes/grafana/data"
[prometheus]="/opt/docker-volumes/prometheus/data"
[loki]="/opt/docker-volumes/loki/data"
[tempo]="/opt/docker-volumes/tempo/data"
[promtail]="/opt/docker-volumes/promtail/positions"
)
printf "%-12s %-45s %12s\n" "Component" "Path" "Usage"
printf "%-12s %-45s %12s\n" "---------" "----" "-----"
total_bytes=0
for component in "${!VOLUME_PATHS[@]}"; do
path="${VOLUME_PATHS[$component]}"
if sudo test -d "$path"; then
bytes=$(sudo du -sb -- "${path}" | cut -f1)
human=$(numfmt --to=iec --suffix=B "$bytes")
else
bytes=0
human="(missing)"
fi
total_bytes=$((total_bytes + bytes))
printf "%-12s %-45s %12s\n" "$component" "$path" "$human"
done
printf "%-12s %-45s %12s\n" "---------" "----" "-----"
printf "%-12s %-45s %12s\n" "TOTAL" "-" "$(numfmt --to=iec --suffix=B "$total_bytes")"
10.2 Container Memory Reporter
docker_memory_usage.sh
#!/usr/bin/env bash
# Generated by Codex Agent — do not edit manually
# Date: 2025-10-31
set -euo pipefail
if ! command -v docker >/dev/null 2>&1; then
echo "docker command not found."
exit 1
fi
printf "%-25s %-40s %12s %12s\n" "CONTAINER ID" "NAME" "MEM USAGE" "LIMIT"
printf "%-25s %-40s %12s %12s\n" "------------" "----" "--------" "-----"
# MemUsage is reported as "<usage> / <limit>", e.g. "123MiB / 1.944GiB"
docker stats --no-stream --format "{{.Container}} {{.Name}} {{.MemUsage}}" | while read -r id name rest; do
usage=$(echo "$rest" | awk -F' / ' '{print $1}')
limit=$(echo "$rest" | awk -F' / ' '{print $2}')
printf "%-25s %-40s %12s %12s\n" "$id" "$name" "$usage" "$limit"
done
11. Bringing Everything Online
docker compose up -d # build+start all services
docker compose ps # confirm containers are healthy
./app/generate_demo_traffic.sh
Verification checklist:
- Prometheus → Status > Targets shows monitoring-demo-app, node-exporter, and cadvisor as UP.
- Grafana → Explore → Loki → {job="containers", container_name="monitoring-demo-app"} reveals enriched logs (with service_name and container_id labels).
- Grafana → Explore → Tempo → service.name="monitoring-demo-app" fetches recent traces.
- Grafana → Dashboards → Demo App Overview renders the p95 histogram panel using:
histogram_quantile(0.95,
sum(rate(demo_app_request_duration_seconds_bucket[5m]))
by (le))
If you add more services, point them at http://otelcol:4318 for OTLP and add Prometheus scrape jobs where needed (follow the pattern in prometheus.yml).
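As an illustration, wiring a hypothetical api-service into the stack touches two files (the service name and port 8080 are placeholders):
# docker-compose.yml fragment
  api-service:
    environment:
      OTEL_EXPORTER_OTLP_ENDPOINT: http://otelcol:4318
      OTEL_SERVICE_NAME: api-service
    networks:
      - obs
# prometheus/prometheus.yml fragment
  - job_name: api-service
    static_configs:
      - targets:
          - api-service:8080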
12. Data Retention & Storage
| Component | Path | Default Retention | How to Adjust |
|---|---|---|---|
| Grafana | /opt/docker-volumes/grafana/data | Until manual prune | Clean via Grafana UI or remove old dashboards/plugins |
| Prometheus | /opt/docker-volumes/prometheus/data | ~15 days | Add --storage.tsdb.retention.time=30d in Compose |
| Loki | /opt/docker-volumes/loki/data | Depends on Loki config (chunks & compaction) | Tune in loki/config.yaml |
| Tempo | /opt/docker-volumes/tempo/data | 720h (30 days) | Change block_retention in tempo.yaml |
| Promtail | /opt/docker-volumes/promtail/positions | Position offsets (logs kept in source files) | Adjust log retention at source |
Run ./storage_usage.sh periodically to check disk consumption; it uses sudo internally to access protected paths.
13. Tips & Discoveries
- Container labels in Loki: Docker SD + relabeling exposes container_name, service_name, and compose_project, making Grafana Explore queries much friendlier than raw container IDs.
- Journald vs classic logs: Because /run/log/journal is ephemeral on this host, Promtail sticks to /var/log/*.log. If you persist journald, add a dedicated scrape job (see the sketch after this list).
- Tempo “empty ring” errors: They vanish once Memcached is healthy and the collector sends batches regularly. The included OTel Collector config handles this.
- Grafana Cloud agent migration: If you previously relied on Grafana Cloud Agent, stop the service, remove the package, clear repo keys, and revoke its API token so traffic flows exclusively through this self-hosted stack.
- Werkzeug noise reduction: Setting the access logger to WARNING keeps Loki focused on application logs while still showing Flask request traces.
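If you do persist journald (Storage=persistent in /etc/systemd/journald.conf), a Promtail scrape job for it looks roughly like this; /var/log/journal must also be mounted into the promtail container, and the labels are illustrative:
  - job_name: journal
    journal:
      path: /var/log/journal
      max_age: 12h
      labels:
        job: systemd-journal
    relabel_configs:
      - source_labels: ['__journal__systemd_unit']
        target_label: 'unit'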
14. Where to Go Next
- Add alerting channels (Slack, Email) once you connect Grafana Alerting or Prometheus Alertmanager.
- Onboard real services: mount /opt/otel/opentelemetry-javaagent.jar, set OTEL_EXPORTER_OTLP_ENDPOINT=http://otelcol:4318, and add Prometheus scrape jobs (a sketch follows this list).
- Automate smoke tests (e.g. curl endpoints + Grafana API checks) in CI before deploying Compose changes.
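The Java-agent onboarding above could look roughly like this in docker-compose.yml (the service name, image, and protocol setting are assumptions; the jar path matches the bullet):
  some-java-service:
    image: registry.example.com/some-java-service:latest
    environment:
      JAVA_TOOL_OPTIONS: "-javaagent:/opt/otel/opentelemetry-javaagent.jar"
      OTEL_EXPORTER_OTLP_ENDPOINT: http://otelcol:4318
      OTEL_EXPORTER_OTLP_PROTOCOL: http/protobuf   # 4318 is the OTLP/HTTP port
      OTEL_SERVICE_NAME: some-java-service
    volumes:
      - /opt/otel/opentelemetry-javaagent.jar:/opt/otel/opentelemetry-javaagent.jar:ro
    networks:
      - obs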
Happy monitoring!