Before OpenTelemetry, getting traces, metrics, and logs out of Kubernetes applications meant three different SDKs, three different agents, and three different pipelines. The OTel project has largely solved this — here's how to deploy it properly.
The OpenTelemetry Operator
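The OTel Operator is a Kubernetes controller that manages OpenTelemetryCollector and Instrumentation custom resources. Its admission webhooks need a TLS certificate, which the Helm chart obtains from cert-manager by default, so install cert-manager first or use one of the chart's alternative certificate options. Then install the operator via Helm: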
helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts
helm repo update
helm install opentelemetry-operator open-telemetry/opentelemetry-operator \
  --namespace opentelemetry-operator-system \
  --create-namespace \
  --set "manager.collectorImage.repository=otel/opentelemetry-collector-contrib"
Deploying a Collector in Gateway Mode
Run a central collector that receives telemetry from all workloads and fans it out to backends:
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: gateway
  namespace: monitoring
spec:
  mode: deployment  # mode values are lowercase: deployment, daemonset, statefulset, sidecar
  config: |
    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:4317
          http:
            endpoint: 0.0.0.0:4318
    processors:
      batch:
        timeout: 10s
      memory_limiter:
        check_interval: 1s
        limit_mib: 512
    exporters:
      prometheus:
        endpoint: "0.0.0.0:8889"
      # The dedicated jaeger exporter has been removed from recent Collector
      # releases. Jaeger ingests OTLP natively, so traces go out through the
      # otlp exporter (assuming OTLP gRPC is enabled on jaeger-collector:4317).
      otlp/jaeger:
        endpoint: jaeger-collector:4317
        tls:
          insecure: true
      loki:
        endpoint: http://loki:3100/loki/api/v1/push
    service:
      pipelines:
        traces:
          receivers: [otlp]
          processors: [memory_limiter, batch]
          exporters: [otlp/jaeger]
        metrics:
          receivers: [otlp]
          processors: [memory_limiter, batch]
          exporters: [prometheus]
        logs:
          receivers: [otlp]
          processors: [memory_limiter, batch]
          exporters: [loki]
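Note that the prometheus exporter in this config does not push anything: it serves the collected metrics on port 8889 and waits for Prometheus to scrape them. A minimal scrape job might look like the sketch below. The job name and interval are illustrative, and the target assumes the operator-created Service is called gateway-collector in the monitoring namespace and exposes port 8889; confirm both with kubectl get svc -n monitoring before relying on it.

```yaml
# prometheus.yml fragment (sketch; job name, interval, and target are assumptions)
scrape_configs:
  - job_name: otel-gateway
    scrape_interval: 30s
    static_configs:
      - targets:
          # <collector name>-collector.<namespace>.svc:<prometheus exporter port>
          - gateway-collector.monitoring.svc:8889
```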
Auto-Instrumentation: Zero Code Changes
The operator's Instrumentation resource injects OTel SDKs into pods at admission time. For Java, Python, Node.js, and .NET, this requires no code changes:
apiVersion: opentelemetry.io/v1alpha1
kind: Instrumentation
metadata:
  name: auto-instrumentation
  namespace: my-app
spec:
  exporter:
    endpoint: http://gateway-collector.monitoring:4317
  propagators:
    - tracecontext
    - baggage
  sampler:
    type: parentbased_traceidratio
    argument: "0.1"  # sample 10% of new root traces
  # In production, pin the images below to a released version tag instead of latest.
  java:
    image: ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-java:latest
  nodejs:
    image: ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-nodejs:latest
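Annotate pods to opt in. The annotation belongs on the pod template (spec.template.metadata in a Deployment), not on the Deployment's own metadata:
annotations:
  instrumentation.opentelemetry.io/inject-java: "true"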
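The operator's admission webhook mutates the pod spec: it adds an init container that copies the Java agent into a shared volume and sets JAVA_TOOL_OPTIONS so the JVM loads it. The application manifest needs no change beyond the annotation. Because injection happens at admission time, pods that are already running must be recreated to pick it up.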
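To make the annotation placement concrete, here is a minimal Deployment sketch. The workload name checkout and the image reference are hypothetical; the namespace matches the Instrumentation resource above, which is what the value "true" expects.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: checkout              # hypothetical workload name
  namespace: my-app           # same namespace as the Instrumentation resource
spec:
  replicas: 1
  selector:
    matchLabels:
      app: checkout
  template:
    metadata:
      labels:
        app: checkout
      annotations:
        # Must sit on the pod template so the webhook sees it at pod admission.
        instrumentation.opentelemetry.io/inject-java: "true"
    spec:
      containers:
        - name: checkout
          image: registry.example.com/checkout:1.0.0   # hypothetical image
```

Putting the same annotation under the top-level metadata of the Deployment has no effect, because the webhook inspects pods, not their owning controllers.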
Correlating Traces and Logs
The most valuable OTel feature for troubleshooting is trace-log correlation. Ensure your application's logging instrumentation writes trace_id and span_id into every log record (the Java agent, for example, exposes them through the logging MDC), then configure Grafana to link from a Loki log line to the corresponding Jaeger trace. The result: from an error log, one click takes you to the full distributed trace.
Set up the correlation in Grafana's data source configuration, under Derived Fields on the Loki data source: a regex on the log line extracts the trace ID, and the derived field links it to the Jaeger data source.
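If you provision Grafana data sources from files, the derived field can be declared there instead of in the UI. The sketch below makes several assumptions: log lines contain the trace ID as trace_id=<hex>, the Jaeger data source has the uid jaeger, and the Jaeger query service is reachable at jaeger-query:16686. Adjust the regex to whatever format your logs actually use.

```yaml
# Grafana data source provisioning file (sketch; uids and the Jaeger URL are assumptions)
apiVersion: 1
datasources:
  - name: Jaeger
    type: jaeger
    uid: jaeger
    access: proxy
    url: http://jaeger-query:16686
  - name: Loki
    type: loki
    access: proxy
    url: http://loki:3100
    jsonData:
      derivedFields:
        - name: TraceID
          # The first capture group becomes the field value.
          matcherRegex: "trace_id=(\\w+)"
          # $$ escapes the dollar sign so provisioning does not treat it as an env variable.
          url: "$${__value.raw}"
          # Linking by uid makes this an internal link to the Jaeger data source.
          datasourceUid: jaeger
```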
What to Instrument First
Don't try to instrument everything at once. Prioritise:
- HTTP/gRPC service boundaries: auto-instrumentation covers these out of the box.
- Database calls: most OTel language ecosystems provide instrumentation libraries for common database clients.
- Message queue producers/consumers: instrumentation for Kafka and RabbitMQ clients exists for most major languages.
Leave internal library calls for later. The 20% of instrumentation effort spent on calls that cross service boundaries gives you roughly 80% of the diagnostic value.