Metrics And Tracing on AWS Lambda in Golang

Introduction

When writing an application, you want to make sure it has metrics and tracing; both are fundamental to debugging it once it is in production.

AWS Lambda is a great fit for a language such as Golang or Rust, since they produce small deployment packages and have very low cold-start times. However, the tooling for getting metrics and traces from Lambda into CloudWatch can be confusing for a beginner. This blog post will help you set up tracing and metrics using what I consider the most modern, best-practice approach: OpenTelemetry + AWS X-Ray + CloudWatch Embedded Metric Format (EMF).

Infrastructure (CDK - TypeScript)

To connect your lambda to CloudWatch, you need to add a layer and the relevant permissions to your infrastructure.

Permissions

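Your lambda needs permission to post metrics via EMF to a CloudWatch log group. The execution role CDK creates for a Function already includes the AWSLambdaBasicExecutionRole managed policy, which covers creating log groups and log streams and putting log events. Because the collector config below sets log_retention, the EMF exporter also needs logs:PutRetentionPolicy on the target log group, so add a policy statement for that: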

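import { Stack } from 'aws-cdk-lib';
import { Effect, PolicyStatement } from 'aws-cdk-lib/aws-iam';

// Inside your stack or construct constructor:
const partition = Stack.of(this).partition;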
const region = Stack.of(this).region;
const account = Stack.of(this).account;
const cwPermission = new PolicyStatement({
  effect: Effect.ALLOW,
  actions: ['logs:PutRetentionPolicy'],
  resources: [
    `arn:${partition}:logs:${region}:${account}:log-group:<LOG_GROUP_NAME>`,
    `arn:${partition}:logs:${region}:${account}:log-group:<LOG_GROUP_NAME>:*`
  ]
});
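For reference, the PolicyStatement above should end up in the synthesized template as an ordinary IAM statement with one action and two resources: the log group ARN itself and the same ARN with a :* suffix, which also matches the log group's streams. The short Go program below only illustrates that shape using encoding/json; the retentionStatement helper and the partition, region, account and log group values are hypothetical placeholders, and CDK may render a single action as a plain string rather than a list.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// statement mirrors the fields of an IAM policy statement that matter here.
type statement struct {
	Effect   string   `json:"Effect"`
	Action   []string `json:"Action"`
	Resource []string `json:"Resource"`
}

// retentionStatement builds the same statement as the CDK PolicyStatement:
// PutRetentionPolicy on the log group ARN, with and without the ":*" suffix.
func retentionStatement(partition, region, account, logGroup string) statement {
	base := fmt.Sprintf("arn:%s:logs:%s:%s:log-group:%s", partition, region, account, logGroup)
	return statement{
		Effect:   "Allow",
		Action:   []string{"logs:PutRetentionPolicy"},
		Resource: []string{base, base + ":*"},
	}
}

func main() {
	// Placeholder values; substitute your own partition, region, account and log group.
	s := retentionStatement("aws", "us-east-1", "123456789012", "my-emf-log-group")
	out, err := json.MarshalIndent(s, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```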

OpenTelemetry Layer

AWS Lambda connects to OpenTelemetry via a Lambda layer published by AWS called the ADOT (AWS Distro for OpenTelemetry) Collector Layer. The layer ARN below lives in the standard aws partition, so if you need to support another partition such as China or GovCloud, you will have to build and publish your own copy of the layer in that partition.

Note: This layer is only available in certain regions; check the ADOT Collector Layer docs for the supported regions.

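import { LayerVersion } from 'aws-cdk-lib/aws-lambda';

// Replace <architecture> with amd64 or arm64 to match your function's architecture.
const otelLayer = LayerVersion.fromLayerVersionArn(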
  this,
  'LambdaOtelLayer',
  `arn:aws:lambda:${region}:901920570463:layer:aws-otel-collector-<architecture>-ver-0-90-1:1`
);
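The <architecture> placeholder is the part of the ARN that is easiest to get wrong. A minimal sketch of filling it in, assuming the layer is published as amd64 and arm64 variants under the account ID and version shown above; adotLayerARN is a hypothetical helper, not part of any AWS SDK. Note that the layer name says amd64 where CDK's Architecture enum says X86_64.

```go
package main

import "fmt"

// adotLayerARN builds the ADOT collector layer ARN used in the CDK snippet.
// The account ID (901920570463) and version suffix (ver-0-90-1:1) are copied
// from that snippet; arch must match the function's architecture.
func adotLayerARN(region, arch string) (string, error) {
	switch arch {
	case "amd64", "arm64":
		// supported layer variants (assumption based on the layer naming)
	default:
		return "", fmt.Errorf("unsupported architecture %q (want amd64 or arm64)", arch)
	}
	return fmt.Sprintf("arn:aws:lambda:%s:901920570463:layer:aws-otel-collector-%s-ver-0-90-1:1", region, arch), nil
}

func main() {
	arn, err := adotLayerARN("eu-west-1", "arm64")
	if err != nil {
		panic(err)
	}
	fmt.Println(arn)
}
```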

OpenTelemetry Config File

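You need to bundle a collector.yaml file into your uploaded lambda zip, at the root of the archive so that it ends up at /var/task/collector.yaml. Its format is defined by the OpenTelemetry Collector configuration docs; below is the example I used for this blog post. The log group you put in <DESIRED_LOG_GROUP> should be the same one you granted permissions on above.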

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: "localhost:4317"
      http:
        endpoint: "localhost:4318"
exporters:
  logging:
  awsxray:
  awsemf:
    log_group_name: "<DESIRED_LOG_GROUP>"
    namespace: "<DESIRED_NAMESPACE>"
    dimension_rollup_option: "NoDimensionRollup"
    log_retention: 60 # days; this is why the role needs logs:PutRetentionPolicy
service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [awsxray]
    metrics:
      receivers: [otlp]
      exporters: [awsemf]
  # telemetry configures the collector's own metrics; it belongs under
  # service, not under pipelines
  telemetry:
    metrics:
      address: localhost:8888
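Lambda extracts the deployment zip into /var/task, so the Go binary (named bootstrap) and collector.yaml both need to sit at the root of the archive for the paths used in this post to resolve. If you build the zip yourself rather than letting a CDK asset do it, a minimal standard-library sketch looks like this; buildDeploymentZip and zipEntryModes are hypothetical helpers and the file contents are placeholders.

```go
package main

import (
	"archive/zip"
	"bytes"
	"fmt"
	"io/fs"
)

// buildDeploymentZip packages the compiled binary as "bootstrap" and the
// collector config as "collector.yaml", both at the root of the archive, so
// Lambda extracts them to /var/task/bootstrap and /var/task/collector.yaml.
func buildDeploymentZip(bootstrap, collectorYAML []byte) ([]byte, error) {
	var buf bytes.Buffer
	zw := zip.NewWriter(&buf)
	entries := []struct {
		name string
		mode fs.FileMode
		data []byte
	}{
		{"bootstrap", 0o755, bootstrap}, // the custom runtime binary must be executable
		{"collector.yaml", 0o644, collectorYAML},
	}
	for _, e := range entries {
		hdr := &zip.FileHeader{Name: e.name, Method: zip.Deflate}
		hdr.SetMode(e.mode) // stores Unix permissions in the zip header
		w, err := zw.CreateHeader(hdr)
		if err != nil {
			return nil, err
		}
		if _, err := w.Write(e.data); err != nil {
			return nil, err
		}
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// zipEntryModes lists each entry in the archive with its file mode.
func zipEntryModes(data []byte) (map[string]fs.FileMode, error) {
	zr, err := zip.NewReader(bytes.NewReader(data), int64(len(data)))
	if err != nil {
		return nil, err
	}
	modes := make(map[string]fs.FileMode, len(zr.File))
	for _, f := range zr.File {
		modes[f.Name] = f.Mode()
	}
	return modes, nil
}

func main() {
	// Placeholder contents; in practice read the built binary and collector.yaml from disk.
	data, err := buildDeploymentZip([]byte("<binary>"), []byte("receivers: ..."))
	if err != nil {
		panic(err)
	}
	modes, err := zipEntryModes(data)
	if err != nil {
		panic(err)
	}
	for name, mode := range modes {
		fmt.Println(name, mode)
	}
}
```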

Lambda

Finally, you need to hook it all together in CDK. In this example I use the Function construct, but something like the GoFunction construct should work the same way.

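import { Function, Tracing } from 'aws-cdk-lib/aws-lambda';

const lambda = new Function(this, 'otelLambda', {
  ...lambdaCommonProps, // code, runtime, architecture, etc.
  layers: [otelLayer],
  functionName: 'OtelLambda',
  handler: 'bootstrap',
  // Enables X-Ray active tracing; CDK also grants the role
  // xray:PutTraceSegments and xray:PutTelemetryRecords.
  tracing: Tracing.ACTIVE,
  environment: {
    // The ADOT collector layer reads its config from this path.
    OPENTELEMETRY_COLLECTOR_CONFIG_FILE: '/var/task/collector.yaml',
    // Custom variable, used as a metric attribute by the Go code.
    HANDLER_NAME: 'OtelLambda'
  }
});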
lambda.addToRolePolicy(cwPermission);

Service Code

To finish the instrumentation, you now need to write the service code itself, i.e. the Lambda handler.

Dependencies

go get github.com/aws/aws-lambda-go
# assumed logger: the log.Err(...) / log.Info() calls in this post match zerolog's API
go get github.com/rs/zerolog
go get go.opentelemetry.io/contrib/instrumentation/github.com/aws/aws-lambda-go/otellambda
go get go.opentelemetry.io/contrib/instrumentation/github.com/aws/aws-lambda-go/otellambda/xrayconfig
go get go.opentelemetry.io/contrib/propagators/aws
go get go.opentelemetry.io/otel
go get go.opentelemetry.io/otel/exporters/otlp/otlpmetric/otlpmetricgrpc
go get go.opentelemetry.io/otel/sdk
go get go.opentelemetry.io/otel/sdk/metric
go get google.golang.org/grpc

Tracing Instrumentation with CloudWatch X-Ray

Before tackling metrics, let's set up OpenTelemetry with X-Ray, which is a fairly simple process.

First, we create the file /internal/otel/metrics.go:

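// The package name is up to you; main.go imports it under the alias internalOtel.
package otel

import (
	"context"

	"github.com/rs/zerolog/log" // assumed logger (see the dependency list)
	"go.opentelemetry.io/contrib/instrumentation/github.com/aws/aws-lambda-go/otellambda"
	"go.opentelemetry.io/contrib/instrumentation/github.com/aws/aws-lambda-go/otellambda/xrayconfig"
	"go.opentelemetry.io/contrib/propagators/aws/xray"
	"go.opentelemetry.io/otel"
)

// XrayDecorator instruments a Lambda handler with OpenTelemetry tracing that
// the ADOT collector exports to X-Ray.
func XrayDecorator(lambdaHandler any) any {
	ctx := context.Background()
	tp, err := xrayconfig.NewTracerProvider(ctx)
	if err != nil {
		log.Err(err).Msg("error creating tracer provider")
		return lambdaHandler // run uninstrumented rather than failing
	}
	// Do not shut tp down here: this function returns before lambda.Start
	// serves the first invocation, so a deferred Shutdown would disable
	// tracing before it is ever used.
	otel.SetTracerProvider(tp)
	otel.SetTextMapPropagator(xray.Propagator{})
	// WithRecommendedOptions passes tp to the instrumentation and registers it
	// as a flusher, so spans are force-flushed at the end of every invocation.
	return otellambda.InstrumentHandler(lambdaHandler, xrayconfig.WithRecommendedOptions(tp)...)
}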

Finally, we create our basic lambda handler: /cmd/lambda/main.go

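package main

import (
	"context"
	"github.com/aws/aws-lambda-go/lambda"
	internalOtel "<MODULE_PATH>/internal/otel" // your go.mod module path
)

func HandleRequest(ctx context.Context, event interface{}) (string, error) {
	return "dummy", nil
}

func main() {
	lambda.Start(internalOtel.XrayDecorator(HandleRequest))
}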
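The xray.Propagator used above reads and writes trace context in the X-Ray trace header format, which Lambda makes available to each invocation (for example through the _X_AMZN_TRACE_ID environment variable). The sketch below parses such a header and converts its Root field into the 32-hex-character trace ID that OpenTelemetry uses; it only illustrates the format and is not the propagator's actual implementation. parseTraceHeader and otelTraceID are hypothetical names, and the sample header is illustrative.

```go
package main

import (
	"fmt"
	"strings"
)

// traceHeader holds the fields of an X-Ray trace header.
type traceHeader struct {
	Root    string // "1-<8 hex digits: epoch seconds>-<24 hex digits>"
	Parent  string // 16-hex-digit parent segment/span ID
	Sampled bool
}

// parseTraceHeader splits a header such as
// "Root=1-5759e988-bd862e3fe1be46a994272793;Parent=53995c3f42cd8ad8;Sampled=1"
// into its key=value fields, ignoring keys it does not recognise.
func parseTraceHeader(h string) traceHeader {
	var th traceHeader
	for _, part := range strings.Split(h, ";") {
		key, value, ok := strings.Cut(strings.TrimSpace(part), "=")
		if !ok {
			continue
		}
		switch key {
		case "Root":
			th.Root = value
		case "Parent":
			th.Parent = value
		case "Sampled":
			th.Sampled = value == "1"
		}
	}
	return th
}

// otelTraceID drops the "1-" version prefix and the inner dash of an X-Ray
// root ID, producing a 32-hex-character OpenTelemetry trace ID.
func otelTraceID(root string) (string, error) {
	parts := strings.Split(root, "-")
	if len(parts) != 3 || parts[0] != "1" || len(parts[1]) != 8 || len(parts[2]) != 24 {
		return "", fmt.Errorf("malformed X-Ray root %q", root)
	}
	return parts[1] + parts[2], nil
}

func main() {
	th := parseTraceHeader("Root=1-5759e988-bd862e3fe1be46a994272793;Parent=53995c3f42cd8ad8;Sampled=1")
	id, err := otelTraceID(th.Root)
	if err != nil {
		panic(err)
	}
	fmt.Println(id, th.Parent, th.Sampled)
}
```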

Now build the Go binary as bootstrap, deploy the infrastructure and service code (for example with npm run build && cdk deploy), and invoke the lambda.
If you have set up the code correctly, you should now see traces flowing into X-Ray in the CloudWatch console. Congratulations, you can stop here if you have no need for custom metrics.

Metrics Instrumentation with CloudWatch EMF

In this section we will set up a basic decorator that initializes OpenTelemetry metrics in the lambda and pushes a custom latency metric, i.e. the time taken for your handler to run. You can extend this to publish any metric you want.

First, edit the /internal/otel/metrics.go file from before:

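// Merge these into the existing import block at the top of the file.
import (
	"context"
	"errors"
	"os"
	"time"

	"github.com/rs/zerolog/log" // assumed logger (see the dependency list)
	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/exporters/otlp/otlpmetric/otlpmetricgrpc"
	"go.opentelemetry.io/otel/metric"
	sdkMetric "go.opentelemetry.io/otel/sdk/metric"
	"go.opentelemetry.io/otel/sdk/resource"
	semconv "go.opentelemetry.io/otel/semconv/v1.24.0"
	"google.golang.org/grpc"
)

var ErrFailedToRecordMetric = errors.New("failed to record metric")

// LambdaHandler is the typed handler signature that MetricsDecorator wraps.
type LambdaHandler[TIn any, TOut any] func(ctx context.Context, event TIn) (TOut, error)

type NewIntMetricParams struct {
	Name        string
	Description string
	Unit        string // Unit is ucum standard => https://ucum.org/ucum
	Value       int64
	Attributes  []attribute.KeyValue
}

// NewIntMetric reports params.Value on an observable gauge. The caller must
// Unregister the returned callback once the value has been flushed; otherwise
// it keeps reporting the stale value on every later collection.
func NewIntMetric(meter metric.Meter, params NewIntMetricParams) (metric.Registration, error) {
	log.Info().Interface("params", params).Msg("New Int Metric To Record")
	gauge, err := meter.Int64ObservableGauge(
		params.Name,
		metric.WithDescription(params.Description),
		metric.WithUnit(params.Unit),
	)
	if err != nil {
		return nil, errors.Join(ErrFailedToRecordMetric, err)
	}
	reg, err := meter.RegisterCallback(func(_ context.Context, o metric.Observer) error {
		o.ObserveInt64(gauge, params.Value, metric.WithAttributes(params.Attributes...))
		return nil
	}, gauge)
	if err != nil {
		return nil, errors.Join(ErrFailedToRecordMetric, err)
	}
	return reg, nil
}

// MetricsDecorator sets up the meter provider once per cold start and returns
// a handler that records the duration of every invocation.
func MetricsDecorator[TIn any, TOut any](lambdaHandler LambdaHandler[TIn, TOut]) LambdaHandler[TIn, TOut] {
	ctx := context.Background()
	// The schema URL must match the semconv version your SDK uses for
	// resource.Default(), otherwise Merge returns a schema URL conflict error.
	res, err := resource.Merge(resource.Default(), resource.NewWithAttributes(semconv.SchemaURL,
		semconv.ServiceName("<APPLICATION_NAME>"), semconv.ServiceVersion("<APPLICATION_VERSION>")))
	if err != nil {
		log.Err(err).Msg("failed to build metrics resource, using default")
		res = resource.Default()
	}
	metricExporter, err := otlpmetricgrpc.New(ctx,
		otlpmetricgrpc.WithInsecure(),
		otlpmetricgrpc.WithEndpoint("localhost:4317"), // otlp grpc receiver in collector.yaml
		otlpmetricgrpc.WithDialOption(grpc.WithBlock()),
	)
	if err != nil {
		log.Err(err).Msg("failed to initialize metrics")
		return lambdaHandler // run without metrics rather than failing
	}
	mp := sdkMetric.NewMeterProvider(
		sdkMetric.WithReader(sdkMetric.NewPeriodicReader(metricExporter)),
		sdkMetric.WithResource(res),
	)
	otel.SetMeterProvider(mp)
	meter := mp.Meter("otellambda")

	return func(ctx context.Context, event TIn) (TOut, error) {
		start := time.Now()
		resp, err := lambdaHandler(ctx, event)
		reg, metricErr := NewIntMetric(meter, NewIntMetricParams{
			Name:        "duration",
			Description: "duration taken for handler",
			Unit:        "ms",
			Value:       time.Since(start).Milliseconds(),
			Attributes:  []attribute.KeyValue{attribute.String("handler", os.Getenv("HANDLER_NAME"))},
		})
		if metricErr != nil {
			log.Err(metricErr).Msg("failed to record duration metric")
		}
		// Export now: Lambda can freeze the environment before the periodic
		// reader's next interval fires.
		if flushErr := mp.ForceFlush(ctx); flushErr != nil {
			log.Err(flushErr).Msg("failed to flush meter provider")
		}
		if reg != nil {
			if unregErr := reg.Unregister(); unregErr != nil {
				log.Err(unregErr).Msg("failed to unregister metric callback")
			}
		}
		return resp, err
	}
}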

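Now update main in /cmd/lambda/main.go:

func main() {
	// MetricsDecorator's type parameters must match HandleRequest's signature:
	// its event is interface{} (any) and it returns a string.
	lambda.Start(internalOtel.XrayDecorator(internalOtel.MetricsDecorator[any, string](HandleRequest)))
}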

If you rerun your lambda, you should see an EMF metric log entry in your log group (the <DESIRED_LOG_GROUP> from collector.yaml), and shortly afterwards the metric will appear in CloudWatch Metrics under your namespace.
Congratulations! If you reached this point, you now have custom metrics and tracing implemented on AWS Lambda in Golang using OpenTelemetry.

Techytechster's Blog