CategoriesObservability

Open Telemetry and Lambda

Open Telemetry and Lambda are topics that I’ve spent a lot of brain cycles on over the past few weeks. Most of my work with these topics has been with Rust but I want to take a look at where my opinions are at the moment and hope to provide some insights that you can use to build your next Lambda with solid telemetry. For disclosure, I have some opinionated biases that will surface and I also still have some gaps in my exploration that I’ll highlight. Let’s get going with Open Telemetry and Lambda.

CategoriesObservability

Analyzing and Correcting Errors with Advanced SQS Redrive

A good friend of mine is working on a really neat redrive tool with SQS and wanted to write an article to describe its purpose and use. I’m super honored that he asked me to share his writing on my blog. Please find below Adam Tran’s “Analyzing and Correcting Errors with Advanced SQS Redrive”

Analyzing and Correcting Errors with Advanced SQS Redrive

Analyzing dead-letter queues (DLQs) within the AWS ecosystem can be tricky. Receiving and analyzing messages via the AWS Console is very limited, and does not allow for the manipulation of messages in any sensible manner. Sure, you can redrive an entire DLQ, but what if you need to analyze thousands of messages or make changes?

There are many potential solutions to this problem, but a simple solution that I’ve developed is to download your queues’ messages locally where they can be analyzed with any tool of your choosing. I’ve defined a stateful directory structure to reflect where a message is in its journey of analysis so that you can make changes in whatever manner you deem appropriate.

CategoriesInfrastructureObservability

Monitoring SQS with Datadog

Event-Driven architecture paired with Serverless technologies are a powerful combo to build applications. But failure does happen and you should expect it to happen. Dealing with that failure is often done by dead-lettering messages into a Dead-Letter-Queue. But what do you do in order to monitor those queues? Most people start manually checking them or perhaps adding a CloudWatch Alarm that triggers an SNS topic. What I’d like to show you is a more advanced version of this monitoring through some code, constructs and AWS CodeSuite of tools. Say hello to monitoring SQS with Datadog.

CategoriesServerlessObservability

SQS Re-Drive with Golang and Step Functions

Earlier this week a new set of APIs were released for working with Dead-Letter-Queues and re-drives back to its primary queue. Messaging-based systems have been around for a long time and they are a critical piece of modern Event-Driven Architecture. As I read more about the APIs, I started thinking about how I could build up a sample that could be used for starting a hardened auto-re-drive State Machine that could put messages back on queues protected behind an API Gateway or Event Bridge Scheduler. Below is my take on how I might start thinking through building an SQS re-drive with Golang and Step Functions

CategoriesInfrastructureObservabilityProgramming

Infrastructure as Code

Infrastructure as Code is an emerging practice that encourages the writing of cloud infrastructure as code instead of clicking your way to deployment. I feel like “ClickOps” is where we all started years ago when there weren’t any other options. The lessons learned from the inconsistency in human deployment were the genesis for the automation and power that comes from building your cloud stacks as code. Now, many start from IaC as the patterns and practices are well-defined. But instead of re-hashing those commentaries, I want to give you my opinions on why IaC decisions are more than about the tech. Infrastructure as Code is a shift of responsibilities that brings your teams closer together and will help establish a culture of accountability but it will come at a cost.

CategoriesObservability

Tracing HTTP Requests with Go and Datadog

Small follow up on the last post regarding tracing. I’m a huge fan of Event Driven systems or EDA (Event Driven Architecture) but sometimes you do need to make that synchronous HTTP request in order to fetch more data. Perhaps you are building a “saga” or sometimes events just published what happened and to whom it happened but not specifics about the actual event. For that you need to return back out and fetch more info.

When that happens, you’ll need to use a HTTP Client for making that request. And when doing so, it often sort of turns into a black hole, especially if you have multiple calls to make and you need to distinguish them. Enter again the Datadog libraries. With a simple wrapping of the client, when you make requests WithContext you will get a nicer and prettier display of what the span is. In the case below, I usually like to set the VERB that was requested in addition to the URL. Feel free to use/show whatever makes sense to you