In a previous post I described how we are using Amazon SNS with Symfony Messenger.
But what I didn’t mention is how we handle failures. Sure, we write perfect code with tonnes of testing but failures will
still happen. To ensure that each message is successfully handled we need to implement a retry mechanism for failed messages.
The general idea is to put failed messages on a queue and then republish all the messages from that queue later. Hopefully
we will have detected and fixed the issue before the next time the message is published on SNS.
To achieve this we first need to modify our SnsConsumer to catch all exceptions:
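A minimal sketch of the catch-all idea, not the actual SnsConsumer: the function name `handleOrScheduleRetry` and both callbacks are hypothetical. The `$scheduleRetry` callback stands in for whatever publishes the failed message to the retry queue.

```php
<?php

/**
 * Run $handler and, if anything at all is thrown, hand the message to
 * $scheduleRetry instead of letting the failure escape.
 * Hypothetical helper for illustration only.
 */
function handleOrScheduleRetry(array $message, callable $handler, callable $scheduleRetry): bool
{
    try {
        $handler($message);

        return true;
    } catch (\Throwable $e) {
        // \Throwable covers both exceptions and PHP errors such as TypeError.
        $scheduleRetry($message, $e);

        return false;
    }
}
```

Catching `\Throwable` rather than `\Exception` means a type error in a handler also ends up on the retry queue instead of being lost.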
The Rawls\QueueAdapter is just Happyr’s abstraction layer over AMQP. It makes the code look a little cleaner, that is all.
We decorate the message with some metadata, such as the SnsTopicArn the message should be republished to. Since we have
the RabbitMQ delayed message plugin installed, we also add the x-delay header.
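As a rough sketch of that decoration step: the helper name `buildRetryHeaders` and the `sns_topic_arn` header name are assumptions made for illustration. Only `x-delay` is fixed by the RabbitMQ delayed message exchange plugin, which expects the delay in milliseconds.

```php
<?php

/**
 * Build the AMQP headers attached to a failed SNS message.
 * Hypothetical helper; "sns_topic_arn" is an assumed header name.
 */
function buildRetryHeaders(string $topicArn, int $delaySeconds): array
{
    return [
        // Where the message should be republished later.
        'sns_topic_arn' => $topicArn,
        // The delayed message plugin reads x-delay as milliseconds.
        'x-delay' => $delaySeconds * 1000,
    ];
}
```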
We modified our bin/message-consumer slightly to allow the SnsConsumer to see more of the SNS event:
Great, so the failed messages end up on an AMQP exchange called sns_retry and are sent to a queue. Let's create a new
application that reads from that queue and republishes the messages on SNS. We will of course use Bref and Lambda for this application
and then configure it to run periodically. This is my src/publish.php:
Quite simple. Just read a message and publish. We could be done here, but these little scripts are missing two features:
Multiple SNS subscribers
Retry back-off
Multiple SNS subscribers
If you have multiple applications subscribed to the same SNS topic and one of them fails, you don't want all applications
to retry the message, only the one that failed.
We can achieve this by adding an application_id to the message. If the application id is empty or matches this application,
then we should process the message.
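The rule above fits in one small function. This is a sketch with a hypothetical name (`shouldProcess`); how the application id is read from the SNS event is left out.

```php
<?php

/**
 * Decide whether this application should handle a message.
 * An empty or missing application id means "first delivery, for everyone".
 * Hypothetical helper for illustration only.
 */
function shouldProcess(?string $messageApplicationId, string $ownApplicationId): bool
{
    if ($messageApplicationId === null || $messageApplicationId === '') {
        return true;
    }

    // A retried message is only for the application that failed.
    return $messageApplicationId === $ownApplicationId;
}
```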
The publish.php needs to forward the application_id to the SNS message using “MessageAttributes”:
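A sketch of what the publish input could look like, assuming the usual SNS Publish parameter shape (`TopicArn`, `Message`, `MessageAttributes`). The function name `buildPublishInput` is hypothetical; the returned array is what you would pass to your SNS client's publish call.

```php
<?php

/**
 * Build the parameters for an SNS Publish call that carries the
 * application id as a message attribute.
 * Hypothetical helper for illustration only.
 */
function buildPublishInput(string $topicArn, string $body, ?string $applicationId): array
{
    $input = [
        'TopicArn' => $topicArn,
        'Message' => $body,
    ];

    // SNS does not accept empty attribute values, so only add it when set.
    if ($applicationId !== null && $applicationId !== '') {
        $input['MessageAttributes'] = [
            'application_id' => [
                'DataType' => 'String',
                'StringValue' => $applicationId,
            ],
        ];
    }

    return $input;
}
```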
Retry back-off
If a message just failed, it might have been because of some small glitch in the network, so we want to retry processing
the message pretty quickly. However, if a message has failed a few times, we may want to retry it only every third hour or
something similar. Hopefully a developer has deployed a fix by then so the message will not fail again.
To implement this back-off we use an AMQP topic exchange and different routing keys. We decided that we wanted to use 3 queues:
sns_retry_0 (runs every minute)
sns_retry_1 (runs every 10 minutes)
sns_retry_2 (runs every second hour)
The SnsConsumer needs to keep track of how many times we have retried a message and choose a routing key accordingly.
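One way to pick the routing key is sketched here. It assumes that each failed attempt moves the message one level up and that everything past the second level stays on sns_retry_2. The actual thresholds are a design choice, and `retryRoutingKey` is a hypothetical name.

```php
<?php

/**
 * Map the number of retries so far to one of the three retry queues.
 * Assumption: one attempt per level, capped at the slowest queue.
 * Hypothetical helper for illustration only.
 */
function retryRoutingKey(int $retryAttempt): string
{
    // Clamp to the range 0..2 so negative or large values are still valid.
    $level = min(max($retryAttempt, 0), 2);

    return 'sns_retry_' . $level;
}
```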
Now we need to configure publish.php to forward the retry_attempt header:
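As a sketch, the counter can travel as an SNS message attribute with the Number data type. SNS still expects the value as a string in `StringValue`. The function name `retryAttemptAttribute` is hypothetical.

```php
<?php

/**
 * Build the SNS message attribute that carries the retry counter.
 * Hypothetical helper for illustration only.
 */
function retryAttemptAttribute(int $retryAttempt): array
{
    return [
        'retry_attempt' => [
            'DataType' => 'Number',
            // Number attributes are still transported as strings.
            'StringValue' => (string) $retryAttempt,
        ],
    ];
}
```

The result can be merged into the `MessageAttributes` of the publish input next to any other attributes.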
Everything works fine. When deploying this we need to make sure we use 3 different Lambda functions to read from the
different queues.
The complete code examples
Below are SnsConsumer and publish.php with all features included.
My Bref config looks as follows:
I hope this post was helpful. Please give me your thoughts in the comments below.