+++
title = "Self-Healing Microservices"
date = 2020-01-03

[taxonomies]
tags = ["microservices", "healing", "artifacts"]
+++

All the [previous](@/code/microservices-artifact-input-state.md)
[discussions](@/code/microservices-artifact-ejection.md) I brought up about
microservices were just a prelude to something for which I still don't have a
fixed solution: how do microservices "heal" themselves in case of missing data?

<!-- more -->

Quick recap before jumping into the problem: microservices produce artifacts;
artifacts are either sent downstream through some message broker to other
microservices or kept in the same microservice for future requests;
microservices can listen to more than one data input to build their artifact.

Previously I mentioned an example of a score microservice that produces an
artifact with the current game score for each team and the names of the
players that scored. This microservice could listen to:

1. The teams queue: this may be needed so we can show the team name or its
   abbreviation in the score request; once a team appears in the championship,
   the microservice adds it to its state for future reference[^1].
2. The players queue: same as above, so the microservice can return the player
   name, nickname, shirt number or something else player-related if the player
   scores; again, it keeps listening to the player queue and stores the
   players in its state.
3. The match queue: if a match will happen, it has to have a score, probably
   starting at 0 with no players in the goal list; this is just to avoid any
   issues with the services requesting scores of matches that haven't started
   or haven't had any goals yet; in any case, the artifact will be ready to be
   retrieved.
4. The narration queue: by listening to the narration queue, the score
   microservice will detect any goals, update its state and generate the new
   artifact.

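
To make the list above a bit more concrete, here is a minimal sketch of how
such a score microservice could keep its state, with one handler per queue.
Everything in it (the `ScoreService` class, the handler names, the shape of the
artifact) is made up for illustration; a real service would depend on the
broker and on the message formats in use.

```python
from dataclasses import dataclass, field


@dataclass
class ScoreService:
    """Hypothetical score microservice; every name here is illustrative."""

    teams: dict = field(default_factory=dict)    # team id -> abbreviation
    players: dict = field(default_factory=dict)  # player id -> display name
    scores: dict = field(default_factory=dict)   # match id -> internal state

    def on_team(self, team_id, abbreviation):
        # Teams queue: keep only what the artifact needs.
        self.teams[team_id] = abbreviation

    def on_player(self, player_id, name):
        # Players queue: remember the player for future goals.
        self.players[player_id] = name

    def on_match(self, match_id, home_id, away_id):
        # Match queue: the score exists from the start, at 0-0.
        self.scores[match_id] = {
            "teams": {home_id: 0, away_id: 0},
            "goals": [],
        }

    def on_goal(self, match_id, team_id, player_id):
        # Narration queue: a goal updates the state.
        state = self.scores[match_id]
        state["teams"][team_id] += 1
        state["goals"].append(self.players[player_id])

    def artifact(self, match_id):
        # Build the artifact using team abbreviations instead of ids.
        state = self.scores[match_id]
        return {
            "score": {self.teams[t]: n for t, n in state["teams"].items()},
            "scorers": list(state["goals"]),
        }
```

Note that `on_goal` simply assumes the player is already in the state; what to
do when it isn't is exactly the "healing" problem this post is about.
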
The keyword to take from the above list is "could": depending on the way the
microservice _and_ the messages are built, it may not be necessary to listen to
all of these queues.

## Using full-blown messages

Let's start with the easiest way to avoid listening to all those queues:
full-blown messages.

In a full-blown message, all the related data is sent along with the main
information. Using the previous example, the service could listen to just the
match and narration queues, but expect that the "NewMatch" message will contain
the names of the teams, their abbreviation, logo, probably some id and so on;
the same for the "NewNarration" message: it will contain the player name,
nickname, shirt number, player id and so on.

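
As a rough sketch of what those two messages could look like, assuming Python
dataclasses as the message representation: only the "NewMatch" and
"NewNarration" names come from the text above; the individual fields (such as
`logo_url`, `is_goal` and `team_id`) and the `apply` function are assumptions
made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Team:
    id: int
    name: str
    abbreviation: str
    logo_url: str


@dataclass(frozen=True)
class Player:
    id: int
    name: str
    nickname: str
    shirt_number: int


@dataclass(frozen=True)
class NewMatch:
    match_id: int
    home: Team  # the whole team travels with the message, not just its id
    away: Team


@dataclass(frozen=True)
class NewNarration:
    match_id: int
    text: str
    is_goal: bool
    team_id: int
    player: Player  # same idea: full player data, no lookup needed


def apply(scores, message):
    """Update the score state using only the data carried by the message."""
    if isinstance(message, NewMatch):
        scores[message.match_id] = {
            "teams": {t.id: t.abbreviation for t in (message.home, message.away)},
            "goals": {message.home.id: 0, message.away.id: 0},
            "scorers": [],
        }
    elif isinstance(message, NewNarration) and message.is_goal:
        match = scores[message.match_id]
        match["goals"][message.team_id] += 1
        match["scorers"].append(message.player.nickname)
    return scores
```

The service never needs a teams or players queue here: it keeps the
abbreviation and the nickname and ignores the rest (the name, the logo, the
shirt number).
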
The problem with full-blown messages is that they tend to become bigger and
bigger: as more microservices are plugged into the system, more fields may be
required -- and dropped by services that don't need those fields.

The pro side of full-blown messages is that a microservice will always have
all the information necessary, while keeping the number of listening queues low.
This would also help if you just add a new service to the pool: if it starts
with a blank state, it will be able to build all the state from scratch,
'cause all the information is _already there_.

## Listen to base queues, request the rest

Almost like the solution before, the service would listen to the narrations
and matches, but once it detects some missing information (for example, the
narration event mentions a player ID, but this ID doesn't exist in its state),
the service would request the more "stale" information (players, teams and
products are not added all the time, for example) from some other service and
fill in the missing information in its state.

This means that this microservice, instead of knowing only about queues,
now has to have information about other services (the ones that process and
store the "stale" data) and their interfaces -- and, in general, it would also
require some service discovery in the system. Those other microservices would
be the "two faced" type of microservice, which receives information, stores it
in the state and builds the artifact, but also has an interface for it to be
retrieved instead of simply receiving, processing and passing it along. Caching
would also be advised here, so one service can't flood the other by requesting
the same data over and over -- and refreshing the cache from time to time would
make sense in some situations.

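
A minimal sketch of that "request on miss, then cache" behaviour could look
like the following. The `PlayerLookup` class and its parameters are
hypothetical, and `fetch` stands in for whatever call reaches the service that
owns the player data (an HTTP request, an RPC and so on).

```python
import time


class PlayerLookup:
    """Local player state that asks another service on a miss.

    `fetch` is a stand-in for a call to a hypothetical players service.
    """

    def __init__(self, fetch, max_age_seconds=3600.0, clock=time.monotonic):
        self._fetch = fetch
        self._max_age = max_age_seconds
        self._clock = clock
        self._cache = {}  # player id -> (player data, time it was fetched)

    def get(self, player_id):
        now = self._clock()
        cached = self._cache.get(player_id)
        if cached is not None and now - cached[1] < self._max_age:
            return cached[0]  # fresh enough: no request goes out
        player = self._fetch(player_id)  # missing or too old: ask the owner
        self._cache[player_id] = (player, now)
        return player
```

The `max_age_seconds` limit is one way of getting the "updates from time to
time": an entry older than that is requested again the next time it is needed.
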
The messages are shorter now ('cause you receive only the team/player ID
instead of everything) and retrieval of information happens when necessary, but
where you reduce the number of listeners, you increase the number of requests.
As with full-blown messages, a new service can easily build its own state from
scratch without any issues -- it will do a lot of requests, but it will,
eventually, have all the necessary information.

## Listen to all

This is exactly the same solution as presented in the example above: the
microservice keeps listening to the queues of all related events and builds the
state with them.

One problem with this solution: since the queues are asynchronous,
there could be a problem with the ordering of the data, with goals coming
before players (for different reasons). In this case... what would the service
do? Reject the goal to avoid any inconsistencies, in the hope that the message
broker requeues the event and the player appears in the meantime?

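
Just to illustrate the "reject and hope" option, here is a toy sketch. The
`handle` and `drain` functions, the event shapes and the `max_deliveries` limit
are all assumptions; a single in-memory deque plays the part of the broker
delivering events from the separate queues in whatever order they happen to
arrive.

```python
from collections import deque

ACK, REQUEUE = "ack", "requeue"


def handle(event, players, scorers):
    """Process one event; refuse goals whose player is still unknown."""
    if event["type"] == "player":
        players[event["id"]] = event["name"]
        return ACK
    if event["type"] == "goal":
        if event["player_id"] not in players:
            return REQUEUE  # the player event has not arrived yet
        scorers.append(players[event["player_id"]])
        return ACK
    return ACK  # event types this service does not care about


def drain(events, max_deliveries=5):
    """Toy stand-in for a broker that redelivers rejected events."""
    queue = deque((event, 1) for event in events)
    players, scorers, dropped = {}, [], []
    while queue:
        event, attempt = queue.popleft()
        if handle(event, players, scorers) == ACK:
            continue
        if attempt < max_deliveries:
            queue.append((event, attempt + 1))  # back of the queue
        else:
            dropped.append(event)  # give up instead of looping forever
    return scorers, dropped
```

The delivery limit matters: without it, a goal for a player that never shows up
would be requeued forever.
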
One solution would be to have services that, along with this one, listen to
one specific kind of data: the score microservice listens to all four, but one
microservice listens only to the player queue. This service would
process things way faster than the score one, and serve as a sort of "fallback"
in case some data is missing, kinda like the solution above. This will reduce
the traffic in the network, but it'd create duplicate data in different
services -- although that last point shouldn't be a problem in the first
place.

New services will find it problematic, 'cause although they are receiving the
main data, they were not alive when the more "stale" data was processed;
they will need to either communicate with other services to get this
information, or someone will have to manually duplicate the sources.

## Single queue

I just described solutions in which each kind of data has its own queue, but
what if we put _all_ the events in the same queue? This way, order is assured
(players will be queued before the goals, and the services will process players
before they even see there is a goal).

This reduces the number of listeners, but it requires some good
message design, especially in statically typed languages, which usually require
a well-defined structure for serialization and deserialization.

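
One common way to handle that design is an envelope with a type tag, so the
consumer knows which structure to deserialize the payload into. The sketch
below assumes JSON as the wire format, and the event names (`PlayerAdded`,
`GoalScored`) and the envelope layout are invented for the example.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class PlayerAdded:
    player_id: int
    name: str


@dataclass(frozen=True)
class GoalScored:
    match_id: int
    player_id: int


# The "type" tag in the envelope selects the class used for the payload.
EVENT_TYPES = {"player_added": PlayerAdded, "goal_scored": GoalScored}


def encode(event):
    """Wrap an event in a tagged envelope and serialize it."""
    tag = next(t for t, cls in EVENT_TYPES.items() if isinstance(event, cls))
    return json.dumps({"type": tag, "payload": asdict(event)})


def decode(raw):
    """Rebuild the typed event, or return None for types we don't know."""
    envelope = json.loads(raw)
    cls = EVENT_TYPES.get(envelope["type"])
    if cls is None:
        return None  # unknown event type: not meant for this service
    return cls(**envelope["payload"])
```

Returning `None` for unknown tags lets a service share the queue with events it
doesn't care about, which is the whole point of having a single queue.
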
But it solves almost everything else: there is no issue with the processing
order, the number of listeners is low and the messages are small. But it will
also make new services suffer from the lack of stale data, forcing them to
communicate with other services or to have the data manually copied when they
are brought up.

# The best one?

Honestly, I have no idea. I'm siding with "Full-blown messages" simply 'cause
it simplifies the structure of the services, even knowing that the network is
not free; if I used some non-statically typed language, I'd probably side with
the single queue one. But, again, I don't think there is any "one size fits
all".

Probably there are more architectural options for this; these are just the ones
I can remember discussing with my coworkers.

[^1]: It's worth noting that the microservice may simply drop some of the
information. For example, if the artifact produced only requires the
abbreviated name, its full name may be completely dropped from the state.